It's impossible to find the reason from console output.
Please check hadoop.log; it should contain more detail,
including the messages from the ElasticIndexWriter.
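
For reference, the writer picks these settings up from nutch-site.xml. A minimal
sketch with placeholder values (the cluster name, host, index name, and the exact
plugin.includes pattern below are assumptions — adjust them to your installation):

```xml
<!-- Hypothetical nutch-site.xml fragment; all values are placeholders. -->
<property>
  <name>plugin.includes</name>
  <!-- make sure indexer-elastic is part of the pattern -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<property>
  <name>elastic.cluster</name>
  <value>elasticsearch</value>
</property>
<property>
  <name>elastic.host</name>
  <value>localhost</value>
</property>
<property>
  <name>elastic.port</name>
  <!-- the transport client connects on 9300, not the HTTP port 9200 -->
  <value>9300</value>
</property>
<property>
  <name>elastic.index</name>
  <value>nutch</value>
</property>
```

Note in your log that the client connects to 127.0.0.1:9300, so the transport
connection itself looks fine — which is why hadoop.log is the place to look for
what happens after that.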

Sebastian

On 03/01/2018 06:38 AM, Yash Thenuan Thenuan wrote:
> Hi Sebastian, all of this is showing up, but the problem is that the content
> is never sent. Nothing is indexed to ES.
> This is the output on debug level.
> 
> ElasticIndexWriter
> 
> elastic.cluster : elastic prefix cluster
> 
> elastic.host : hostname
> 
> elastic.port : port  (default 9200)
> 
> elastic.index : elastic index command
> 
> elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
> 
> elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
> 
> 
> no modules loaded
> 
> loaded plugin [org.elasticsearch.index.reindex.ReindexPlugin]
> 
> loaded plugin [org.elasticsearch.join.ParentJoinPlugin]
> 
> loaded plugin [org.elasticsearch.percolator.PercolatorPlugin]
> 
> loaded plugin [org.elasticsearch.script.mustache.MustachePlugin]
> 
> loaded plugin [org.elasticsearch.transport.Netty4Plugin]
> 
> created thread pool: name [force_merge], size [1], queue size [unbounded]
> 
> created thread pool: name [fetch_shard_started], core [1], max [8], keep
> alive [5m]
> 
> created thread pool: name [listener], size [2], queue size [unbounded]
> 
> created thread pool: name [index], size [4], queue size [200]
> 
> created thread pool: name [refresh], core [1], max [2], keep alive [5m]
> 
> created thread pool: name [generic], core [4], max [128], keep alive [30s]
> 
> created thread pool: name [warmer], core [1], max [2], keep alive [5m]
> 
> thread pool [search] will adjust queue by [50] when determining automatic
> queue size
> 
> created thread pool: name [search], size [7], queue size [1k]
> 
> created thread pool: name [flush], core [1], max [2], keep alive [5m]
> 
> created thread pool: name [fetch_shard_store], core [1], max [8], keep
> alive [5m]
> 
> created thread pool: name [management], core [1], max [5], keep alive [5m]
> 
> created thread pool: name [get], size [4], queue size [1k]
> 
> created thread pool: name [bulk], size [4], queue size [200]
> 
> created thread pool: name [snapshot], core [1], max [2], keep alive [5m]
> 
> node_sampler_interval[5s]
> 
> adding address [{#transport#-1}{nNtPR9OJShWSW-ayXRDILA}{localhost}{
> 127.0.0.1:9300}]
> 
> connected to node
> [{tzfqJn0}{tzfqJn0sS5OPV4lKreU60w}{QCGd9doAQaGw4Q_lOqniLQ}{127.0.0.1}{
> 127.0.0.1:9300}]
> 
> IndexingJob: done
> 
> 
> On Wed, Feb 28, 2018 at 10:05 PM, Sebastian Nagel <
> [email protected]> wrote:
> 
>> I never tried ES with Nutch 2.3, but the setup should be similar to
>> 1.x:
>>
>> - enable the plugin "indexer-elastic" in plugin.includes
>>   (upgraded and renamed to "indexer-elastic2" in 2.4)
>>
>> - expects ES 1.4.1
>>
>> - available/required options are found in the log file (hadoop.log):
>>    ElasticIndexWriter
>>         elastic.cluster : elastic prefix cluster
>>         elastic.host : hostname
>>         elastic.port : port  (default 9300)
>>         elastic.index : elastic index command
>>         elastic.max.bulk.docs : elastic bulk index doc counts. (default
>> 250)
>>         elastic.max.bulk.size : elastic bulk index length. (default
>> 2500500 ~2.5MB)
>>
>> Sebastian
>>
>> On 02/28/2018 01:26 PM, Yash Thenuan Thenuan wrote:
>>> Yeah
>>> I was also thinking that
>>> Can somebody help me with nutch 2.3?
>>>
>>> On 28 Feb 2018 17:53, "Yossi Tamari" <[email protected]> wrote:
>>>
>>>> Sorry, I just realized that you're using Nutch 2.x and I'm answering for
>>>> Nutch 1.x. I'm afraid I can't help you.
>>>>
>>>>> -----Original Message-----
>>>>> From: Yash Thenuan Thenuan [mailto:[email protected]]
>>>>> Sent: 28 February 2018 14:20
>>>>> To: [email protected]
>>>>> Subject: RE: Regarding Indexing to elasticsearch
>>>>>
>>>>> "IndexingJob (<batchId> | -all | -reindex) [-crawlId <id>]" is the
>>>>> output of nutch index. I have already configured nutch-site.xml.
>>>>>
>>>>> On 28 Feb 2018 17:41, "Yossi Tamari" <[email protected]> wrote:
>>>>>
>>>>>> I suggest you run "nutch index", take a look at the returned help
>>>>>> message, and continue from there.
>>>>>> Broadly, first you need to configure your Elasticsearch
>>>>>> environment in nutch-site.xml, and then you need to run nutch index
>>>>>> with the location of your CrawlDB and either the segment you want to
>>>>>> index or the directory that contains all the segments you want to
>>>>>> index.
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Yash Thenuan Thenuan [mailto:[email protected]]
>>>>>>> Sent: 28 February 2018 14:06
>>>>>>> To: [email protected]
>>>>>>> Subject: RE: Regarding Indexing to elasticsearch
>>>>>>>
>>>>>>> All I want is to index my parsed data to Elasticsearch.
>>>>>>>
>>>>>>>
>>>>>>> On 28 Feb 2018 17:34, "Yossi Tamari" <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Yash,
>>>>>>>
>>>>>>> The nutch index command does not have a -all flag, so I'm not sure
>>>>>>> what you're trying to achieve here.
>>>>>>>
>>>>>>>         Yossi.
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Yash Thenuan Thenuan [mailto:[email protected]]
>>>>>>>> Sent: 28 February 2018 13:55
>>>>>>>> To: [email protected]
>>>>>>>> Subject: Regarding Indexing to elasticsearch
>>>>>>>>
>>>>>>>> Can somebody please tell me what happens when we run the bin/nutch
>>>>>>>> index -all command.
>>>>>>>> Because I can't figure out why the write function inside the
>>>>>>>> elastic-indexer is not getting executed.
>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>
>>
> 
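
For anyone landing here on Nutch 1.x: the invocation Yossi describes would look
roughly like the following. The crawl directory layout and the segment timestamp
are placeholders, not taken from this thread — substitute your own paths.

```shell
# Hypothetical crawl layout (Nutch 1.x); adjust paths to your setup.
# Index a single segment:
bin/nutch index crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20180228120000/

# Or index every segment under the segments directory:
bin/nutch index crawl/crawldb/ -linkdb crawl/linkdb/ -dir crawl/segments/
```

Running bin/nutch index with no arguments prints the full usage string, which is
the quickest way to see the options your exact version supports.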
