Markus,

I actually started doing this after you replied to my queries a few months ago. I did not face this issue when I was running Nutch in local mode; it seems to show up only when running in deploy (single-node cluster) mode.
I will go ahead and change this to index the document in the FetcherOutputFormat class (would you please tell me the line number to insert the code at?). However, I was wondering whether I would be able to leverage the plugin mechanism to do this, and whether there is any Solr plugin that takes the parsed text from the URL and indexes it based on some transformation that I do?

I really appreciate your help.

Thanks.

On Mon, Dec 16, 2013 at 10:08 AM, Markus Jelsma <[email protected]> wrote:

> You have modified the Fetcher to index documents? In that case, you should
> index in the reducer (FetcherOutputFormat), not while mapping, and reuse
> the existing indexing code of SolrWriter. In any case, you should not
> create a client per document.
>
> -----Original message-----
> > From: S.L <[email protected]>
> > Sent: Monday 16th December 2013 15:57
> > To: [email protected]; [email protected]
> > Subject: Re: Excessive HttpClient creation (Nutch 1.7 on Hadoop 2.2)
> >
> > Markus,
> >
> > Yes, you are right, FetcherThread does not use SolrJ by itself; I am
> > adding a call to Solr to save the data. I am concerned about the number
> > of HttpClients being created. It seems it is creating a client per
> > document I am saving in Solr. This could be expected, but I just want
> > to confirm.
> >
> > Thanks.
> >
> > Sent from my HTC Inspire™ 4G on AT&T
> >
> > ----- Reply message -----
> > From: "Markus Jelsma" <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Subject: Excessive HttpClient creation (Nutch 1.7 on Hadoop 2.2)
> > Date: Mon, Dec 16, 2013 6:27 am
> >
> > Hi - How can this be in FetcherThread? Nutch does not use SolrJ in
> > Fetcher. Do you have the entire Fetcher log?
> >
> > -----Original message-----
> > > From: S.L <[email protected]>
> > > Sent: Monday 16th December 2013 6:40
> > > To: [email protected]
> > > Subject: Excessive HttpClient creation (Nutch 1.7 on Hadoop 2.2)
> > >
> > > Hi Folks,
> > >
> > > I am running Nutch 1.7 on Hadoop 2.2, and in the Hadoop logs for
> > > FetcherThread I see the following statements, which tell me that
> > > HttpClients are being created per URL. Is this a correct assumption?
> > > Also, after a few fetches I notice that the Hadoop job throws an OOM
> > > error. Please advise.
> > >
> > > 2013-12-15 23:47:31,921 INFO [FetcherThread] org.apache.solr.client.solrj.impl.HttpClientUtil: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > 2013-12-15 23:47:31,931 INFO [FetcherThread] org.apache.solr.client.solrj.impl.HttpClientUtil: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > 2013-12-15 23:47:31,932 INFO [FetcherThread] org.apache.solr.client.solrj.impl.HttpClientUtil: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > 2013-12-15 23:47:32,034 INFO [FetcherThread] org.apache.solr.client.solrj.impl.HttpClientUtil: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > 2013-12-15 23:47:32,034 INFO [FetcherThread] org.apache.solr.client.solrj.impl.HttpClientUtil: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > 2013-12-15 23:47:32,040 INFO [FetcherThread] org.apache.solr.client.solrj.impl.HttpClientUtil: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > 2013-12-15 23:47:32,187 INFO [FetcherThread] org.apache.solr.client.solrj.impl.HttpClientUtil: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > 2013-12-15 23:47:32,214 INFO [FetcherThread] org.apache.solr.client.solrj.impl.HttpClientUtil: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > 2013-12-15 23:47:32,250 INFO [FetcherThread] org.apache.solr.client.solrj.impl.HttpClientUtil: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > > 2013-12-15 23:47:32,264 INFO [FetcherThread] org.apache.solr.client.solrj.impl.HttpClientUtil: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
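
For illustration, the advice in this thread (do not create a client per document) can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not Nutch or SolrJ code: `IndexClient`, `ClientHolder`, and the `http://localhost:8983/solr` URL are hypothetical stand-ins, and `IndexClient` merely counts how often it is constructed instead of talking to Solr. The point is the pattern: several fetcher-like threads share one lazily created client rather than building a new one for every document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedClientSketch {

    /** Hypothetical stand-in for an expensive client object (not a SolrJ class). */
    static class IndexClient {
        // Counts constructions so the sketch can show how many clients were built.
        static final AtomicInteger CREATED = new AtomicInteger();
        private final List<String> buffer = new ArrayList<>();

        IndexClient(String url) {
            CREATED.incrementAndGet();
        }

        // Synchronized because several threads add documents to the same client.
        synchronized void add(String doc) {
            buffer.add(doc);
        }

        synchronized int size() {
            return buffer.size();
        }
    }

    /** Hypothetical holder: creates the client once and reuses it afterwards. */
    static class ClientHolder {
        private static IndexClient client;

        static synchronized IndexClient get(String url) {
            if (client == null) {
                client = new IndexClient(url);
            }
            return client;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final String url = "http://localhost:8983/solr"; // assumed URL, illustration only
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    // Every document goes through the same shared client.
                    ClientHolder.get(url).add("doc-" + id + "-" + i);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        System.out.println("clients created: " + IndexClient.CREATED.get());
        System.out.println("documents buffered: " + ClientHolder.get(url).size());
    }
}
```

Run as is, the sketch should report a single client for all 400 documents. In a real job the shared object would be whatever client the indexing code already manages, which is why reusing the existing SolrWriter code, as suggested above, is likely the simpler route.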

