Hi,
I have a list of around 1000 seed URLs, which I crawl to depth 2 or 3.
This is done on a local machine (with no other large resource-consuming
processes running) with the following configuration:
Dual Core (2.4 GHz),
4 GB RAM
It takes around 14-15 hours to crawl this seedlist, which generates
See http://wiki.apache.org/nutch/OptimizingCrawls for a checklist
On 21 February 2012 10:47, Bharat Goyal bharat.go...@shiksha.com wrote:
The number of fetcher threads is at the default value (10). What is the
optimum number of threads? Also, fetching and parsing are not separate.
so I am getting this error while running solrindex :
org.apache.solr.common.SolrException:
ERROR_httpwww2moderncomsitegiftregistryhtml_multiple_values_encountered_for_non_multiValued_field_title_2Modern_Gift_Registry_giftregistryhtml
Hi Markus,
I was searching about this issue and I saw that you had the same problem
before. How did you solve yours?
This is the error that I am getting when I run solrindex:
org.apache.solr.common.SolrException:
Any suggestions?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Please-help-Nutch-fetch-command-not-fetching-data-tp3764751p3764865.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Update DB was done, after inject, generate, fetch and parse.
Tried iterating after doing the update.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Please-help-Nutch-fetch-command-not-fetching-data-tp3764751p3764994.html
Sent from the Nutch - User mailing list archive at
I've downloaded the sources and compiled them myself.
Both protocol-http and protocol-httpclient (with basic auth) are working
like a charm now.
Thx for the help!
T
--
View this message in context:
http://lucene.472066.n3.nabble.com/Failed-fetching-tp3710422p3765295.html
Sent from the Nutch
Hi apachenutch,
Something of a wild guess here. Given that you are using the same seed file as
I am, I would have expected to see a single URL in the index at the end of the
first iteration, not 10. So the only time I have observed similar behavior was
when the fetcher truncated the file
Hi,
I need to move the SOLR based search platform to a distributed setup, and
therefore need to be able to write to multiple SOLR servers from Nutch (working
on the nutchgora branch, so this may be specific to this branch). Here is what
I think I need to do...
Currently, SolrIndexerJob writes
Hi Sujit,
Sounds good. A nice way of doing it would be to make it so that people can
define how to partition over the SOLR instances in the way they want e.g.
consistent hashing, URL range or crawldb metadata by taking a class name as
parameter. Does not need to be pluggable I think. I had
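The idea of choosing a partitioning strategy by class name could be sketched roughly as below. This is only an illustration under stated assumptions: `SolrPartitioning`, `Partitioner`, `HashPartitioner` and `load` are hypothetical names, not part of Nutch or SolrJ, and the server URLs are placeholders. The strategy shown is plain modulo hashing on the URL, not consistent hashing.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical names throughout; none of this is part of Nutch or SolrJ.
class SolrPartitioning {

    /** Maps a document key (for example its URL) to the index of a Solr server. */
    public interface Partitioner {
        int partition(String key, int numServers);
    }

    /** Simple modulo hashing: a given key always maps to the same server index. */
    public static final class HashPartitioner implements Partitioner {
        @Override
        public int partition(String key, int numServers) {
            if (numServers <= 0) {
                throw new IllegalArgumentException("numServers must be positive");
            }
            CRC32 crc = new CRC32();
            crc.update(key.getBytes(StandardCharsets.UTF_8));
            // CRC32.getValue() is a non-negative long, so the remainder is in [0, numServers).
            return (int) (crc.getValue() % numServers);
        }
    }

    /** Instantiates a Partitioner from a class name, e.g. one read from configuration. */
    static Partitioner load(String className) throws ReflectiveOperationException {
        return (Partitioner) Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder server URLs, assumed for illustration only.
        List<String> servers = List.of(
            "http://solr1.example.com:8983/solr",
            "http://solr2.example.com:8983/solr");
        Partitioner p = load("SolrPartitioning$HashPartitioner");
        for (String url : List.of("http://example.com/a", "http://example.com/b")) {
            System.out.println(url + " -> " + servers.get(p.partition(url, servers.size())));
        }
    }
}
```

With modulo hashing, changing the number of servers remaps most keys; a consistent-hashing or URL-range implementation of the same interface would avoid that at the cost of more bookkeeping.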
Change the line in schema.xml from:
<field name="title" type="text" stored="false" indexed="true"
termVectors="true" multiValued="false"/>
to:
<field name="title" type="text" stored="false" indexed="true"
termVectors="true" multiValued="true"/>
It is unusual to have multiple titles in a webpage. Can you provide
the URL