date:20130214

Re: Very long time just before fetching and just after parsing

2013-02-14 Thread kemical

Hi, I didn't manage to run the invertlinks and solrindex commands for only some segments, since it seems those commands work only on the segments parent dir. So I've made a small change to my fetch/parse/update/index loop. *In short:* I generate new segments in an empty "current_segments" dir. When th

Re: Nutch Incremental Crawl

2013-02-14 Thread kemical

Hi David, You can also consider setting a shorter fetch interval with nutch inject. This way you'll set a higher score (so the URL is always taken in priority when you generate a segment) and a fetch.interval of 1 day. If you have a case similar to mine, you'll often want some homepage fetch each

RE: Nutch Incremental Crawl

2013-02-14 Thread Markus Jelsma

If you want records to be fetched at a fixed interval, it's easier to inject them with a fixed fetch interval. nutch.fixedFetchInterval=86400 -Original message- > From:kemical > Sent: Thu 14-Feb-2013 10:15 > To: user@nutch.apache.org > Subject: Re: Nutch Incremental Crawl > > Hi Davi

Nutch 2.1 over Hadoop 1.0.3 and HBase 0.94.2

2013-02-14 Thread Amit Sela

Hi everyone, I'm new to Nutch and I would appreciate some advice... I want to use Nutch to Crawl over urls and categorize them. I already have a running Hadoop cluster with Hadoop 1.0.3 and HBase 0.94.2, and I saw that Nutch 2.1 with Gora supports HBase as backend. I would like to start by runn

Re: Nutch 2.1 over Hadoop 1.0.3 and HBase 0.94.2

2013-02-14 Thread Lewis John Mcgibbney

Hi Amit, On Thu, Feb 14, 2013 at 6:24 AM, Amit Sela wrote: > > I already have a running Hadoop cluster with Hadoop 1.0.3 and HBase 0.94.2, > and I saw that Nutch 2.1 with Gora supports HBase as backend. > First things first. We cannot guarantee that Gora and subsequently Nutch will work with t

fields in solrindex-mapping.xml

2013-02-14 Thread alxsss

Hello, I see that there are fields in addition to title, host and content in nutch-2.x's solr-mapping.xml. I thought tstamp might be needed for sorting documents. What about the other fields: segment, boost and digest? Can

Re: fields in solrindex-mapping.xml

2013-02-14 Thread Lewis John Mcgibbney

Hi Alex, Tstamp represents fetch time, used for deduplication. Boost is for scoring-opic and link. This is required in 2.x as well. I don't have the code right now, but you can try removing digest and segment. To me they both look legacy. There is a wiki page on index structure which you can consul

Nutch 2.1 different batch id (null)

2013-02-14 Thread Dragan Menoski

Hi, I am trying to set up Nutch 2.1 and Solr 4.0 with a MySQL database, following the instructions at this link: http://nlp.solutions.asia/?p=180. I made same changes in conf/nutch-site.xml (set threads to 50). When I start the crawl (path: ~/Desktop/apache-nutch-2.1/runtime/local, command: bin/nutch crawl u

8 matches
