Have you changed from the default Gora MemStore storage to something else?
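
If it helps, here is a rough way to check which store is configured. This is only a sketch: it assumes the usual conf/ directory under NUTCH_HOME and that the property is named storage.data.store.class, as in the Nutch 2.x nutch-default.xml. The helper name show_store_class is made up for illustration.

```shell
# Print the Gora storage class configured in a Nutch 2.x conf directory.
# Assumption: the property is called storage.data.store.class and its
# <value> line directly follows the <name> line, as in nutch-default.xml.
show_store_class() {
  conf_dir="${1:-conf}"
  for f in "$conf_dir/nutch-site.xml" "$conf_dir/nutch-default.xml"; do
    # Skip files that do not exist in this conf directory
    [ -f "$f" ] || continue
    # -H prefixes each line with the file name; -A 1 also prints the next line
    grep -H -A 1 'storage.data.store.class' "$f"
  done
}
```

Run it from NUTCH_HOME as `show_store_class conf`. A value in nutch-site.xml overrides the one in nutch-default.xml, so if only the second file matches, the default store is still in use.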

On Tuesday, June 25, 2013, Sznajder ForMailingList <[email protected]>
wrote:
> Thanks, Tejas.
>
> Yes, I checked the logs and no errors appear in them.
>
> I left http.content.limit and parser.html.impl at their default
> values.
>
> Benjamin
>
>
> On Tue, Jun 25, 2013 at 6:14 PM, Tejas Patil <[email protected]>
> wrote:
>
>> Did you check the logs (NUTCH_HOME/logs/hadoop.log) for any exception or
>> error messages?
>> Also, you might have a look at these configs in nutch-site.xml (default
>> values are in nutch-default.xml):
>> http.content.limit and parser.html.impl
>>
>>
>> On Tue, Jun 25, 2013 at 7:04 AM, Sznajder ForMailingList <
>> [email protected]> wrote:
>>
>> > Hello
>> >
>> > I installed Nutch 2.2 on my Linux machine.
>> >
>> > I defined the seed directory with one file containing:
>> > http://en.wikipedia.org/
>> > http://edition.cnn.com/
>> >
>> >
>> > I ran the following:
>> > sh bin/nutch inject ~/DataExplorerCrawl_gpfs/seed/
>> >
>> > After this step, the call
>> > -bash-4.1$ sh bin/nutch readdb -stats
>> >
>> > returns
>> > TOTAL urls:     2
>> > status 0 (null):        2
>> > avg score:      1.0
>> >
>> >
>> > Then, I ran the following:
>> > bin/nutch generate -topN 10
>> > bin/nutch fetch -all
>> > bin/nutch parse -all
>> > bin/nutch updatedb
>> > bin/nutch generate -topN 1000
>> > bin/nutch fetch -all
>> > bin/nutch parse -all
>> > bin/nutch updatedb
>> >
>> >
>> > However, the stats call after these steps still returns only:
>> > -bash-4.1$ sh bin/nutch readdb -stats
>> > status 5 (status_redir_perm):   1
>> > max score:      2.0
>> > TOTAL urls:     3
>> > avg score:      1.3333334
>> >
>> >
>> >
>> > Only 3 URLs?!
>> > What am I missing?
>> >
>> > Thanks
>> >
>> > Benjamin
>> >
>>
>

-- 
*Lewis*
