make the
problem.
Anyway, as you mentioned, removing the jr is not enough; I still have the same
problem.
Thanks again,
Rafit
From: Mike Smith <[EMAIL PROTECTED]>
Reply-To: nutch-user@lucene.apache.org
To: nutch-user@lucene.apache.org
Subject: Re: Problems with MapRed-
Date: Wed, 1 Feb 20
Hi Andrzej
I repeated the crawl with the JS parser plugged in and the problem happened
again, but removing the JS parser makes everything go smoothly. I am using a
single machine and everything is running locally, but using NDFS. Have you
tried that URL to see if you can crawl it to depth 2? In the tasktracker l
Mike Smith wrote:
I finally found out why this problem happens; there must be a problem with
the JS parser, because I used this:
plugin.includes
protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)
instead of the default one, which has the JS parser in it, and I could fetch
h
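For reference, the plugin list quoted above is normally set by overriding the plugin.includes property. A minimal sketch, assuming the standard Nutch layout (the property goes inside the configuration element of conf/nutch-site.xml; the value is copied from the message above):

```xml
<!-- conf/nutch-site.xml: override the default plugin list,
     leaving out the parse-js plugin that triggered the problem -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
```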
I have a group of 5000 URLs that
> > > show this problem when I
> > > run them. I am continuing with this until
> > > I find
> > > the smallest group that has this problem, and I will let you know about the
> > > seed.
> > >
> > > Thanks,
> > > Rafit
> >From: Ken Krugler <[EMAIL PROTECTED]>
> >Reply-To: nutch-user@lucene.apache.org
> >To: nutch-user@lucene.apache.org
> >Subject: Re: Problems with Ma
ED]>
Reply-To: nutch-user@lucene.apache.org
To: nutch-user@lucene.apache.org
Subject: Re: Problems with MapRed-
Date: Sun, 29 Jan 2006 16:42:15 -0800
This looks like the namenode has lost connection to one of the datanodes.
The default number of replications in ndfs is 3 and it seems like the
nam
This looks like the namenode has lost connection to one of the
datanodes. The default number of replications in ndfs is 3 and it
seems like the namenode has only 2 in its list so it logs this
warning. As Stefan suggested, you should check the diskspace on your
machines. If I recall correctly da
Hi Mike,
I forgot to mention the namenode log file gives me thousands of these:
060129 13 Zero targets found, forbidden1.size=2allowSameHostTargets=false
forbidden2.size()=0
060129 13 Zero targets found, forbidden1.size=2allowSameHostTargets=false
forbidden2.size()=0
From our experien
Maybe the HDDs are full?
Try:
bin/nutch ndfs -report
Nutch generates some temporary data during processing.
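Stefan's suggestion can be checked quickly from a shell on each node. A minimal sketch, assuming a POSIX system; the 90% threshold is just an illustrative cutoff, and the ndfs -report command is the one from the message above:

```shell
# Warn about any local filesystem that is over 90% full
# (NDFS needs headroom for the temporary data Nutch writes while processing).
df -P | awk 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 > 90) print $6 " is " use "% full" }'

# Then ask NDFS for its own view of the cluster, from the Nutch install dir:
# bin/nutch ndfs -report
```

Worth running on every datanode, since a single full disk can make the namenode drop that node from its replication target list.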
On 30.01.2006 at 00:54, Mike Smith wrote:
I forgot to mention the namenode log file gives me thousands of these:
060129 13 Zero targets found,
forbidden1.size=2allowSameHostTargets
On 30.01.2006 at 00:50, Mike Smith wrote:
I do have the same problem, and it is killing me. I have
tried all
sorts of configurations and tricks.
I have 3 machines, all three are datanodes and 1 is jobtracker. It
3 tasktrackers, 1 jobtracker, 3 datanodes and 1 namenode, right?
successfu
I forgot to mention the namenode log file gives me thousands of these:
060129 13 Zero targets found, forbidden1.size=2allowSameHostTargets=false
forbidden2.size()=0
060129 13 Zero targets found, forbidden1.size=2allowSameHostTargets=false
forbidden2.size()=0
Thanks, Mike
On 1/29/06, Mik
I do have the same problem, and it is killing me. I have tried all
sorts of configurations and tricks.
I have 3 machines; all three are datanodes and 1 is the jobtracker. It
successfully fetches 300,000 pages, but when I try to fetch more than that
by injecting more pages at the first cy
Sounds like your tasktracker wasn't able to connect to your
jobtracker anymore.
Are you sure the jobtracker is still running, and that the tasktracker can
still reach the jobtracker box under the same hostname?
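Stefan's hostname question can be tested directly from the tasktracker box. A hedged sketch: JT_HOST is a placeholder for whatever host your mapred.job.tracker setting points at, and getent is assumed to be available on the system:

```shell
# Placeholder: substitute the hostname from your mapred.job.tracker setting.
JT_HOST=${JT_HOST:-localhost}

# Does the jobtracker hostname still resolve from this machine?
if getent hosts "$JT_HOST" > /dev/null; then
  echo "$JT_HOST resolves"
else
  echo "$JT_HOST does NOT resolve"
fi
```

If the name resolves but connections still fail, the next thing to check is whether the jobtracker process itself is still running on that box.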
On 28.01.2006 at 21:21, Rafit Izhak_Ratzin wrote:
Hi,
I ran the mapreduce starting with 10 U
Hi,
I ran the mapreduce crawl starting with 10 URLs into the sixth cycle, where it
fetched 400K pages, and everything was fine.
060127 001055 TOTAL urls: 1877326
060127 001055 avg score:1.099
060127 001055 max score:1666.305
060127 001055 min score:1.0
060127 001055 retry