On 03.07.2013 23:12, Sebastian Nagel wrote:
> with Nutch 1.7 and Solr 3.6.2 and same for [1], [2]
> the digest field appears and is filled well.
Ok, there is no further configuration necessary? Nutch will generate
this out of the box?
Just tested it with a freshly compiled 1.7 - no digest once
Hi Christian,
with Nutch 1.7 and Solr 3.6.2 and same for [1], [2]
the digest field appears and is filled well.
Sebastian
On 07/03/2013 09:14 AM, Christian Nölle wrote:
> On 02.07.2013 22:29, Sebastian Nagel wrote:
>
>>> no field "digest" showing up in the indexchecker
>> That's correct to som
How many different hosts do you crawl? I see one reducer and only one queue, and
Nutch queues by domain or host. URLs from the same host will always end up in the
same queue, so Nutch will only crawl a lot and very fast if there is a large
number of queues to process.
The only thing you can do then is increase the
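The per-host/per-domain queueing described above is controlled by a few properties that can be overridden in nutch-site.xml. A hedged sketch, assuming the 1.x line; the property names come from nutch-default.xml, and the values are only illustrative, not recommendations:

```xml
<!-- Illustrative nutch-site.xml fragment; values are examples only -->
<property>
  <name>partition.url.mode</name>
  <value>byHost</value>
  <description>How the generate step partitions URLs (byHost, byDomain, byIP).</description>
</property>
<property>
  <name>fetcher.queue.mode</name>
  <value>byHost</value>
  <description>How the fetcher builds its queues (byHost, byDomain, byIP).</description>
</property>
<property>
  <name>generate.max.count</name>
  <value>5000</value>
  <description>Upper bound of URLs per host/domain counted into one segment.</description>
</property>
```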
Hi,
I reran this job again. I had 5 urls in my seed, and first pass of fetch,
fetched about 230 pages in 20 minutes.
Then I ran a second pass of fetch, and it has been running over 3.5 hours.
Again, it is still the 1 reducer doing all the work, and its jobtracker has
nothing in its log yet.
Have you looked at http://wiki.apache.org/nutch/RunNutchInEclipse ?
This has recently been updated and has worked for several people on the
user group. It has some cool screenshots which should make your life
easier when setting up Nutch with Eclipse.
On Wed, Jul 3, 2013 at 12:39 AM, Ramakrishna wrote:
> G
The steps you performed are right.
Did you get the log for that one "hardworking" reducer? It will hint at
why the job took so long. Ideally you should get logs for every job and its
attempts. If you cannot get the log for that reducer, then I feel that your
cluster is having some problem and thi
The correct order is:
inject
loop
generate
fetch
parse
updatedb
end loop
solr
The nutch tutorial [0] and the crawl script are using the same.
[0] : http://wiki.apache.org/nutch/NutchTutorial
[1] : http://svn.apache.org/viewvc/nutch/trunk/src/bin/crawl?view=markup
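The cycle above can be sketched as a small shell script. This is a dry-run sketch, not the official crawl script: the `run` helper only echoes the 1.x-style bin/nutch commands in the recommended order, and the paths, segment name, and -topN value are placeholders.

```shell
#!/bin/sh
# Dry-run sketch of the inject -> (generate, fetch, parse, updatedb) loop -> solr cycle.
# Swap `run` for a direct bin/nutch invocation on a real installation.
run() { echo "bin/nutch $*"; }

run inject crawl/crawldb urls                         # inject seed URLs once
for round in 1 2; do                                  # loop
  run generate crawl/crawldb crawl/segments -topN 1000
  run fetch crawl/segments/SEGMENT                    # fetch the newest segment
  run parse crawl/segments/SEGMENT                    # parse BEFORE updatedb
  run updatedb crawl/crawldb crawl/segments/SEGMENT   # fold results into the crawldb
done                                                  # end loop
run solrindex http://localhost:8983/solr crawl/crawldb \
    -linkdb crawl/linkdb crawl/segments/SEGMENT       # finally index into Solr
```

Note that parse runs before updatedb inside the loop; running updatedb on an unparsed segment is exactly the ordering mistake discussed in this thread.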
On Wed, Jul 3, 2013 at
Hi Tejas, looks like we were typing at the same time
So anyway, my job ended fine, just to be sure what I am doing is right, I
have cleared the db and started another round again. If I stumble again,
will respond back on this thread.
On Wed, Jul 3, 2013 at 8:43 AM, Tejas Patil wrote:
> > The seco
On most documents and email list, I have seen that the order of crawl for
nutch-solr is
inject
loop
generate
fetch
updatedb
parse
end loop
solr
When I follow this path I always see Solr has 0 docs; even if I run the solr
step inside the loop, I still get 0 docs in Solr.
However, if I switch the o
Guys, I'm extremely sorry for posting/asking the same doubt again. Even after
reading many documents I didn't get how to integrate Nutch and Eclipse.
I have apache-nutch-2.2 and eclipse-juno. Please tell me step
by step how to integrate Eclipse and Nutch, with documentation. If possible
pl
Spoke too soon, the fetch completed in 21 min.
On Wed, Jul 3, 2013 at 8:32 AM, h b wrote:
> oh and yes, generate.max.count is set to 5000
>
>
> On Wed, Jul 3, 2013 at 8:29 AM, h b wrote:
>
>> I dropped my webpage database, restarted with 5 seed urls. First fetch
>> completed in a few seconds.
> The second run, still shows 1 reduce running, although it shows as 100%
complete, so my thought is it is writing out to the disk, though it has
been about 30+ minutes.
> This one reducer's log on the jobtracker, however, is empty.
This is weird. There can be an explanation for the first line: The data
oh and yes, generate.max.count is set to 5000
On Wed, Jul 3, 2013 at 8:29 AM, h b wrote:
> I dropped my webpage database, restarted with 5 seed urls. First fetch
> completed in a few seconds. The second run, still shows 1 reduce running,
> although it shows as 100% complete, so my thought is it
I dropped my webpage database, restarted with 5 seed urls. First fetch
completed in a few seconds. The second run, still shows 1 reduce running,
although it shows as 100% complete, so my thought is it is writing out to
the disk, though it has been about 30+ minutes.
Again, I had 80 reducers, when I
Please look for mapred-site.xml in the Hadoop conf directory. You can specify
mapred.reduce.tasks and set an int for this value.
You will need to restart the jobtracker for this to kick in, I would imagine.
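For reference, the mapred-site.xml change suggested above might look like this; the value 6 is just an example matching a 6-node cluster, and the property name mapred.reduce.tasks applies to the old Hadoop 1.x naming used here:

```xml
<!-- mapred-site.xml: illustrative value, pick one suited to your cluster -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>6</value>
</property>
```

Alternatively, for tools that go through Hadoop's GenericOptionsParser, the value can be passed per job on the command line, e.g. `-Dmapred.reduce.tasks=6`, without restarting anything.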
On Wednesday, July 3, 2013, Sznajder ForMailingList <
bs4mailingl...@gmail.com> wrote:
> Hi
>
> Wh
Hi
When running Nutch in distributed mode, on my Hadoop jobtracker (
http://host:50300/jobtracker.jsp )
I see only 2 mappers running.
I wanted to set the number of mappers to the number of nodes I have on the
cluster (6).
Looking in the wiki, I found documents speaking about hadoop-site
Great news, thanks Lewis!
-Original message-
From: Lewis John Mcgibbney
Sent: Tuesday 2nd July 2013 18:32
To: user@nutch.apache.org; d...@nutch.apache.org
Subject: [ANNOUNCE] Apache Nutch v2.2.1 Released
Good Afternoon Everyone,
The Apache Nutch PMC are very pleased to announce the immed
On 02.07.2013 22:29, Sebastian Nagel wrote:
> Which Solr version is used?
Sorry, forgot about that bit:
3.6.1
--
-c
On 02.07.2013 22:29, Sebastian Nagel wrote:
> no field "digest" showing up in the indexchecker
That's correct to some extent. The class of indexchecker is called
IndexingFiltersChecker and it shows the fields added by the
configured IndexingFilters. The field digest is added as a field by
the c