Hi Avilash,
It is extremely difficult to comment here.
We need information on what's actually happening; your description is a bit
of a black box. Can you please look in hadoop.log and the Solr logs as well?
This will give you an indication of how many documents are/were written
down to Solr.
thank you
Hi,
I followed the tutorial on the Nutch Website.
I am using Nutch 1.6 with Solr 3.6.
Everything went well till the end, but when I passed a query,
it gave me no results.
I need some help.
Thanks and regards
Avilash
Hi,
On Tue, Jul 2, 2013 at 3:53 PM, h b wrote:
> So, I tried this with the generate.max.count property set to 5000, rebuild
> ant; ant jar; ant job and reran fetch.
> It still appears the same, first 79 reducers zip through and the last one
> is crawling, literally...
>
Sorry, I should have been clearer:
So, I tried this with the generate.max.count property set to 5000, rebuild
ant; ant jar; ant job and reran fetch.
It still appears the same, first 79 reducers zip through and the last one
is crawling, literally...
As for the logs, I mentioned on one of my earlier threads that when I run
from the d
Hi,
Please try
*http://s.apache.org/mo*
Specifically the generate.max.count property.
Many, many URLs are unfetched here... look into the logs and see what is
going on. This is really quite bad, and there is most likely one or a small
number of reasons which ultimately determine why so many URLs are unf
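For reference, a minimal nutch-site.xml override for the property named above might look like this (the value 5000 is the one tried earlier in the thread; treat it as an example, not a recommendation):

```xml
<!-- nutch-site.xml sketch: cap the number of URLs per host/domain in one fetchlist.
     The default of -1 means unlimited; whether the cap applies per host, domain,
     or IP is controlled by the related generate.count.mode property. -->
<property>
  <name>generate.max.count</name>
  <value>5000</value>
  <description>Maximum number of URLs per host/domain in a single fetchlist.</description>
</property>
```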
Hi,
I seeded 4 URLs, all in the same domain.
I am running fetch with 20 threads and 80 numTasks. The reducer is stuck on
the last reduce.
I ran a dump of the readdb to see the status, and I see 122K of the total
133K URLs are 'status_unfetched'. This is after 12 hours. The delay between
fetches is
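A quick way to tally statuses from a readdb text dump is a short script like the one below. This is a hedged sketch: it assumes the dump contains lines of the form "Status: 1 (db_unfetched)", which may vary across Nutch versions, and the sample data is made up for illustration.

```python
import re
from collections import Counter


def tally_statuses(dump_text):
    """Count CrawlDb status labels in a readdb text dump.

    Assumes each record carries a line like 'Status: 1 (db_unfetched)';
    the exact dump format can differ between Nutch versions.
    """
    return Counter(
        m.group(1)
        for m in re.finditer(r"Status:\s*\d+\s*\((\w+)\)", dump_text)
    )


# Made-up sample resembling a readdb -dump text output:
sample = (
    "http://example.com/a\nStatus: 1 (db_unfetched)\n"
    "http://example.com/b\nStatus: 2 (db_fetched)\n"
    "http://example.com/c\nStatus: 1 (db_unfetched)\n"
)
print(tally_statuses(sample))  # → Counter({'db_unfetched': 2, 'db_fetched': 1})
```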
I've got a version of the indexchecker that does that, as well as providing a
telnet server. I was just thinking of opening an issue about that this afternoon!
-Original message-
> From:Sebastian Nagel
> Sent: Tuesday 2nd July 2013 22:29
> To: user@nutch.apache.org
> Subject: Re: no dige
Hi Christian,
> no field "digest" showing up in the indexchecker
That's correct to some extent. The indexchecker class
is called IndexingFiltersChecker, and it shows the fields
added by the configured IndexingFilters. The field digest
is added as a field by the class IndexerMapReduce. The digest
Great stuff! Thanks Lewis
On 2 July 2013 17:32, Lewis John Mcgibbney wrote:
> Good Afternoon Everyone,
>
> The Apache Nutch PMC are very pleased to announce the immediate release of
> Apache Nutch v2.2.1, we advise all current users and developers of the 2.X
> series to upgrade to this release A
Neither. You leave it in $NUTCH/conf and compile a job file with 'ant job',
which gets used from runtime/deploy/bin.
BTW, new users should at least do the basic Hadoop tutorial.
On 2 July 2013 16:23, Lewis John Mcgibbney wrote:
> I'm assuming that if you're running on an established hadoop cluster y
You can decrease fetcher.server.delay. Another way is to split the storage table
and run many instances of Nutch. However, if you do not own the server where
the crawled domain is hosted, you could be blocked, since frequent requests might
be interpreted as a DoS attack.
hth.
Alex.
-Origina
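A minimal nutch-site.xml fragment for the delay mentioned above might look like this (the value 1.0 is an example; as noted above, pushing it too low risks getting blocked by the crawled site):

```xml
<!-- nutch-site.xml sketch: lower the per-server politeness delay.
     The shipped default is 5.0 seconds; choose a value the target site tolerates. -->
<property>
  <name>fetcher.server.delay</name>
  <value>1.0</value>
  <description>Seconds the fetcher waits between successive requests
  to the same server.</description>
</property>
```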
Good Afternoon Everyone,
The Apache Nutch PMC are very pleased to announce the immediate release of
Apache Nutch v2.2.1, we advise all current users and developers of the 2.X
series to upgrade to this release ASAP.
Apache Nutch is an open source web-search software project. Stemming
from Apache
Sorry team, this should have been a [RESULT] thread.
Thanks
Lewis
On Tue, Jul 2, 2013 at 9:08 AM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> In the famous words of Truman good morning, good afternoon, good evening
> and good night... to all Nutch'ers!!!
>
> I would like to bring
In the famous words of Truman good morning, good afternoon, good evening
and good night... to all Nutch'ers!!!
I would like to bring this thread to a close and formally end the VOTE'ing.
The VOTE tally is as follows:
[ ] +1, let's get it released!!!
Markus Jelsma
Tejas Patil
Chris A Mattmann
Feng Lu
R
Am 02.07.2013 17:19, schrieb Lewis John Mcgibbney:
Which version of Nutch are you using please?
Sorry, totally forgot to mention. Tested with 1.5.1 and 1.7
-- -c
I'm assuming that if you're running on an established Hadoop cluster you will
wish to keep it over there.
Lewis
On Tuesday, July 2, 2013, Sznajder ForMailingList
wrote:
> Thanks
> Can I copy this file to my $NUTCH/conf directory, or must I keep it in the
> $HADOOP/conf directory?
>
> Benjamin
>
>
>
Which version of Nutch are you using please?
On Tuesday, July 2, 2013, Christian Nölle wrote:
> Hi everybody,
>
> I got a problem concerning solrdedup. We have a field digest in Solr, the
solrindex-mapping for digest is fine as well, but there is no field
"digest" showing up in the indexchecker and thus
Thanks
Can I copy this file to my $NUTCH/conf directory, or must I keep it in the
$HADOOP/conf directory?
Benjamin
On Tue, Jul 2, 2013 at 5:10 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> in mapred-site.xml
> It is your Mapreduce configuration override.
> hth
>
> On Tuesday, J
in mapred-site.xml
It is your Mapreduce configuration override.
hth
On Tuesday, July 2, 2013, Sznajder ForMailingList
wrote:
> Thanks a lot Markus!
>
> Where do we define this parameter, please?
>
> Benjamin
>
>
> On Tue, Jul 2, 2013 at 4:28 PM, Markus Jelsma wrote:
>
>> Hi,
>>
>> Increase your m
Thanks a lot Markus!
Where do we define this parameter, please?
Benjamin
On Tue, Jul 2, 2013 at 4:28 PM, Markus Jelsma wrote:
> Hi,
>
> Increase your memory in the task trackers by setting your Xmx in
> mapred.map.child.java.opts.
>
> Cheers
>
>
>
> -Original message-
> > From:Sznajder
Hi,
Increase your memory in the task trackers by setting your Xmx in
mapred.map.child.java.opts.
Cheers
-Original message-
> From:Sznajder ForMailingList
> Sent: Tuesday 2nd July 2013 15:25
> To: user@nutch.apache.org
> Subject: Distributed mode and java/lang/OutOfMemoryError
>
>
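For reference, a mapred-site.xml fragment matching Markus's suggestion might look like this (the 2048m value is an example, not a recommendation; size the heap to your task trackers' RAM):

```xml
<!-- mapred-site.xml sketch: raise the map task JVM heap via the property
     named in the thread. Hadoop also honours the older mapred.child.java.opts
     for both map and reduce tasks. -->
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
```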
Hi,
I am running Nutch 1.7 on a cluster of 6 nodes.
I attempted to launch the bin/crawl script in this configuration, and I am
getting a very strange error (an error I did not get in local mode):
13/07/02 16:04:23 INFO fetcher.Fetcher: Fetcher Timelimit set for :
1372781063368
13/07/02 16:04:2
Hi,
Nutch can easily scale to many billions of records; it just depends on how
many nodes you have and how powerful they are. Crawl speed is not very relevant,
as it is always very fast; the problem usually is updating the databases. If you
spread your data over more machines you will increase your
Hi everybody,
I got a problem concerning solrdedup. We have a field digest in Solr, the
solrindex-mapping for digest is fine as well, but there is no field
"digest" showing up in the indexchecker, and thus not in our Solr when
performing a real crawl.
Is there anything missing? Are we missing a cru
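For comparison, a solrindex-mapping.xml fragment with the digest mapping might look like the sketch below. Note that the overall file layout here is from memory and may differ slightly across Nutch versions; check the copy shipped in your conf directory.

```xml
<!-- solrindex-mapping.xml sketch: maps the Nutch "digest" field to the Solr
     field of the same name. Keep in mind that digest is added at index time
     by IndexerMapReduce, not by an IndexingFilter, so the indexchecker will
     not display it even when this mapping is correct. -->
<mapping>
  <fields>
    <field dest="digest" source="digest"/>
    <!-- other field mappings ... -->
  </fields>
  <uniqueKey>id</uniqueKey>
</mapping>
```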