Hey,
Can anyone tell me what could be the reason for the following, which happened while
fetching data using bin/nutch fetch:
My AVG antivirus is detecting virus threats while Nutch fetches pages from
the URLs in my *crawldb*. I injected DMOZ Open Directory URLs into the crawldb.
The antivirus has already detected
All-
Any idea on how to configure Nutch to generate/fetch on multiple machines
simultaneously?
-Gaurang
, etc?
Thanks Regards,
Gaurang Patel
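For what it's worth, Nutch's generate and fetch steps run as Hadoop MapReduce jobs, so deploying Nutch on a Hadoop cluster distributes them across machines automatically. A rough sketch, assuming the crawl data lives in HDFS (the paths and segment name below are hypothetical):

```shell
# Run from the Nutch deploy directory on a Hadoop cluster; the jobs
# fan out across the cluster's task nodes.

# Generate a fetch list split into 4 fetcher partitions
# (-numFetchers controls how many map tasks will fetch in parallel).
bin/nutch generate crawl/crawldb crawl/segments -topN 100000 -numFetchers 4

# Fetch the generated segment; -threads sets threads per fetcher task.
bin/nutch fetch crawl/segments/20091005123456 -threads 10
```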
All-
At any point in time, is there a way to know how many URLs are in my *crawldb*?
Regards,
Gaurang
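One way to count them: the readdb tool prints crawldb statistics, including the total number of URLs. A sketch, assuming the crawldb sits at crawl/crawldb:

```shell
# Dump crawldb statistics; the "TOTAL urls" line is the number of
# URLs currently known to the crawldb, broken down by fetch status.
bin/nutch readdb crawl/crawldb -stats
```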
Hey Andrzej,
Can you tell me where to set this property (generate.update.db)? I am trying
to run a crawl scenario similar to the one Eric is running.
-Gaurang
2009/10/5 Andrzej Bialecki a...@getopt.org
Eric wrote:
Andrzej,
Just to make sure I have this straight, set the generate.update.db
Hey Jack,
*One concern:*
I am not sure where I can get 0.1 billion page URLs. I am using the DMOZ Open
Directory (which has around 3M URLs) to inject the crawldb.
Please help.
Regards,
Gaurang
2009/10/4 Jack Yu jackyu...@gmail.com
0.1 billion pages for 1.5TB
On 10/5/09, Gaurang Patel
Hey,
Never mind. I found *generate.update.db* in *nutch-default.xml* and set it to
true.
Regards,
Gaurang
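As an aside, the usual practice is to leave nutch-default.xml untouched and override properties in conf/nutch-site.xml, which takes precedence. A minimal sketch (the description text here is paraphrased, not quoted from the shipped defaults):

```xml
<!-- conf/nutch-site.xml: overrides values from nutch-default.xml -->
<configuration>
  <property>
    <name>generate.update.db</name>
    <value>true</value>
    <description>Update the crawldb after generating a fetch list, so
    consecutive generate runs do not select the same URLs again.</description>
  </property>
</configuration>
```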
2009/10/5 Gaurang Patel gaurangtpa...@gmail.com
Hey Andrzej,
Can you tell me where to set this property (generate.update.db)? I am
trying to run similar kind of crawl scenario that Eric
All-
I am a novice at using Nutch. Can anyone tell me the estimated size (in TBs, I
suppose) required to store the crawled results for the whole web? I want an
estimate of the storage requirements for my project, which uses the Nutch web
crawler.
Regards,
Gaurang Patel
Thanks Jack.
This will help.
-Gaurang
2009/10/4 Jack Yu jackyu...@gmail.com
0.1 billion pages for 1.5TB
On 10/5/09, Gaurang Patel gaurangtpa...@gmail.com wrote:
All-
I am novice to using Nutch. Can anyone tell me the estimated size in (I
suppose, in TBs) that will be required
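Jack's rule of thumb works out to roughly 15 KB of stored crawl data per page. A back-of-envelope sketch (the whole-web page count below is purely an illustrative assumption, not a measured figure):

```python
# Back-of-envelope storage estimate from Jack's figure:
# 0.1 billion pages ~ 1.5 TB of crawl data.
pages = 0.1e9            # pages in Jack's estimate
storage_bytes = 1.5e12   # 1.5 TB

bytes_per_page = storage_bytes / pages
print(f"{bytes_per_page / 1e3:.0f} KB per page")

# Scaling to a hypothetical 10-billion-page crawl (assumption only):
whole_web_pages = 10e9
estimate_tb = whole_web_pages * bytes_per_page / 1e12
print(f"~{estimate_tb:.0f} TB for {whole_web_pages:.0e} pages")
```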
Hi All,
Can anyone help me with this problem? Here is my problem:
I want to get the source code of the hits I get using the Nutch crawler. I am
not sure whether Nutch stores the content of a web page (i.e. the actual source
code of the page) in the crawled results. I am afraid it might not!
If
dynamically. In other words, Nutch does not store the source code in the
crawled results.
Let me know if I am wrong.
-Gaurang
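For reference, whether the raw fetched content is kept is governed by the fetcher.store.content property; when it is enabled, the stored content can be dumped with the readseg tool. A sketch, with a hypothetical segment path:

```shell
# Dump everything stored for a segment, including the raw Content
# (page source) if fetcher.store.content was enabled at fetch time.
bin/nutch readseg -dump crawl/segments/20091005123456 dump_out

# The plain-text dump lands in dump_out/dump and can be inspected directly.
less dump_out/dump
```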
2009/5/11 Susam Pal susam@gmail.com
On Tue, May 12, 2009 at 8:50 AM, Gaurang Patel gaurangtpa...@gmail.com
wrote:
Hi All,*
*Can anyone help me with this problem
for helping me out anyways.
-Gaurang
2009/5/11 Susam Pal susam@gmail.com
On Tue, May 12, 2009 at 10:56 AM, Gaurang Patel gaurangtpa...@gmail.com
wrote:
Thanks Susam,
This worked perfectly for me. Thanks for the reply.
*One more concern:*
Does this method fetch the contents (source code
*The full stack trace of the root cause is available in the Apache
Tomcat/6.0.18 logs.*
--
Apache Tomcat/6.0.18
Not sure what is happening. Can anyone help me with this?
Regards,
Gaurang Patel