Hi,
I am not able to solve this problem.
Any ideas?
Brian Tingle wrote:
Hey,
I'm playing around with Nutch on Hadoop; when I run
hadoop jar nutch-1.0.job org.apache.nutch.crawl.Crawl -threads ... is
that threads per node or total threads for all nodes?
Threads per map task - if you run multiple map tasks per node then you
will get that many fetcher threads for each of those tasks, so the
per-node total is threads times map tasks.
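For example, -threads 10 with two map tasks on a node means 20 fetcher
threads on that node. The -threads option just sets the
fetcher.threads.fetch property, so you can also pin it in
conf/nutch-site.xml (a minimal sketch; the value 10 is made up):

  <!-- number of fetcher threads per map task -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>10</value>
  </property>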
On Thu, Jul 23, 2009 at 10:58, Beats tarun_agrawal...@yahoo.com wrote:
Hi,
I am not able to solve this problem.
The crawl command uses crawl-urlfilter.txt, while the
inject/generate/etc. commands use other files (such as
regex-urlfilter.txt), so you should check your filters.
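For example, if crawl-urlfilter.txt allows your domain with the stock
pattern, make sure regex-urlfilter.txt carries the same allow rule
(example.com is a placeholder for your own domain):

  # accept hosts in example.com
  +^http://([a-z0-9]*\.)*example.com/

Otherwise inject/generate will silently filter out your seed URLs.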
Any ideas?
Why does your example say both monster.crawl and test.crawl?
Are you perhaps entering the command wrong or is this just an error in
the email?
Alex
2009/7/18 Beats tarun_agrawal...@yahoo.com:
Hi,
I am getting this weird error (at least for me):
I am trying to crawl some web pages.
Sorry for the error,
it is just a typing error.
Thanks for replying.
alexmc wrote:
Why does your example say both monster.crawl and test.crawl?
Are you perhaps entering the command wrong or is this just an error in
the email?
Alex
2009/7/18 Beats tarun_agrawal...@yahoo.com:
Hi,
I
Hi,
In ParseData the following fields are there:
Version, Status, Title, Outlinks...
I want to add a new field like location. I will get this during parsing
in the HTML parser. How can I set new fields so that they are visible
in ParseData?
You can add it to ParseData's metadata.
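Roughly, with the Nutch 1.0 API, you would write an HtmlParseFilter
plugin and set the value in the parse metadata. A minimal sketch (the
"location" key and extractLocation() are placeholders for your own
logic, and the usual plugin.xml wiring is omitted):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.nutch.parse.HTMLMetaTags;
  import org.apache.nutch.parse.HtmlParseFilter;
  import org.apache.nutch.parse.Parse;
  import org.apache.nutch.parse.ParseResult;
  import org.apache.nutch.protocol.Content;
  import org.w3c.dom.DocumentFragment;

  public class LocationParseFilter implements HtmlParseFilter {
    private Configuration conf;

    public ParseResult filter(Content content, ParseResult parseResult,
                              HTMLMetaTags metaTags, DocumentFragment doc) {
      Parse parse = parseResult.get(content.getUrl());
      String location = extractLocation(doc); // your extraction logic
      if (location != null) {
        // this ends up in ParseData's metadata
        parse.getData().getParseMeta().set("location", location);
      }
      return parseResult;
    }

    private String extractLocation(DocumentFragment doc) {
      return null; // placeholder for parsing the document
    }

    public void setConf(Configuration conf) { this.conf = conf; }
    public Configuration getConf() { return conf; }
  }

You can check the result with bin/nutch readseg -dump; the new key
shows up in the ParseData metadata.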
On Thu, Jul 23, 2009 at 13:38, Saurabh Suman saurabhsuman...@rediff.com wrote:
Hi,
In ParseData the following fields are there:
Version, Status, Title, Outlinks...
I want to add a new field like location. I will get this during parsing
in the HTML parser. How
Hi All,
I'm sure I've read somewhere before that URLs formed like
http://www.site.com/categories.asp?cid=25&page=9
can't be crawled. Is that true?
Warmest Regards,
Zaihan
Because?
You mean URLs which contain a query part?
They can be crawled.
The default Nutch configuration excludes them by this filter rule in
conf/crawl-urlfilter.txt:
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
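If you do want such dynamic URLs fetched, a common fix is simply to
comment that rule out (and do the same in regex-urlfilter.txt, which
the inject/generate commands use):

  # skip URLs containing certain characters as probable queries, etc.
  # -[?*!@=]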
Zaihan wrote:
Hi All,
I'm sure I've read
Thanks for the pointer to the LocalFetchRecover tool. It seems there
were some changes to the Hadoop API since Nutch 0.8.1, so this tool
didn't work initially. I've made what I think are the correct changes
and have attached them. (Hopefully the attachment gets through.) I
put together a
Hi guys,
I'm in the middle of a very long fetch phase, too long actually. I would
like to stop it but not lose 5 days of fetching.
Is there anything I can do?
--
-MilleBii-
Thanks, I eventually found where the job trackers were, in the :50030 web
page of the Cloudera thing, and I saw it said 10 threads for each
crawler in the little status update box where it was telling me how far
along each crawl was. I have to say, this whole thing (Nutch/Hadoop) is
pretty
On Thu, Jul 23, 2009 at 21:29, MilleBii mille...@gmail.com wrote:
Hi guys,
I'm in the middle of a very long fetch phase, too long actually. I would
like to stop it but not lose 5 days of fetching.
Is there anything I can do?
No, unfortunately Nutch 1.0 does not have that feature. But we
How do I set the number of map tasks when I run a command like
hadoop jar nutch-1.0.job org.apache.nutch.crawl.Crawl
?
I think I'm going to try out the change below. Is there any reason not
to do it, or is Crawl supposed to be more of a demo, and should I write
some script or my own
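For what it's worth, one knob that usually works (a sketch; the value
20 is made up, and mapred.map.tasks is only a hint that Hadoop combines
with the number of input splits) is to set the property in the
configuration the job picks up, e.g. conf/nutch-site.xml:

  <property>
    <name>mapred.map.tasks</name>
    <value>20</value>
  </property>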