Re: error in using generate command

2009-07-23 Thread Beats
Hi, I am not able to solve this problem. Any ideas? -- View this message in context: http://www.nabble.com/error-in-using-generate-command-tp24545715p24621067.html Sent from the Nutch - User mailing list archive at Nabble.com.

Re: nutch -threads in hadoop

2009-07-23 Thread Andrzej Bialecki
Brian Tingle wrote: Hey, I'm playing around with Nutch on Hadoop; when I run hadoop jar nutch-1.0.job org.apache.nutch.crawl.Crawl -threads ... is that threads per node or total threads for all nodes? Threads per map task - if you run multiple map tasks per node then you will get
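The per-task setting Andrzej refers to is the fetcher.threads.fetch property; a minimal nutch-site.xml override might look like this (the value 10 is just the stock default, and the arithmetic in the description is an illustration, not something the config file computes):

```xml
<!-- nutch-site.xml: overrides conf/nutch-default.xml -->
<property>
  <name>fetcher.threads.fetch</name>
  <value>10</value>
  <description>Fetch threads used by EACH fetch map task, not per
  node or per cluster. A node running N fetch map tasks ends up
  with roughly N * 10 fetch threads.</description>
</property>
```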

Re: error in using generate command

2009-07-23 Thread Doğacan Güney
On Thu, Jul 23, 2009 at 10:58, Beats tarun_agrawal...@yahoo.com wrote: Hi, I am not able to solve this problem. The Crawl command uses crawl-urlfilter.txt, while the inject/generate/etc. commands use other files (such as regex-urlfilter.txt). So you should check your filters. Any ideas? --
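To make Doğacan's point concrete: the step-by-step commands read conf/regex-urlfilter.txt, so a URL that the one-shot crawl accepts can still be rejected at generate time. The rules below are in the style of the stock Nutch 1.0 defaults (shown here as an illustrative fragment, not a full copy of the file):

```text
# conf/regex-urlfilter.txt -- used by inject/generate/fetch;
# the one-shot "crawl" command reads conf/crawl-urlfilter.txt instead.

# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):

# accept anything else
+.
```

If generate produces no URLs, check that your seed URLs actually pass the rules in this file, not just those in crawl-urlfilter.txt.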

Re: error in using generate command

2009-07-23 Thread Alex McLintock
Why does your example say both monster.crawl and test.crawl? Are you perhaps entering the command wrongly, or is this just an error in the email? Alex 2009/7/18 Beats tarun_agrawal...@yahoo.com: Hi, I'm getting this weird error (at least for me): I'm trying to crawl some web pages..

Re: error in using generate command

2009-07-23 Thread Beats
Sorry for the error; it is just a typing error. Thanks for replying. alexmc wrote: Why does your example say both monster.crawl and test.crawl? Are you perhaps entering the command wrongly, or is this just an error in the email? Alex 2009/7/18 Beats tarun_agrawal...@yahoo.com: hi, i

How to add new field in parseData

2009-07-23 Thread Saurabh Suman
Hi, in ParseData the following fields are there: Version, Status, Title, Outlinks. I want to add a new field, like location, whose value I will get during parsing in the HTML parser. How can I set new fields so that they are visible in ParseData?

Re: How to add new field in parseData

2009-07-23 Thread Doğacan Güney
You can add it to ParseData's metadata. On Thu, Jul 23, 2009 at 13:38, Saurabh Suman saurabhsuman...@rediff.com wrote: Hi, in ParseData the following fields are there: Version, Status, Title, Outlinks. I want to add a new field, like location. I will get this during parsing in the HTML parser. How
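One common way to do what Doğacan suggests is an HtmlParseFilter plugin. The sketch below is written against the Nutch 1.0 plugin API and is not compilable on its own (it needs the Nutch jars, a plugin.xml, etc.); the "location" key and the extractLocation helper are placeholders for whatever the poster's parser actually computes:

```java
// Sketch: an HtmlParseFilter that stores a custom field in the
// parse metadata, so it travels with ParseData.
import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.parse.HTMLMetaTags;
import org.apache.nutch.parse.HtmlParseFilter;
import org.apache.nutch.parse.Parse;
import org.apache.nutch.parse.ParseResult;
import org.apache.nutch.protocol.Content;
import org.w3c.dom.DocumentFragment;

public class LocationParseFilter implements HtmlParseFilter {
  private Configuration conf;

  public ParseResult filter(Content content, ParseResult parseResult,
                            HTMLMetaTags metaTags, DocumentFragment doc) {
    Parse parse = parseResult.get(content.getUrl());
    String location = extractLocation(doc); // hypothetical helper
    if (location != null) {
      // Stored in ParseData's parse metadata; visible e.g. when
      // dumping a segment with "bin/nutch readseg -dump".
      parse.getData().getParseMeta().add("location", location);
    }
    return parseResult;
  }

  private String extractLocation(DocumentFragment doc) {
    return null; // real extraction logic goes here
  }

  public void setConf(Configuration conf) { this.conf = conf; }
  public Configuration getConf() { return conf; }
}
```

The plugin also has to be registered in plugin.xml and enabled via plugin.includes before Nutch will call it.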

Pages with Specific URLS.

2009-07-23 Thread Zaihan
Hi All, I'm sure I've read somewhere that URLs of the form http://www.site.com/categories.asp?cid=25&page=9 can't be crawled. Is that true? Warmest Regards, Zaihan

Re: Pages with Specific URLS.

2009-07-23 Thread reinhard schwab
Because? You mean URLs which contain a query part? They can be crawled. The default Nutch configuration excludes them via this filter rule in conf/crawl-urlfilter.txt: # skip URLs containing certain characters as probable queries, etc. -[?*!@=] Zaihan wrote: Hi All, I'm sure I've read
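If query-string URLs like Zaihan's example should be fetched, the usual approach is to relax that rule rather than delete it outright. A hedged illustration (the exact character set to keep excluded depends on the site being crawled):

```text
# conf/crawl-urlfilter.txt (or regex-urlfilter.txt for the
# step-by-step inject/generate/fetch commands)

# stock rule -- skips any URL containing ? * ! @ or =
# -[?*!@=]

# relaxed variant: still skip session-id style characters,
# but allow ordinary ?key=value query strings
-[*!@]
```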

Re: Nutch 1.0 Fetch failure...

2009-07-23 Thread Fred Kuipers
Thanks for the pointer to the LocalFetchRecover tool. It seems there were some changes to the Hadoop API since Nutch 0.8.1, so this tool didn't work initially. I've made what I think are the correct changes and have attached them. (Hopefully the attachment gets through.) I put together a

Gracefull stop in the middle of a fetch phase ?

2009-07-23 Thread MilleBii
Hi guys, I'm in the middle of a very long fetch phase, too long actually. I would like to stop it but not lose 5 days of fetching. Is there anything I can do? -- -MilleBii-

RE: nutch -threads in hadoop

2009-07-23 Thread Brian Tingle
Thanks, I eventually found the job trackers on the :50030 web page of the Cloudera distribution, and I saw it said 10 threads for each crawler in the little status-update box that was telling me how far along each crawl was. I have to say, this whole thing (Nutch/Hadoop) is pretty

Re: Gracefull stop in the middle of a fetch phase ?

2009-07-23 Thread Doğacan Güney
On Thu, Jul 23, 2009 at 21:29, MilleBii mille...@gmail.com wrote: Hi guys, I'm in the middle of a very long fetch phase, too long actually. I would like to stop it but not lose 5 days of fetching. Is there anything I can do? No, unfortunately Nutch 1.0 does not have that feature. But we

adding [-numFetchers numFetchers] to crawl

2009-07-23 Thread Brian Tingle
How do I set the number of map tasks when I run a command like hadoop jar nutch-1.0.job org.apache.nutch.crawl.Crawl? I think I'm going to try out the change below; is there any reason not to do it, or is Crawl supposed to be more of a demo and I should write some script or my own
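An alternative to patching Crawl.java is to drive the individual steps yourself, since in Nutch 1.0 the Generator already exposes -numFetchers (which sets the number of fetch map tasks). The commands below are a sketch; the crawl/ paths and the segment timestamp are illustrative, and the jar must be run from a node with Hadoop configured:

```text
# run the crawl steps by hand instead of the one-shot Crawl class
hadoop jar nutch-1.0.job org.apache.nutch.crawl.Injector crawl/crawldb urls
hadoop jar nutch-1.0.job org.apache.nutch.crawl.Generator \
    crawl/crawldb crawl/segments -numFetchers 8    # 8 fetch map tasks
hadoop jar nutch-1.0.job org.apache.nutch.fetcher.Fetcher \
    crawl/segments/20090723123456 -threads 10
hadoop jar nutch-1.0.job org.apache.nutch.crawl.CrawlDb \
    crawl/crawldb crawl/segments/20090723123456
```

Looping generate/fetch/updatedb this way from a small shell script gives the same effect as Crawl's depth parameter while keeping full control of numFetchers and topN per round.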