Re: recrawl.sh stopped at depth 7/10 without error

2009-12-07 Thread yangfeng
I still want to know the reason. 2009/12/2 BELLINI ADAM mbel...@msn.com hi, any idea guys ?? thanx From: mbel...@msn.com To: nutch-user@lucene.apache.org Subject: RE: recrawl.sh stopped at depth 7/10 without error Date: Fri, 27 Nov 2009 20:11:12 + hi, this is

Re: How to successfully crawl and index office 2007 documents in Nutch 1.0

2009-12-07 Thread yangfeng
docx should be parsable. A plugin can be used to parse docx files; you can get some help from the parse-html plugin and so on. 2009/12/4 Rupesh Mankar rupesh_man...@persistent.co.in Hi, I am new to Nutch. I want to crawl and search Office 2007 documents (.docx, .pptx etc.) from Nutch. But when I
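For illustration, if such a parse plugin existed (say, a hypothetical parse-docx), it would typically be wired into conf/nutch-site.xml and conf/parse-plugins.xml along these lines; the plugin id and the mime mapping below are assumptions, not something shipped with Nutch 1.0:

    <!-- conf/nutch-site.xml: add the hypothetical parse-docx id to plugin.includes -->
    <property>
      <name>plugin.includes</name>
      <value>protocol-http|urlfilter-regex|parse-(text|html|docx)|index-basic|query-(basic|site|url)</value>
    </property>

    <!-- conf/parse-plugins.xml: map the .docx mime type to that plugin -->
    <mimeType name="application/vnd.openxmlformats-officedocument.wordprocessingml.document">
      <plugin id="parse-docx" />
    </mimeType>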

Nutch 1.0 wml plugin

2009-12-07 Thread yangfeng
I have completed a plugin for parsing WML (Wireless Markup Language). I hope to contribute it to Lucene; what should I do?

Fetched links contain html

2009-12-07 Thread Kirk Gillock
Hello fellow Nutch users, In a few days we'll start crawling a long list of Thai websites. With previous crawls we noticed there were A LOT of poorly formatted html pages and the crawler would sometimes fetch links that contain html code (ex: http://www.website.com/news/index.php/ul ). How
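One possible mitigation, assuming the standard urlfilter-regex plugin is enabled, is to add an exclusion rule to conf/regex-urlfilter.txt so that URLs ending in a stray HTML tag name are skipped; the tag list below is only an illustration:

    # conf/regex-urlfilter.txt -- rules are applied top-down, first match wins
    # skip URLs whose path ends in a bare html tag name picked up from broken markup
    -(?i)/(ul|ol|li|div|span|td|tr|br)/?$
    # accept everything else (keep the existing catch-all rule last)
    +.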

Re: Nutch 1.0 wml plugin

2009-12-07 Thread Andrzej Bialecki
yangfeng wrote: I have completed a plugin for parsing WML (Wireless Markup Language). I hope to contribute it to Lucene; what should I do? The best long-term option would be to submit this work to the Tika project - see http://lucene.apache.org/tika/. If you already implemented this as a Nutch

RE: How to successfully crawl and index office 2007 documents in Nutch 1.0

2009-12-07 Thread Rupesh Mankar
Is there any ready-made plug-in for Office 2007 documents available, or do I have to write it on my own? -Original Message- From: yangfeng [mailto:yea...@gmail.com] Sent: Monday, December 07, 2009 4:35 PM To: nutch-user@lucene.apache.org Subject: Re: How to successfully crawl and index

RE: recrawl.sh stopped at depth 7/10 without error

2009-12-07 Thread BELLINI ADAM
hi, maybe I found my problem, it's not a Nutch mistake. I believed that when running the crawl command as a background process, closing my console would not stop the process, but it seems that it really kills the process. I launched the process like this: ./bin/nutch crawl urls -dir crawl

RE: recrawl.sh stopped at depth 7/10 without error

2009-12-07 Thread Paul Tomblin
Try starting it with nohup.  'man nohup' for details. -- Sent from my Palm Prē BELLINI ADAM wrote: hi, maybe I found my problem, it's not a Nutch mistake. I believed that when running the crawl command as a background process, closing my console would not stop the process, but it seems
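For reference, a nohup invocation of the crawl command from the earlier message might look like this (paths and depth are illustrative):

    # keep the crawl running after the terminal closes; output goes to nohup.out by default
    nohup ./bin/nutch crawl urls -dir crawl -depth 10 &
    # follow progress later with:
    tail -f nohup.out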

RE: recrawl.sh stopped at depth 7/10 without error

2009-12-07 Thread BELLINI ADAM
I fixed it by putting it in crontab and now I can sleep without thinking about it :) thx u very much Date: Mon, 7 Dec 2009 12:03:25 -0500 From: ptomb...@gmail.com To: nutch-user@lucene.apache.org Subject: RE: recrawl.sh stopped at depth 7/10 without error Try starting it with nohup.
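A crontab entry along these lines would do it (the schedule and paths are assumptions; edit with crontab -e):

    # run the recrawl script every night at 2 AM and append all output to a log file
    0 2 * * * /opt/nutch/bin/recrawl.sh >> /opt/nutch/logs/recrawl.log 2>&1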

OR support

2009-12-07 Thread BrunoWL
Hi! Has anybody added search with the OR operator in Nutch 1.0 successfully? I found a patch for the 0.9 version, but it doesn't work. thanks. -- View this message in context: http://old.nabble.com/OR-support-tp26680899p26680899.html Sent from the Nutch - User mailing list archive at

RE: recrawl.sh stopped at depth 7/10 without error

2009-12-07 Thread Fuad Efendi
crawl.log 2>&1 You forgot 2>&1... output for errors... Also, you need to close the SSH session _politely_ by executing exit. Without it, the pipe is broken and the OS will kill the process. Fuad Efendi +1 416-993-2060 http://www.tokenizer.ca Data Mining, Vertical Search -Original Message-
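Putting both suggestions together, a sketch of the full invocation (command arguments are illustrative) would be:

    # redirect stdout to crawl.log and send stderr (2) to the same place (&1),
    # detach from the terminal with nohup, then leave the SSH session cleanly
    nohup ./bin/nutch crawl urls -dir crawl -depth 10 > crawl.log 2>&1 &
    exit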

RE: recrawl.sh stopped at depth 7/10 without error

2009-12-07 Thread BELLINI ADAM
thx fuad for the info... yes, I was just closing my laptop without exiting the ssh session. but now I have it running from my cron and it didn't stop :) thx again From: f...@efendi.ca To: nutch-user@lucene.apache.org Subject: RE: recrawl.sh stopped at depth 7/10 without error Date: Mon, 7 Dec

Re: recrawl.sh stopped at depth 7/10 without error

2009-12-07 Thread MilleBii
As an alternative to crontab, I use the nohup command to get my jobs running. 2009/12/7, BELLINI ADAM mbel...@msn.com: thx fuad for the info... yes, I was just closing my laptop without exiting the ssh session. but now I have it running from my cron and it didn't stop :) thx again From:

RE: recrawl.sh stopped at depth 7/10 without error

2009-12-07 Thread BELLINI ADAM
yes, I've just tested nohup and it works :) thx to all Date: Mon, 7 Dec 2009 19:26:42 +0100 Subject: Re: recrawl.sh stopped at depth 7/10 without error From: mille...@gmail.com To: nutch-user@lucene.apache.org As an alternative to crontab, I use the nohup command to get my jobs