I still want to know the reason.
2009/12/2 BELLINI ADAM mbel...@msn.com
hi,
Any idea, guys?
thanx
From: mbel...@msn.com
To: nutch-user@lucene.apache.org
Subject: RE: recrawl.sh stopped at depth 7/10 without error
Date: Fri, 27 Nov 2009 20:11:12 +
hi,
this is
docx should be parseable. A plugin can be used to parse docx files. You can get some
help info from the parse-html plugin and so on.
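For reference, a minimal sketch of where to look (the paths assume a stock Nutch source checkout or binary distribution, and any new parse plugin also has to be added to the plugin.includes property in conf/nutch-site.xml):

# list the parse plugins bundled with Nutch (parse-html, parse-text, ...);
# src/plugin/ exists in a source checkout, plugins/ in a binary distribution
ls src/plugin/ plugins/ 2>/dev/null | grep '^parse-'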
2009/12/4 Rupesh Mankar rupesh_man...@persistent.co.in
Hi,
I am new to Nutch. I want to crawl and search Office 2007 documents (.docx,
.pptx, etc.) with Nutch. But when I
I have completed a plugin for parsing WML (Wireless Markup Language). I
hope to add it to Lucene; what should I do?
Hello fellow Nutch users,
In a few days we'll start crawling a long list of Thai websites. With
previous crawls we noticed there were A LOT of poorly formatted HTML
pages, and the crawler would sometimes fetch links that contain HTML code
(e.g. http://www.website.com/news/index.php/ul ). How
yangfeng wrote:
I have completed a plugin for parsing WML (Wireless Markup Language). I
hope to add it to Lucene; what should I do?
The best long-term option would be to submit this work to the Tika
project - see http://lucene.apache.org/tika/. If you already implemented
this as a Nutch
Is there any ready-made plugin available for Office 2007 documents, or do I have to
write it on my own?
-Original Message-
From: yangfeng [mailto:yea...@gmail.com]
Sent: Monday, December 07, 2009 4:35 PM
To: nutch-user@lucene.apache.org
Subject: Re: How to successfully crawl and index
hi,
Maybe I found my problem; it's not a Nutch mistake. I believed that when running the
crawl command as a background process, closing my console would not stop
the process, but it seems that it really kills the process.
I launched the process like this: ./bin/nutch crawl urls -dir crawl
Try starting it with nohup. 'man nohup' for details.
-- Sent from my Palm Prē
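A minimal sketch of that suggestion, reusing the crawl command quoted above (the output file is whatever nohup defaults to, nothing Nutch-specific):

# nohup keeps the job alive after the terminal hangs up; & puts it in the background;
# without an explicit redirect, nohup appends stdout/stderr to ./nohup.out
nohup ./bin/nutch crawl urls -dir crawl &
tail -f nohup.out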
BELLINI ADAM wrote:
hi,
Maybe I found my problem; it's not a Nutch mistake. I believed that when running the
crawl command as a background process, closing my console would not stop
the process, but it seems
I fixed it by putting it in crontab, and now I can sleep without thinking about it
:)
thx u very much
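For reference, a hedged sketch of the crontab approach (the install path, script location, schedule and log file are assumptions, not taken from the thread); add it with crontab -e:

# run the recrawl script every night at 02:00 and append all output to a log file
0 2 * * * cd /opt/nutch && ./recrawl.sh >> /var/log/nutch-recrawl.log 2>&1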
Date: Mon, 7 Dec 2009 12:03:25 -0500
From: ptomb...@gmail.com
To: nutch-user@lucene.apache.org
Subject: RE: recrawl.sh stopped at depth 7/10 without error
Try starting it with nohup.
Hi!
Did anybody add search with the OR operator in Nutch 1.0
successfully?
I found a patch for the 0.9 version, but it doesn't work.
Thanks.
--
View this message in context:
http://old.nabble.com/OR-support-tp26680899p26680899.html
Sent from the Nutch - User mailing list archive at
> crawl.log 2>&1
You forgot 2>&1... output for errors...
Also, you need to close the SSH session _politely_ by executing exit.
Without it, the pipe is broken and the OS will kill the process.
Fuad Efendi
+1 416-993-2060
http://www.tokenizer.ca
Data Mining, Vertical Search
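Putting both points together, a minimal sketch (same crawl command as earlier in the thread; the log file name is just an example):

# capture stdout AND stderr in the log, background the job,
# then leave the SSH session with an explicit exit instead of just closing the terminal
nohup ./bin/nutch crawl urls -dir crawl > crawl.log 2>&1 &
exit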
-Original Message-
Thanks Fuad for the info... yes, I was just closing my laptop without exiting the SSH
session.
But now I have it running from my cron and it didn't stop :)
Thanks again
From: f...@efendi.ca
To: nutch-user@lucene.apache.org
Subject: RE: recrawl.sh stopped at depth 7/10 without error
Date: Mon, 7 Dec
As an alternative to crontab, I use the nohup command to keep my jobs running.
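As a quick sanity check after logging back in, the detached job can be looked up like this (a hedged sketch; org.apache.nutch.crawl.Crawl is the class bin/nutch crawl launches in Nutch 1.0, treat it as an assumption for other versions):

# look for the running crawl job by its main class or command line
pgrep -fl org.apache.nutch.crawl.Crawl || ps aux | grep '[n]utch crawl'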
2009/12/7, BELLINI ADAM mbel...@msn.com:
Thanks Fuad for the info... yes, I was just closing my laptop without exiting the
SSH session.
But now I have it running from my cron and it didn't stop :)
Thanks again
From:
Yes, I've just tested nohup and it works :)
thx to all
Date: Mon, 7 Dec 2009 19:26:42 +0100
Subject: Re: recrawl.sh stopped at depth 7/10 without error
From: mille...@gmail.com
To: nutch-user@lucene.apache.org
As an alternative to crontab, I use the nohup command to keep my jobs