When I check over the system with the admin/manager functions of Tomcat
everything seems fine (there was definitely something wrong previously)
but I still have an issue trying to run the nutch.war file... So I'm
checking around the logs etc. to find a hint, and I'm not sure where to take
this now:
3-Apr-06
Hi
I have been raising this point for quite some time.
Right now, when we have a new job, we store the job.jar and job.xml files in
the job tracker. The task tracker, if I am right, uses these job.jar and
job.xml files.
Shouldn't we clean up after the job has completed (that is, purge these
files)?
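A minimal sketch of the purge being asked for, assuming the staged files sit
in a per-job directory on the tracker (the directory layout here is an
assumption, not the actual JobTracker internals):

import java.io.File;

public class JobFileCleanup {

    // Delete the staged job.jar and job.xml once the job has completed,
    // then remove the per-job directory if it is now empty.
    public static void purge(File jobDir) {
        new File(jobDir, "job.jar").delete();
        new File(jobDir, "job.xml").delete();
        String[] remaining = jobDir.list();
        if (remaining != null && remaining.length == 0) {
            jobDir.delete();
        }
    }
}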
Hi,
I noticed that when I used the drive designation it didn't like that
(Windows cygwin environment). If you did
./nutch merge -local /STG1/index /STG1/indexes that may work better; let me
know.
Cheers/r/dan
-Original Message-
From: Vertical Search [mailto:[EMAIL PROTECTED]
Hi gurus,
I ran into a similar issue as
http://www.mail-archive.com/nutch-user%40lucene.apache.org/msg04073.html.
I just could not get index merging to work. I've browsed the mailing
archives and tried many things, and nothing has worked so far. Does 0.8
support merging indexes?
I really appreciate it.
Hi,
I'm not sure what you are doing, so I will just describe what I'm doing and
maybe you will find the answer :)
Let's make some assumptions:
1. main nutch dfs is: /user/nutchuser
1.1 you should have /user/nutchuser/crawldb
1.2 you should have /user/nutchuser/segments with some fetched segments
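To double-check those assumptions against a running DFS, something along
these lines should work (a sketch using the generic Hadoop FileSystem API;
only the /user/nutchuser paths come from the message above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckLayout {
    public static void main(String[] args) throws Exception {
        // Picks up the DFS settings from the configuration on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Paths taken from the assumptions above.
        System.out.println("crawldb exists:  "
                + fs.exists(new Path("/user/nutchuser/crawldb")));
        System.out.println("segments exists: "
                + fs.exists(new Path("/user/nutchuser/segments")));
    }
}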
For example, I want to crawl 20,000 pages every day for 10 days and then
merge the data for search. So far, I can't get it to work.
Any advice? Could someone let me know whether I can do this on 0.8 at all?
Thank you.
From: Gal Nitzan [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
To:
Olive g wrote:
For example, I want to crawl 20,000 pages every day for 10 days and then
merge the data for search. So far, I can't get it to work.
Any advice? Could someone let me know whether I can do this on 0.8 at all?
Not yet. This functionality hasn't been ported yet from 0.7. It's on
Hey all,
I have written a custom HTML parser and indexer. I would like to save some
information that I have gathered during the parse in a MySQL DB. I imagine
there could be some performance hit here (e.g. connecting to the db). What's
the best place to add code to save this information - the
Thank you for your reply. I have a few more questions:
- Is there any workaround that I can use for now for what I want to do
(multiple crawls and then combine the data for search)?
- If I were to back down to 0.7, would the data from 0.8 crawls be
compatible with 0.7
(I use DFS)?
Olive g wrote:
Thank you for your reply. I have a few more questions:
- Is there any workaround that I can use for now for what I want to do
(multiple crawls and then combine the data for search)?
You can always combine index data into a single Lucene index - but there
is no tool yet
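Until such a tool exists, a rough sketch of folding several per-crawl Lucene
indexes into one by hand with IndexWriter.addIndexes (the local paths are
made up for illustration, and this only merges the Lucene index, not the
crawldb or segments):

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class CombineIndexes {
    public static void main(String[] args) throws Exception {
        // Target index, created fresh.
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory(new File("merged-index"), true),
                new StandardAnalyzer(), true);

        // Per-crawl indexes to fold in.
        Directory[] sources = new Directory[] {
                FSDirectory.getDirectory(new File("crawl-day1/index"), false),
                FSDirectory.getDirectory(new File("crawl-day2/index"), false),
        };

        writer.addIndexes(sources); // copies and merges all source segments
        writer.optimize();
        writer.close();
    }
}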
Any thoughts?
--
View this message in context:
http://www.nabble.com/Saving-Metadata-to-Mysql-t1389216.html#a3736241
Sent from the Nutch - User forum at Nabble.com.
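On the Saving-Metadata-to-Mysql question above: wherever the hook ends up
(parser plugin or indexing filter), the cost to avoid is opening a MySQL
connection per document. A hedged sketch of the usual mitigation, one
long-lived connection plus batched inserts (table and column names are made
up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MetadataWriter {
    private final Connection conn;
    private final PreparedStatement insert;

    public MetadataWriter(String jdbcUrl, String user, String pass) throws Exception {
        conn = DriverManager.getConnection(jdbcUrl, user, pass);
        conn.setAutoCommit(false);
        insert = conn.prepareStatement(
            "INSERT INTO page_metadata (url, field_name, field_value) VALUES (?, ?, ?)");
    }

    // Called once per extracted metadata value.
    public void add(String url, String name, String value) throws Exception {
        insert.setString(1, url);
        insert.setString(2, name);
        insert.setString(3, value);
        insert.addBatch();
    }

    // Flush every few hundred documents rather than once per document.
    public void flush() throws Exception {
        insert.executeBatch();
        conn.commit();
    }

    public void close() throws Exception {
        flush();
        insert.close();
        conn.close();
    }
}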
Hi Andrzej and other gurus who might be reading this message :-):
I ran some tests and somehow my query returned 0 hits against merged indexes.
Here is my test case and it's a bit long, thank you in advance for your
patience:
1. crawled the first 100 urls
~/nutch/search/bin/nutch crawl
It depends on whether you control the seed pages or not; if you do, you could
tag them index=no and skip them during indexing. You would have to change
HtmlParser and BasicIndexingFilter.
Rgrds, Thomas
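A very rough sketch of the indexing-filter side of that suggestion; the
method shape and metadata accessor are simplified assumptions, not the exact
Nutch 0.8 plugin API:

// The modified HTML parser puts an index=no marker into the parse metadata;
// the indexing filter then drops any page carrying it.
// ParseMetadata and this method signature are hypothetical simplifications.
public Document filter(Document doc, ParseMetadata meta, String url) {
    String flag = meta.get("index"); // set earlier by the modified HTML parser
    if ("no".equalsIgnoreCase(flag)) {
        return null; // returning null excludes the page from the index
    }
    return doc;
}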
On 4/4/06, Benjamin Higgins [EMAIL PROTECTED] wrote:
Hello,
I've gone through the documentation