RE: Tomcat Problem

2006-04-03 Thread Paul Stewart
When I check over the system with Tomcat's admin/manager functions, everything seems fine (there was definitely something wrong previously), but I still have an issue trying to run the nutch.war file... So I'm checking around the logs etc. to find a hint, and I'm not sure where to take this now: 3-Apr-06

clean up of hadoop files

2006-04-03 Thread Raghavendra Prabhu
Hi, I have been raising this point for quite some time. Right now, when we have a new job, we store the job.jar and job.xml files in the job tracker. The task tracker, if I am right, uses these job.jar and job.xml files. Shouldn't we clean up after the job has completed (that is, purge these files)?
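
A minimal sketch of what such a post-job purge might look like with the Hadoop FileSystem API; the system-directory layout and the cleanup hook are assumptions for illustration, not the actual JobTracker code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class JobFileCleanup {
    // Purge the per-job files (job.jar, job.xml) once the job completes.
    // The directory layout below is hypothetical.
    public static void cleanup(Configuration conf, String jobId) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path jobDir = new Path("/system/job_" + jobId);  // assumed job directory
        if (fs.exists(jobDir)) {
            fs.delete(jobDir);  // recursive in old Hadoop APIs; newer ones use delete(path, true)
        }
    }
}
```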

RE: Merging indexes -- please help....

2006-04-03 Thread Dan Morrill
Hi, I noticed that it didn't like the drive designation (Windows Cygwin environment). If you do ./nutch merge -local /STG1/index /STG1/indexes, that may work better; let me know. Cheers, /r/dan

help please! - issues with merging indexes w/ DFS on 0.8

2006-04-03 Thread Olive g
Hi gurus, I ran into a similar issue as http://www.mail-archive.com/nutch-user%40lucene.apache.org/msg04073.html. I just could not get index merging to work. I've browsed the mailing archives and tried many things, and nothing has worked so far. Does 0.8 support merging indexes? I really appreciate

RE: help please! - issues with merging indexes w/ DFS on 0.8

2006-04-03 Thread Gal Nitzan
Hi, I'm not sure what you are doing, so I will just describe what I'm doing and maybe you will find the answer :) Let's make some assumptions: 1. main nutch dfs is: /user/nutchuser 1.1 you should have /user/nutchuser/crawldb 1.2 you should have /user/nutchuser/segments with some fetched segments

thanks, but what I wanted to do is to merge segments from multiple crawls

2006-04-03 Thread Olive g
For example, I want to crawl 20,000 pages every day for 10 days and then merge the data for search. So far, I can't get it to work. Any advice? Could someone let me know whether I can do this on 0.8 at all? Thank you.

Re: thanks, but what I wanted to do is to merge segments from multiple crawls

2006-04-03 Thread Andrzej Bialecki
Olive g wrote: For example, I want to crawl 20,000 pages every day for 10 days and then merge the data for search. So far, I can't get it to work. Any advice? Could someone let me know whether I can do this on 0.8 at all? Not yet. This functionality hasn't been ported from 0.7 yet. It's on

Saving Metadata to Mysql

2006-04-03 Thread mikeyc
Hey all, I have written a custom HTML parser and indexer. I would like to save some information that I have gathered during the parse in a MySQL DB. I imagine there could be some performance hit here (e.g. connecting to the db). What's the best place to add code to save this information - the
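
One common approach is to open a JDBC connection from the parsing/indexing code and reuse it across documents. A minimal sketch, assuming a hypothetical page_meta table and the Connector/J driver; it is not tied to any particular Nutch extension point:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MetadataStore {
    private final Connection conn;
    private final PreparedStatement insert;

    public MetadataStore(String jdbcUrl, String user, String pass) throws Exception {
        Class.forName("com.mysql.jdbc.Driver");  // MySQL Connector/J
        conn = DriverManager.getConnection(jdbcUrl, user, pass);
        // Table and columns are hypothetical.
        insert = conn.prepareStatement(
            "INSERT INTO page_meta (url, meta_key, meta_value) VALUES (?, ?, ?)");
    }

    public void save(String pageUrl, String key, String value) throws Exception {
        insert.setString(1, pageUrl);
        insert.setString(2, key);
        insert.setString(3, value);
        insert.executeUpdate();
    }

    public void close() throws Exception {
        insert.close();
        conn.close();
    }
}
```

Holding one connection per task, rather than connecting per document, keeps the performance hit the poster mentions manageable.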

more questions on this - please advise

2006-04-03 Thread Olive g
Thank you for your reply. I have a few more questions: - Is there any workaround that I can use for now for what I want to do (multiple crawls and then combining the data for search)? - If I were to back down to 0.7, would the data from 0.8 crawls be compatible with 0.7 (I use DFS)?

Re: more questions on this - please advise

2006-04-03 Thread Andrzej Bialecki
Olive g wrote: Thank you for your reply. I have a few more questions: - Is there any workaround that I can use for now for what I want to do (multiple crawls and then combining the data for search)? You can always combine index data into a single Lucene index - but there is no tool yet
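
For the index part, Lucene itself can combine several existing indexes into one via IndexWriter.addIndexes. A hedged sketch (the paths are placeholders, and this merges only the Lucene indexes, not Nutch's segment data):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeIndexes {
    public static void main(String[] args) throws Exception {
        // Destination index; 'true' creates it from scratch.
        IndexWriter writer =
            new IndexWriter("/data/merged-index", new StandardAnalyzer(), true);
        Directory[] sources = {
            FSDirectory.getDirectory("/data/crawl1/index", false),
            FSDirectory.getDirectory("/data/crawl2/index", false)
        };
        writer.addIndexes(sources);  // pulls every source index into the destination
        writer.optimize();
        writer.close();
    }
}
```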

Re: Saving Metadata to Mysql

2006-04-03 Thread mikeyc
Any thoughts?

Query on merged indexes returned 0 hit - test case included (Nutch 0.8)

2006-04-03 Thread Olive g
Hi Andrzej and other gurus who might be reading this message :-): I ran some tests and somehow my query returned 0 hits against merged indexes. Here is my test case; it's a bit long, thank you in advance for your patience: 1. crawled the first 100 urls ~/nutch/search/bin/nutch crawl
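
When a merged index returns zero hits, one way to narrow the problem down is to query it directly with Lucene, outside Nutch, to check that the index itself is searchable. A small sketch; the index path, field name, and search term are assumptions:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class CheckMergedIndex {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/data/merged-index");
        // "content" is the main text field Nutch indexes; the term is just an example.
        Hits hits = searcher.search(new TermQuery(new Term("content", "apache")));
        System.out.println("hits: " + hits.length());
        searcher.close();
    }
}
```

If this finds hits but the Nutch search frontend does not, the problem is more likely in the searcher configuration than in the merge itself.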

Re: Crawling a file but not indexing it

2006-04-03 Thread TDLN
It depends on whether you control the seed pages or not; if you do, you could tag them index=no and skip them during indexing. You would have to change HtmlParser and BasicIndexingFilter. Rgrds, Thomas On 4/4/06, Benjamin Higgins [EMAIL PROTECTED] wrote: Hello, I've gone through the documentation
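
As a rough illustration of the indexing-side check: a filter can drop a document by returning null. This only approximates the idea; the real Nutch IndexingFilter interface takes additional parse and crawl arguments, and the index=no flag handling here is an assumption:

```java
import org.apache.lucene.document.Document;

// Simplified stand-in for a Nutch indexing filter; the actual interface
// signature differs across Nutch versions.
public class SkipNoIndexFilter {
    public Document filter(Document doc, String indexFlag) {
        // If the parser tagged the page index=no, exclude it from the index.
        if ("no".equalsIgnoreCase(indexFlag)) {
            return null;  // returning null drops the document
        }
        return doc;
    }
}
```

The page is still fetched and its links still followed; it just never reaches the index.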