RE: project vitality?

2006-03-04 Thread Richard Braman
don't expect polish. You shouldn't need polish to be able to learn the command required to resume an aborted crawl, or to index what you have already crawled. Things like this shouldn't require an easter egg hunt. They are going to happen to everyone doing greater than a simple crawl. If you

Re: project vitality?

2006-03-04 Thread Stefan Groschupf
Hi Richard, I told you I was more than willing to help, and I think many users feel the same way, but I for one feel that there is a lack of documentation and support. This isn't meant to offend anyone, if you are offended you need to toughen up your skin a little bit. Here you can find

RE: project vitality?

2006-03-04 Thread Howie Wang
I agree that the doc could be better, but I still take issue with the earlier use of the phrase proof-of-concept. If there are dozens of sites using it in production, several of them indexing 100's of millions of pages, I don't know how you can call it proof-of-concept. Honestly, I'm not sure if

RE: project vitality?

2006-03-04 Thread Richard Braman
I do thank nutch developers very, very much for what they have put into the project :) I think the concept is great and yes it does work, if you invest the time needed to learn the interfaces, upgrade the distribution nightly, relearn the commands, etc. Doug's statement that nutch is for early

RE: how can i go deep?

2006-03-04 Thread Richard Braman
Try using depth=n when you do the crawl. Post crawl I don't know, but I have the same question. How to make the index go deeper when you do your next round of fetching is still something I haven't figured out. -Original Message- From: Peter Swoboda [mailto:[EMAIL PROTECTED] Sent:

Re: how can i go deep?

2006-03-04 Thread Stefan Groschupf
The crawl command creates a crawlDB for each call. So as Richard mentioned try a higher depth. If you would like nutch to go deeper with each iteration, try the whole web tutorial but change the url filter in a manner that it only crawls your webpage. This will go as deep as much iteration

Moving tutorial link to wiki

2006-03-04 Thread Richard Braman
Maybe we should move the tutorial to the wiki so it can be commented on. Richard Braman mailto:[EMAIL PROTECTED] 561.748.4002 (voice) http://www.taxcodesoftware.org/ Free Open Source Tax Software

Re: Moving tutorial link to wiki

2006-03-04 Thread Matthias Jaekle
Maybe we should move the tutorial to the wiki so it can be commented on. +1

RE: project vitality?

2006-03-04 Thread Richard Braman
The nutch dev team isn't focused on PDF parsing. Nutch is a search engine framework, IMHO, if you don't parse something correctly, you cannot rely on the results. We have all parsed things where you leave a comma out and the parse results are wrong. If there was a bug in nutch's html parsing

Re: project vitality?

2006-03-04 Thread Matthias Jaekle
I am sorry if you don't like my opinion or the way it is expressed. Hi Richard, most of your opinion I think is the same as mine. I have used nutch since spring 2004 for our page http://www.umkreisfinder.de It was a big effort to learn how nutch works and also a big effort to learn how

RE: project vitality?

2006-03-04 Thread Richard Braman
I really do think nutch is great, but I echo Matthias's comments that the community needs to come together and contribute more back. And that comes with the requirement of making sure volunteers are given access to make their contributions part of the project. Also, if you use nutch you should

Re: project vitality?

2006-03-04 Thread Stefan Groschupf
Maybe we should organize ourselves a little bit better on this point. What do you think? Just a general note, jira has a voting functionality. This allows everybody to vote on an issue and can show in a very compressed style what the community is looking for. However it is not used that often

Re: project vitality?

2006-03-04 Thread Chris Mattmann
Hi Richard, IMHO, if you don't parse something correctly, you cannot rely on the results. Good, we're on the same page here. We have all parsed things where you leave a comma out and the parse results are wrong. If there was a bug in nutch's html parsing would that be a big deal? Yes,

RE: how can i go deep?

2006-03-04 Thread Richard Braman
Stefan, I think I know what you're saying. When you are new to nutch and you read the tutorial, it kind of leads you to believe (incorrectly) that whole web crawling is different from intranet crawling and that the steps are somehow different and independent of one another. In fact it looks

RE: url shown instead of title.

2006-03-04 Thread Richard Braman
whoops i hit send by accident :( any idea why http://24.75.221.234:8080/search.jsp?query=e-file+site%3Awww.irs.gov&hitsPerPage=10&hitsPerSite=0&clustering= returns a list of hits where the

Unsubscribe

2006-03-04 Thread vaibhav . verma