Re: crawling sites which require authentication

2006-10-14 Thread Tomi NA
2006/10/14, Toufeeq Hussain [EMAIL PROTECTED]: From internal tests with ntlmaps + Nutch the conclusion we came to was that though it kinda-works it puts a huge load on the Nutch server as ntlmaps is a major memory-hog and the mixture of the two leads to performance issues. For a PoC this will

Re: crawling sites which require authentication

2006-10-14 Thread Jim Wilson
Yeah seriously - if NTLM auth (or HTTP Basic for that matter) is supported natively by Nutch, I'd love to read the documentation on it! -- Jim On 10/14/06, Tomi NA [EMAIL PROTECTED] wrote: 2006/10/14, Toufeeq Hussain [EMAIL PROTECTED]: From internal tests with ntlmaps + Nutch the conclusion

Re: I can not query myplugin in field category:test

2006-10-14 Thread Ernesto De Santis
Hi Chad, the link was a configuration example. A more detailed example: http://www.misite.com/videos/.*=videos (rule A). If the fetched URL matches rule A, then a field named 'category' is indexed with the value 'videos'. Later you can search on this category field to filter your searches.
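
The rule format described above can be sketched outside Nutch as follows. This is only an illustration of the idea, not the plugin's actual code: the function names `parse_rules` and `categories_for` are hypothetical, and it assumes each rule is a `regex=category` line split on the last `=`, with the regex required to match the whole URL.

```python
import re


def parse_rules(lines):
    """Parse 'regex=category' lines into (compiled_pattern, category) pairs.

    Assumption: the category never contains '=', so we split on the last one.
    """
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, _, category = line.rpartition("=")
        rules.append((re.compile(pattern), category))
    return rules


def categories_for(url, rules):
    """Return the category values whose rule matches the whole URL."""
    return [category for pattern, category in rules if pattern.fullmatch(url)]


# Rule A from the message above.
RULES = parse_rules(["http://www.misite.com/videos/.*=videos"])

if __name__ == "__main__":
    print(categories_for("http://www.misite.com/videos/clip1.html", RULES))
    print(categories_for("http://www.misite.com/news/today.html", RULES))
```

In an indexing plugin, each returned value would be added to the document as a 'category' field, which is what makes a query such as category:videos possible.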

how to share code?

2006-10-14 Thread Ernesto De Santis
Hi I want to share my plugin 'index-url-category' I sent it attached to the list but it was rejected. Ernesto.

Re: how to share code?

2006-10-14 Thread Andrzej Bialecki
Ernesto De Santis wrote: Hi I want to share my plugin 'index-url-category' I sent it attached to the list but it was rejected. Please submit this as a JIRA issue and attach it there. -- Best regards, Andrzej Bialecki

Re: how to share code?

2006-10-14 Thread chad savage
hmmm. maybe a patch file. I've seen people post code in the messages Chad Savage Ernesto De Santis wrote: Hi I want to share my plugin 'index-url-category' I sent it attached to the list but it was rejected. Ernesto.

Re: how to share code?

2006-10-14 Thread Andrzej Bialecki
chad savage wrote: hmmm. maybe a patch file. I've seen people post code in the messages Nutch mailing lists are configured to reject attachments (for security reasons). For a couple lines fix it's ok to include it in a message, although then it's too easy for developers to miss and/or

Re: Dedup undeletes previously deleted documents

2006-10-14 Thread Andrzej Bialecki
[EMAIL PROTECTED] wrote: Hi all, I've noticed that in 0.8.1 org.apache.nutch.indexer.DeleteDuplicates undeletes previously deleted documents in my index. Here's how you can [..] Is this intentional? Good question ... ;) I don't see any reason to do this in the current code