Re: Google performance bottlenecks ;-) (Re: Lucene performance bottlenecks)

2005-12-12 Thread Dawid Weiss
Hi Andrzej, This was a very interesting experiment -- thanks for sharing the results with us. The last range was the maximum in this case - Google wouldn't display any hit above 652 (which I find curious, too - because the total number of hits is, well, significantly higher - and Google

Re: Google performance bottlenecks ;-) (Re: Lucene performance bottlenecks)

2005-12-12 Thread Andrzej Bialecki
Dawid Weiss wrote: Hi Andrzej, This was a very interesting experiment -- thanks for sharing the results with us. The last range was the maximum in this case - Google wouldn't display any hit above 652 (which I find curious, too - because the total number of hits is, well, significantly

Customize the time to retry

2005-12-12 Thread Nguyen Ngoc Giang
Hi, In the function pageRetry() in org.apache.nutch.tools.UpdateDatabaseTool, a failed page is delayed by 1 day before it is recrawled. Would it be better to make this delay customizable, say by exposing it as a configurable parameter? Regards, Giang
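
For illustration only, a minimal sketch of what such a configurable retry delay might look like. It does not use the Nutch configuration API: plain java.util.Properties stands in for it, and the class name RetryDelay, the property key db.retry.delay.days, and the helper methods are all hypothetical. The default of 1 day matches the behaviour described in the message above.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of Nutch: reads a retry delay from configuration. */
final class RetryDelay {
    // Hypothetical property name, chosen for illustration only.
    static final String RETRY_DELAY_DAYS_KEY = "db.retry.delay.days";
    // Default mirrors the 1-day delay mentioned in the message.
    static final int DEFAULT_RETRY_DELAY_DAYS = 1;

    /** Returns the configured retry delay in milliseconds, falling back to the default. */
    static long retryDelayMillis(Properties conf) {
        int days = DEFAULT_RETRY_DELAY_DAYS;
        String raw = conf.getProperty(RETRY_DELAY_DAYS_KEY);
        if (raw != null) {
            try {
                days = Integer.parseInt(raw.trim());
            } catch (NumberFormatException e) {
                // Unparseable value: keep the default.
                days = DEFAULT_RETRY_DELAY_DAYS;
            }
        }
        if (days < 0) {
            // Negative delays make no sense: keep the default.
            days = DEFAULT_RETRY_DELAY_DAYS;
        }
        return TimeUnit.DAYS.toMillis(days);
    }

    /** Computes when a page that failed at failedAtMillis becomes eligible for another fetch. */
    static long nextFetchTime(long failedAtMillis, Properties conf) {
        return failedAtMillis + retryDelayMillis(conf);
    }
}
```

With no property set, the helper behaves like the fixed 1-day delay; setting the (hypothetical) key to 3 would push the next attempt three days out.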

IndexOptimizer (Re: Lucene performance bottlenecks)

2005-12-12 Thread Andrzej Bialecki
Doug Cutting wrote: The IndexOptimizer.java class in the searcher package was an old attempt to create something like what Suel calls fancy postings. It creates an index with the top 10% scoring postings. Since documents are not renumbered one can intermix postings from this with the full

Re: Hot Search! Re: Nutch Suggestion? (Google like did you mean)

2005-12-12 Thread Jack Tang
Hi Fredrik, Thanks for your reply :) It is true that you can recommend the top-n most popular queries on each indexed field. See the example: http://www.business.com/index.asp?p=true (please select the Job tab). However, I think betherebesquare.com is a bit different. I mean if my goal is to

Re: Hot Search! Re: Nutch Suggestion? (Google like did you mean)

2005-12-12 Thread Fredrik Andersson
Re! Well, you have two choices. Either you store ONE index with (what, when, where, frequency) or THREE indices with (what, frequency), (when, frequency) and (where, frequency). If you choose the first approach you can just parse down the (what, when, where) strings into one string, and by that
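
The two layouts described in this message can be sketched roughly as follows. This is an assumption-laden illustration, not code from the thread: in-memory maps stand in for real indices, and the class QueryFrequency, its method names, and the separator character are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: query frequencies kept in one combined "index" or in three separate ones. */
final class QueryFrequency {
    // Assumed separator; any character that cannot occur in the fields would do.
    private static final char SEP = '\u0001';

    // Approach 1: ONE index keyed by the joined (what, when, where) string.
    final Map<String, Integer> combined = new HashMap<>();

    // Approach 2: THREE indices, one per field.
    final Map<String, Integer> what = new HashMap<>();
    final Map<String, Integer> when = new HashMap<>();
    final Map<String, Integer> where = new HashMap<>();

    /** Joins the three fields into a single key string. */
    static String compositeKey(String what, String when, String where) {
        return what + SEP + when + SEP + where;
    }

    /** Records one query in both layouts so they can be compared side by side. */
    void record(String whatTerm, String whenTerm, String whereTerm) {
        combined.merge(compositeKey(whatTerm, whenTerm, whereTerm), 1, Integer::sum);
        what.merge(whatTerm, 1, Integer::sum);
        when.merge(whenTerm, 1, Integer::sum);
        where.merge(whereTerm, 1, Integer::sum);
    }

    /** Frequency of an exact (what, when, where) combination; 0 if never seen. */
    int combinedFrequency(String whatTerm, String whenTerm, String whereTerm) {
        return combined.getOrDefault(compositeKey(whatTerm, whenTerm, whereTerm), 0);
    }
}
```

The combined layout answers "how often was this exact combination searched", while the three per-field maps answer "what is popular in each field" independently of the other two.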

Re: IndexOptimizer (Re: Lucene performance bottlenecks)

2005-12-12 Thread Doug Cutting
Andrzej Bialecki wrote: By all means please start, this is still near the limits of my knowledge of Lucene... ;-) Okay, I'll try to get something working fairly soon. Doug

[jira] Created: (NUTCH-136) mapreduce segment generator generates 50 % less than excepted urls

2005-12-12 Thread Stefan Groschupf (JIRA)
mapreduce segment generator generates 50 % less than excepted urls Key: NUTCH-136 URL: http://issues.apache.org/jira/browse/NUTCH-136 Project: Nutch Type: Bug Versions: 0.8-dev

Re: IndexOptimizer (Re: Lucene performance bottlenecks)

2005-12-12 Thread Doug Cutting
Andrzej Bialecki wrote: By all means please start, this is still near the limits of my knowledge of Lucene... ;-) Attached is a class which sorts a Nutch index by boost. I have only tested it on a ~100 page index, where it appears to work correctly. Please tell me how it works for you.

Re: IndexOptimizer (Re: Lucene performance bottlenecks)

2005-12-12 Thread Andrzej Bialecki
Doug Cutting wrote: Andrzej Bialecki wrote: By all means please start, this is still near the limits of my knowledge of Lucene... ;-) Attached is a class which sorts a Nutch index by boost. I have only tested it on a ~100 page index, where it appears to work correctly. Please tell me