Hi Andrzej,
This was a very interesting experiment -- thanks for sharing the results
with us.
The last range was the maximum in this case - Google wouldn't display
any hit above 652 (which I find curious, too - because the total number
of hits is, well, significantly higher - and Google
Dawid Weiss wrote:
Hi Andrzej,
This was a very interesting experiment -- thanks for sharing the
results with us.
The last range was the maximum in this case - Google wouldn't display
any hit above 652 (which I find curious, too - because the total
number of hits is, well, significantly
Hi,
In the function pageRetry() in org.apache.nutch.tools.UpdateDatabaseTool,
a failed page is scheduled to be recrawled after a delay of 1 day. Would it
be better to make this delay customizable, say, by exposing it as a
configurable parameter?
Regards,
Giang
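A minimal sketch of what such a setting could look like, for illustration
only. The property name db.retry.delay.days, the RetryDelay class and the
nextFetchTime() helper are hypothetical and are not taken from Nutch or from
UpdateDatabaseTool; the sketch only assumes the current behaviour is a fixed
1-day delay, which it keeps as the default.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

public class RetryDelay {
    // Hypothetical property name; not an actual Nutch configuration key.
    static final String RETRY_DELAY_KEY = "db.retry.delay.days";

    // Default mirrors the fixed 1-day delay described in the message above.
    static final int DEFAULT_RETRY_DELAY_DAYS = 1;

    // Returns the time (in milliseconds) at which a failed page becomes
    // eligible for refetching, using the configured delay if one is set.
    static long nextFetchTime(long nowMillis, Properties conf) {
        String raw = conf.getProperty(RETRY_DELAY_KEY,
                String.valueOf(DEFAULT_RETRY_DELAY_DAYS));
        int days = Integer.parseInt(raw.trim());
        return nowMillis + TimeUnit.DAYS.toMillis(days);
    }
}
```

With no property set, the helper behaves like the hard-coded 1-day delay;
setting the property to 3, for example, would postpone the retry by three
days.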
Doug Cutting wrote:
The IndexOptimizer.java class in the searcher package was an old
attempt to create something like what Suel calls fancy postings. It
creates an index with the top 10% scoring postings. Since documents
are not renumbered one can intermix postings from this with the full
Hi Fredrik
Thanks for your reply :)
It is true that you can recommend the top-n most popular queries on
each indexed field. See the example:
http://www.business.com/index.asp?p=true (please select the Job tab).
However, I think betherebesquare.com is a bit different. I mean if my
goal is to
Re!
Well, you have two choices. Either you store ONE index with (what, when,
where, frequency) or THREE indices with (what, frequency), (when, frequency)
and (where, frequency). If you choose the first approach you can just parse
down the (what, when, where) strings into one string, and by that
Andrzej Bialecki wrote:
By all means please start, this is still near the limits of my knowledge
of Lucene... ;-)
Okay, I'll try to get something working fairly soon.
Doug
mapreduce segment generator generates 50% fewer urls than expected
Key: NUTCH-136
URL: http://issues.apache.org/jira/browse/NUTCH-136
Project: Nutch
Type: Bug
Versions: 0.8-dev
Andrzej Bialecki wrote:
By all means please start, this is still near the limits of my knowledge
of Lucene... ;-)
Attached is a class which sorts a Nutch index by boost. I have only
tested it on a ~100 page index, where it appears to work correctly.
Please tell me how it works for you.
Doug Cutting wrote:
Andrzej Bialecki wrote:
By all means please start, this is still near the limits of my
knowledge of Lucene... ;-)
Attached is a class which sorts a Nutch index by boost. I have only
tested it on a ~100 page index, where it appears to work correctly.
Please tell me