Andrzej Bialecki wrote:
> Or to use an implementation of ObjectWritable, which contains all needed
> partial data?
Yes, but ObjectWritable is considerably bigger, and hence slower to
copy, sort, etc., since it writes the class name with every instance.
LongWritable is a good way to write 64-bit q
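
To make the size argument above concrete, here is a minimal sketch using only java.io. It does not use the Hadoop classes and does not reproduce ObjectWritable's actual wire format; RecordSizeSketch, plainSize and taggedSize are hypothetical names for illustration. It only compares a bare 64-bit value against the same value preceded by a class name string.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only: not the Hadoop ObjectWritable/LongWritable code.
class RecordSizeSketch {

    // Serializes just the 64-bit value, as a fixed-type record would.
    static int plainSize(long value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeLong(value);
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes a class name before the value, as a self-describing
    // wrapper would have to do for every instance.
    static int taggedSize(String className, long value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(className); // 2-byte length prefix + encoded name
            out.writeLong(value);
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("plain:  " + plainSize(42L) + " bytes");
        System.out.println("tagged: " + taggedSize("java.lang.Long", 42L) + " bytes");
    }
}
```

The per-record overhead grows with the length of the class name, and it is paid again on every copy and sort of the record.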
Doug Cutting wrote:
> [EMAIL PROTECTED] wrote:
> > Implement a reader for CrawlDB, loosely inspired by NUTCH-114 (thanks
> > Stefan!).
> > The reader offers similar functionality to the classic "readdb" command.
> This looks great! Thanks, Andrzej.
No problem - I don't know about you, but I felt like
Doug Cutting wrote:
I just ran it on a 50M page crawl.
FYI, here's the output:
051123 191703 TOTAL urls: 167780785
051123 191703 avg score:1.152
051123 191703 max score:47357.137
051123 191703 min score:1.0
051123 191703 retry 0: 167780785
051123 191703 status 1
[EMAIL PROTECTED] wrote:
> Implement a reader for CrawlDB, loosely inspired by NUTCH-114 (thanks Stefan!).
> The reader offers similar functionality to the classic "readdb" command.
This looks great! Thanks, Andrzej.
I just ran it on a 50M page crawl. It took longer than I expected. The
reduce
Sami Siren wrote:
> > + if (k.contains("score")) {
> Since: 1.5
Ah, indeed. Fixed - thanks!
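
For context, String.contains was added in Java 1.5, so the quoted line does not compile against older JDKs. The thread does not show the actual fix; the sketch below is one common pre-1.5 equivalent using indexOf, with ContainsCompat and containsCompat as hypothetical names.

```java
// Hypothetical helper; the actual change made in the commit is not shown in this thread.
class ContainsCompat {

    // Same result as s.contains(sub) for String arguments,
    // but relies only on String.indexOf, which predates Java 1.5.
    static boolean containsCompat(String s, String sub) {
        return s.indexOf(sub) != -1;
    }

    public static void main(String[] args) {
        System.out.println(containsCompat("avg score", "score"));
        System.out.println(containsCompat("status 1", "score"));
    }
}
```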
--
Best regards,
Andrzej Bialecki <><
> + if (k.contains("score")) {
Since: 1.5
--
Sami Siren