Re: svn commit: r348431 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl: CrawlDatum.java CrawlDbReader.java

2005-11-23 Thread Doug Cutting
Andrzej Bialecki wrote:
Or to use an implementation of ObjectWritable, which contains all needed partial data?
Yes, but ObjectWritable is considerably bigger, and hence slower to copy, sort, etc., since it writes the class name with every instance. LongWritable is a good way to write 64-bit q
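To make the size argument concrete, here is a rough sketch in plain java.io, not the actual Nutch Writable classes. It compares a bare 64-bit value with the same value preceded by a class name. The class WritableSizeSketch and its two helper methods are hypothetical names for illustration only, and the real on-disk layout of ObjectWritable may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSizeSketch {
    // Bare 64-bit value: always 8 bytes per record.
    static int bareLongSize(long value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeLong(value);
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same value preceded by a class name (assumed layout, for illustration):
    // writeUTF emits a 2-byte length plus the encoded name, then 8 bytes of data.
    static int taggedLongSize(String className, long value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(className);
            out.writeLong(value);
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bare long:   " + bareLongSize(42L) + " bytes");
        System.out.println("tagged long: " + taggedLongSize("java.lang.Long", 42L) + " bytes");
    }
}
```

With a 14-character class name the tagged record is three times the size of the bare one, and that overhead is paid again on every copy and sort.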

Re: svn commit: r348431 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl: CrawlDatum.java CrawlDbReader.java

2005-11-23 Thread Andrzej Bialecki
Doug Cutting wrote:
[EMAIL PROTECTED] wrote:
Implement a reader for CrawlDB, loosely inspired by NUTCH-114 (thanks Stefan!). The reader offers similar functionality to the classic "readdb" command.
This looks great! Thanks, Andrzej.
No problem - I don't know about you, but I felt like

Re: svn commit: r348431 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl: CrawlDatum.java CrawlDbReader.java

2005-11-23 Thread Doug Cutting
Doug Cutting wrote:
I just ran it on a 50M page crawl.
FYI, here's the output:
051123 191703 TOTAL urls: 167780785
051123 191703 avg score: 1.152
051123 191703 max score: 47357.137
051123 191703 min score: 1.0
051123 191703 retry 0: 167780785
051123 191703 status 1

Re: svn commit: r348431 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl: CrawlDatum.java CrawlDbReader.java

2005-11-23 Thread Doug Cutting
[EMAIL PROTECTED] wrote:
Implement a reader for CrawlDB, loosely inspired by NUTCH-114 (thanks Stefan!). The reader offers similar functionality to the classic "readdb" command.
This looks great! Thanks, Andrzej. I just ran it on a 50M page crawl. It took longer than I expected. The reduce

Re: svn commit: r348431 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl: CrawlDatum.java CrawlDbReader.java

2005-11-23 Thread Andrzej Bialecki
Sami Siren wrote:
+ if (k.contains("score")) {
Since: 1.5
Ah, indeed. Fixed - thanks!
-- Best regards, Andrzej Bialecki

Re: svn commit: r348431 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl: CrawlDatum.java CrawlDbReader.java

2005-11-23 Thread Sami Siren
+ if (k.contains("score")) {
Since: 1.5
-- Sami Siren
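For context on the "Since: 1.5" remark: String.contains was added in Java 1.5, so the quoted line does not compile against a 1.4 class library. The thread does not show how the commit was actually fixed. One common 1.4-compatible substitute is indexOf, sketched below. ContainsCompat and containsCompat are hypothetical names, not code from the Nutch tree.

```java
public class ContainsCompat {
    // Pre-1.5 stand-in for haystack.contains(needle):
    // String.indexOf(String) returns -1 when the substring is absent.
    static boolean containsCompat(String haystack, String needle) {
        return haystack.indexOf(needle) != -1;
    }

    public static void main(String[] args) {
        String k = "avg score"; // sample key, chosen only for illustration
        if (containsCompat(k, "score")) {
            System.out.println("key mentions score: " + k);
        }
    }
}
```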