[jira] Updated: (NUTCH-122) block numbers need a better random number generator

2005-10-21 Thread Paul Baclace (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-122?page=all ] Paul Baclace updated NUTCH-122: --- Attachment: MersenneTwister.java I am attaching MersenneTwister.java. The license on the attached source: All implementations are based on the mt19937 C code

[jira] Updated: (NUTCH-122) block numbers need a better random number generator

2005-10-21 Thread Paul Baclace (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-122?page=all ] Paul Baclace updated NUTCH-122: --- Attachment: MersenneTwister.java Resubmitting MersenneTwister.java, this time with the Grant ASF checked. block numbers need a better random number generator

[jira] Updated: (NUTCH-116) TestNDFS a JUnit test specifically for NDFS

2005-10-21 Thread Paul Baclace (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-116?page=all ] Paul Baclace updated NUTCH-116: --- Attachment: TestNDFS.java Revised TestNDFS to add a log message about which random number generator is in use (also changed the fixed seed to a newly created

[jira] Commented: (NUTCH-117) Crawl crashes with java.io.IOException: already exists: C:\nutch\crawl.intranet\oct18\db\webdb.new\pagesByURL

2005-10-21 Thread Nick Jacobsen (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-117?page=comments#action_12332720 ] Nick Jacobsen commented on NUTCH-117: - I had a similar issue, and it seems (guessing here) to be related to some sort of race condition on filehandles. I was running the

Re: OPIC

2005-10-21 Thread Andrzej Bialecki
Massimo Miccoli wrote:
> Sorry Andrzej, I mean on DeleteDuplicates.java, not in runtime. Is that the correct place to integrate something like shingling or n-gram?
Yes. But there is a small issue of high dimensionality to solve, otherwise it will be very inefficient... Both shingling and n-gram
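
For readers unfamiliar with the shingling idea raised in this thread, here is a minimal sketch, not the code in DeleteDuplicates.java and not anything proposed in the message itself. It assumes word-level shingles of a fixed width k and uses Jaccard similarity to compare two documents; the class name ShingleSketch and both method names are hypothetical. It keeps the full shingle set for each document, which is probably the kind of cost the reply has in mind when it mentions high dimensionality.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not part of Nutch.
class ShingleSketch {

    // Collect every run of k consecutive lowercase words in the text.
    static Set<String> shingles(String text, int k) {
        Set<String> result = new HashSet<>();
        String trimmed = text.toLowerCase().trim();
        if (trimmed.isEmpty() || k <= 0) {
            return result; // nothing to shingle
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + k <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + k)));
        }
        return result;
    }

    // Jaccard similarity: |A intersect B| / |A union B|, in the range [0, 1].
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // treat two empty documents as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> first = shingles("the quick brown fox", 2);
        Set<String> second = shingles("the quick brown dog", 2);
        System.out.println(jaccard(first, second));
    }
}
```

With k = 2 the two sample strings share two of four distinct shingles, so the similarity is 0.5. A real deduplication pass would need a threshold on this value and some way to avoid comparing every pair of pages; neither is specified in the thread.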