Re: [Nutchbase] WebPage class is a generated code?

2010-07-03 Thread Doğacan Güney
Hey, On Fri, Jul 2, 2010 at 17:26, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Guys, Since they are generated, +1 to: - adding a filepattern to svn:ignore to ignore them - updating build.xml to autogenerate I created NUTCH-842 to track this problem.

[jira] Created: (NUTCH-842) AutoGenerate WebPage code

2010-07-03 Thread JIRA
AutoGenerate WebPage code - Key: NUTCH-842 URL: https://issues.apache.org/jira/browse/NUTCH-842 Project: Nutch Issue Type: Improvement Reporter: Doğacan Güney Assignee: Doğacan Güney

Minimizing the number of stored fields for Solr

2010-07-03 Thread Doğacan Güney
Hey everyone, This is not really a proposition but rather something I have been wondering for a while so I wanted to see what everyone is thinking. Currently in our solr backend, we have stored=true indexed=false fields and stored=true indexed=true fields. The former class of fields are mostly

Nutchbase design doc

2010-07-03 Thread Doğacan Güney
Hello everyone, I am attaching first draft of a complete nutchbase design document. There are parts missing and parts not yet explained clearly but I would like to get everyone's opinion on what they think so far. Please let me know which parts are unclear, which parts make no sense etc, and I

Re: Minimizing the number of stored fields for Solr

2010-07-03 Thread Andrzej Bialecki
On 2010-07-03 10:00, Doğacan Güney wrote: Hey everyone, This is not really a proposition but rather something I have been wondering for a while so I wanted to see what everyone is thinking. Currently in our solr backend, we have stored=true indexed=false fields and stored=true indexed=true

Re: Nutchbase design doc

2010-07-03 Thread Doğacan Güney
Hi Alex, On Sat, Jul 3, 2010 at 14:45, Alex McLintock alex.mclint...@gmail.com wrote: Doğacan 2010/7/3 Doğacan Güney doga...@gmail.com: I am attaching first draft of a complete nutchbase design document. There are parts missing and parts not yet explained clearly but I would like to get

Re: Nutchbase design doc

2010-07-03 Thread Mattmann, Chris A (388J)
Guys, This sounds awesome. Even I could understand it, which is saying something! :) My only question: why introduce a new data structure called “Markers” when all that seems to be is a Metadata object. Let’s use o.a.tika.metadata.Metadata to represent that? My only comment then would be,

Re: Nutchbase design doc

2010-07-03 Thread Doğacan Güney
Hi Chris, On Sat, Jul 3, 2010 at 18:35, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Guys, This sounds awesome. Even I could understand it, which is saying something! :) My only question: why introduce a new data structure called “Markers” when all that seems to be is a

[jira] Updated: (NUTCH-838) Add timing information to all Tool classes

2010-07-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-838: Fix Version/s: 1.2 I'll backport this to the 1.2 branch as well. Add timing information

[jira] Resolved: (NUTCH-838) Add timing information to all Tool classes

2010-07-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-838. Resolution: Fixed - Patch applied to trunk in r960246 and backported to 1.2-branch in

YCSB benchmark for KV stores

2010-07-03 Thread Andrzej Bialecki
Hi, Found this link: http://wiki.github.com/brianfrankcooper/YCSB/papers-and-presentations Would be cool to run the benchmark for the same stores but via Gora. -- Best regards, Andrzej Bialecki

Hudson build is back to normal : Nutch-trunk #1197

2010-07-03 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1197/changes

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884996#action_12884996 ] Hudson commented on NUTCH-837: Integrated in Nutch-trunk #1197 (See

[jira] Commented: (NUTCH-838) Add timing information to all Tool classes

2010-07-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884997#action_12884997 ] Hudson commented on NUTCH-838: Integrated in Nutch-trunk #1197 (See