[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878441#comment-13878441 ] Markus Jelsma commented on NUTCH-1113: -- The last chronological index went wrong, some

[jira] [Updated] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1113: - Attachment: NUTCH-1113-junit.patch Attached patch seems to completely fix the issue, finally! *

Re: Proposal for SolrIndexWriter

2014-01-22 Thread Lajos
Hi Markus, Sorry for the delay, I've been swamped. Sure, adding additional Nutch fields to my Solr schema isn't a big deal. But it's not always so simple. For one thing, I just happen to need to map a Nutch field to 3 Solr fields (ok, the use case could be changed, but it illustrates the

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878460#comment-13878460 ] Markus Jelsma commented on NUTCH-1325: -- Hi Tejas - I am fine with the changes you

[jira] [Resolved] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1325. Resolution: Fixed Fix Version/s: (was: 1.9) 1.8 Thanks [~markus17]

Re: Renovating Nutch Hadoop Tutorial wiki page

2014-01-22 Thread Tejas Patil
Thanks *Julien* for pointing me to new NutchHadoopSingleNodeTutorial wiki page [0]. I would soon remove the old nutchhadooptutorial page from wiki. [0] : http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial *@d_k*, there are already tutorials for running Nutch 2.x. See [1] and [2]. Those

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878562#comment-13878562 ] Hudson commented on NUTCH-1325: --- SUCCESS: Integrated in Nutch-trunk #2501 (See

Re: Renovating Nutch Hadoop Tutorial wiki page

2014-01-22 Thread Julien Nioche
Thanks Tejas! On 22 January 2014 11:51, Tejas Patil tejas.patil...@gmail.com wrote: Moved the old nutchhadooptutorial page from Nutch wiki Front page to Archive and Legacy. ~tejas On Wed, Jan 22, 2014 at 5:09 PM, Tejas Patil tejas.patil...@gmail.com wrote: Thanks *Julien* for pointing

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878568#comment-13878568 ] Markus Jelsma commented on NUTCH-1325: -- Thanks a lot Tejas for spending your time on

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878570#comment-13878570 ] Lewis John McGibbney commented on NUTCH-1325: - yeah Tejas this is a belter

[jira] [Updated] (NUTCH-1164) Write JUnit tests for protocol-http

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1164: --- Attachment: TEST-org.apache.nutch.protocol.http.TestProtocolHttp.txt Hi [~Sertac Turkel], I tried

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878578#comment-13878578 ] Markus Jelsma commented on NUTCH-1325: -- conf/log4j.properties has two dots in the

[jira] [Commented] (NUTCH-1325) HostDB for Nutch

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878623#comment-13878623 ] Tejas Patil commented on NUTCH-1325: Hi [~markus17], Thanks for the correction. This

[jira] [Updated] (NUTCH-1465) Support sitemaps in Nutch

2014-01-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1465: --- Attachment: NUTCH-1465-trunk.v3.patch Now that HostDb (NUTCH-1325) is in trunk, updated the patch

Right way to run crawl script in deploy mode

2014-01-22 Thread Tejas Patil
Hi nutch-dev, I was assuming that the commands to run the bin/crawl script in both local and deploy mode are the same, i.e. from $NUTCH_HOME/runtime/local (or runtime/deploy), use bin/crawl seedDir crawlDir solrURL numberOfRounds It turns out that in deploy mode, this does not obtain the

[jira] [Created] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc

2014-01-22 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1709: --- Summary: Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc Key: NUTCH-1709 URL:

[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-01-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879124#comment-13879124 ] Lewis John McGibbney commented on NUTCH-1253: - Any objections to commit?

[jira] [Resolved] (NUTCH-1413) Record response time

2014-01-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1413. Resolution: Fixed Fix Version/s: (was: 1.9) 1.8 Committed to

Nutch 2.x HEAD + gora-core gora-cassandra 0.4-SNAPSHOT (trunk)

2014-01-22 Thread Lewis John Mcgibbney
Hi Folks, Sorry for cross posting... Simple question. Is anyone using the above combination? When I try and fetch a batchId e.g. ./bin/fetch 1390426083-1144459470, sometimes I am unable to fetch pages and my logging indicates 0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s,

[jira] [Commented] (NUTCH-1413) Record response time

2014-01-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879229#comment-13879229 ] Hudson commented on NUTCH-1413: --- SUCCESS: Integrated in Nutch-trunk #2504 (See

[jira] [Created] (NUTCH-1710) Add gora package logging to log4j.properties

2014-01-22 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1710: --- Summary: Add gora package logging to log4j.properties Key: NUTCH-1710 URL: https://issues.apache.org/jira/browse/NUTCH-1710 Project: Nutch

[jira] [Updated] (NUTCH-1710) Add gora package logging to log4j.properties

2014-01-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1710: Attachment: NUTCH-1710.patch Patch for 2.x HEAD Add gora package logging to

[jira] [Assigned] (NUTCH-1710) Add gora package logging to log4j.properties

2014-01-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-1710: --- Assignee: Lewis John McGibbney Add gora package logging to log4j.properties

[jira] [Updated] (NUTCH-1710) Add gora package logging to log4j.properties

2014-01-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1710: Patch Info: Patch Available Add gora package logging to log4j.properties

[jira] [Resolved] (NUTCH-1710) Add gora package logging to log4j.properties

2014-01-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1710. - Resolution: Fixed Committed @revision 1560547 in 2.x HEAD Add gora package

[jira] [Commented] (NUTCH-1710) Add gora package logging to log4j.properties

2014-01-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879291#comment-13879291 ] Hudson commented on NUTCH-1710: --- SUCCESS: Integrated in Nutch-nutchgora #897 (See