[jira] [Issue Comment Edited] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2012-03-01 Thread Ferdy Galema (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220097#comment-13220097 ] Ferdy Galema edited comment on NUTCH-902 at 3/2/12 7:21 AM: Not

[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

2012-03-01 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220665#comment-13220665 ] Hudson commented on NUTCH-1293: --- Integrated in Nutch-trunk #1774 (See [https://builds.apach

[jira] [Commented] (NUTCH-1258) MoreIndexingFilter should be able to read Content-Type from both parse metadata and content metadata

2012-03-01 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220666#comment-13220666 ] Hudson commented on NUTCH-1258: --- Integrated in Nutch-trunk #1774 (See [https://builds.apach

[jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2012-03-01 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220662#comment-13220662 ] Hudson commented on NUTCH-902: -- Integrated in Nutch-nutchgora #180 (See [https://builds.apach

[Nutch Wiki] Trivial Update of "NutchTutorial" by LewisJohnMcgibbney

2012-03-01 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "NutchTutorial" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=56&rev2=57 }}} * `mkdir -p urls` + * `cd urls` - *

[jira] [Updated] (NUTCH-1262) Map `duplicating` content-types to a single type

2012-03-01 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1262: - Attachment: NUTCH-1262-1.5-2.patch Updated patch for 1.5 for the latest trunk revision.

[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

2012-03-01 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220122#comment-13220122 ] Hudson commented on NUTCH-1293: --- Integrated in nutch-trunk-maven #178 (See [https://builds.

[jira] [Commented] (NUTCH-1258) MoreIndexingFilter should be able to read Content-Type from both parse metadata and content metadata

2012-03-01 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220123#comment-13220123 ] Hudson commented on NUTCH-1258: --- Integrated in nutch-trunk-maven #178 (See [https://builds.

[jira] [Updated] (NUTCH-1294) IndexClean job with solr implementation.

2012-03-01 Thread Dan Rosher (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Rosher updated NUTCH-1294: -- Attachment: NUTCH-1294.patch > IndexClean job with solr implementation. > -

[jira] [Updated] (NUTCH-1294) IndexClean job with solr implementation.

2012-03-01 Thread Dan Rosher (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Rosher updated NUTCH-1294: -- Attachment: (was: NUTCH-1294.patch) > IndexClean job with solr implementation. > --

[jira] [Updated] (NUTCH-1294) IndexClean job with solr implementation.

2012-03-01 Thread Dan Rosher (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Rosher updated NUTCH-1294: -- Attachment: NUTCH-1294.patch > IndexClean job with solr implementation. > -

[jira] [Created] (NUTCH-1294) IndexClean job with solr implementation.

2012-03-01 Thread Dan Rosher (Created) (JIRA)
IndexClean job with solr implementation. Key: NUTCH-1294 URL: https://issues.apache.org/jira/browse/NUTCH-1294 Project: Nutch Issue Type: Improvement Affects Versions: nutchgora Report

[jira] [Resolved] (NUTCH-1258) MoreIndexingFilter should be able to read Content-Type from both parse metadata and content metadata

2012-03-01 Thread Markus Jelsma (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1258. -- Resolution: Fixed Committed for 1.5 in rev 1295624. Thanks Jul. > MoreIndexing

[jira] [Commented] (NUTCH-1258) MoreIndexingFilter should be able to read Content-Type from both parse metadata and content metadata

2012-03-01 Thread Julien Nioche (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220101#comment-13220101 ] Julien Nioche commented on NUTCH-1258: -- Weird. Yes, please do fix and commit if you c

[jira] [Resolved] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

2012-03-01 Thread Markus Jelsma (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1293. -- Resolution: Fixed Committed for 1.5 in rev. 1295614. > IndexingFiltersChecker

[jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2012-03-01 Thread Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220097#comment-13220097 ] Ferdy Galema commented on NUTCH-902: Note I changed gora-hbase-mapping.xml slightly: I

[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

2012-03-01 Thread Julien Nioche (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220095#comment-13220095 ] Julien Nioche commented on NUTCH-1293: -- +1 > IndexingFiltersChecker

[jira] [Updated] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

2012-03-01 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1293: - Attachment: NUTCH-1293-1.5-1.patch Wrong patch indeed :) > IndexingFiltersChecke

[jira] [Updated] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

2012-03-01 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1293: - Attachment: (was: NUTCH-1293-1.5-1.patch) > IndexingFiltersChecker to store detected cont

[jira] [Commented] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

2012-03-01 Thread Julien Nioche (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220087#comment-13220087 ] Julien Nioche commented on NUTCH-1293: -- wrong patch? > IndexingFilte

[jira] [Updated] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

2012-03-01 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1293: - Attachment: NUTCH-1293-1.5-1.patch Patch for 1.5. > IndexingFiltersChecker to s

[jira] [Commented] (NUTCH-1258) MoreIndexingFilter should be able to read Content-Type from both parse metadata and content metadata

2012-03-01 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220080#comment-13220080 ] Markus Jelsma commented on NUTCH-1258: -- The patch won't patch as it complains about b

[jira] [Created] (NUTCH-1293) IndexingFiltersChecker to store detected content type in crawldatum metadata

2012-03-01 Thread Markus Jelsma (Created) (JIRA)
IndexingFiltersChecker to store detected content type in crawldatum metadata Key: NUTCH-1293 URL: https://issues.apache.org/jira/browse/NUTCH-1293 Project: Nutch Is

Re: NUTCH-1273

2012-03-01 Thread Lewis John Mcgibbney
Hi Markus, Well, it would appear that the method I mention is the only one which still uses instances of the deprecated API. I notice that we support Tika core and parsers 0.10 in Nutchgora and 1.0 core in trunk. I'll probably just re-open the relevant issues, assign Nutchgora to them and upgrad