[jira] [Commented] (NUTCH-1005) Index headings plugin

2012-02-07 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202219#comment-13202219 ] Markus Jelsma commented on NUTCH-1005: -- I'll commit this one shortly if there are no

[jira] [Commented] (NUTCH-1266) Subcollection to optionally write to configured fields

2012-02-07 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202220#comment-13202220 ] Markus Jelsma commented on NUTCH-1266: -- Comments? > Subcollection t

[jira] [Commented] (NUTCH-1210) DomainBlacklistFilter

2012-02-07 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202221#comment-13202221 ] Markus Jelsma commented on NUTCH-1210: -- I'll send this one in if there are no objecti

[jira] [Updated] (NUTCH-1005) Parse headings plugin

2012-02-07 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1005: - Summary: Parse headings plugin (was: Index headings plugin) > Parse headings plugin > --

[jira] [Resolved] (NUTCH-1005) Parse headings plugin

2012-02-07 Thread Markus Jelsma (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1005. -- Resolution: Fixed Committed for 1.5 in rev. 1241460. > Parse headings plugin >

[jira] [Commented] (NUTCH-1266) Subcollection to optionally write to configured fields

2012-02-07 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202378#comment-13202378 ] Markus Jelsma commented on NUTCH-1266: -- I'll commit this one in a few hours unless th

[jira] [Commented] (NUTCH-1005) Parse headings plugin

2012-02-07 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202403#comment-13202403 ] Hudson commented on NUTCH-1005: --- Integrated in nutch-trunk-maven #139 (See [https://builds.

[jira] [Commented] (NUTCH-1259) TikaParser should not add Content-Type from HTTP Headers to Nutch Metadata

2012-02-07 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202460#comment-13202460 ] Markus Jelsma commented on NUTCH-1259: -- I'll comment on it myself then: the code abov

[jira] [Updated] (NUTCH-1259) TikaParser should not add Content-Type from HTTP Headers to Nutch Metadata

2012-02-07 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1259: - Attachment: NUTCH-1259-1.5-1.patch Here's a patch for 1.5. Comments? We have this running in prod

[jira] [Updated] (NUTCH-1258) MoreIndexingFilter should be able to read Content-Type from both parse metadata and content metadata

2012-02-07 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1258: - Priority: Minor (was: Major) > MoreIndexingFilter should be able to read Content-Type from b

[jira] [Commented] (NUTCH-1258) MoreIndexingFilter should be able to read Content-Type from both parse metadata and content metadata

2012-02-07 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202470#comment-13202470 ] Markus Jelsma commented on NUTCH-1258: -- With the patch of NUTCH-1258 this is no longe

[jira] [Commented] (NUTCH-1259) TikaParser should not add Content-Type from HTTP Headers to Nutch Metadata

2012-02-07 Thread Julien Nioche (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202474#comment-13202474 ] Julien Nioche commented on NUTCH-1259: -- bq. I'll commit this one tomorrow unless ther

[jira] [Commented] (NUTCH-1259) TikaParser should not add Content-Type from HTTP Headers to Nutch Metadata

2012-02-07 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202483#comment-13202483 ] Markus Jelsma commented on NUTCH-1259: -- You're right, but since you're most of the ti

[jira] [Commented] (NUTCH-1259) TikaParser should not add Content-Type from HTTP Headers to Nutch Metadata

2012-02-07 Thread Lewis John McGibbney (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202603#comment-13202603 ] Lewis John McGibbney commented on NUTCH-1259: - Hey Markus. I'm literally up to

[jira] [Commented] (NUTCH-1005) Parse headings plugin

2012-02-07 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203268#comment-13203268 ] Hudson commented on NUTCH-1005: --- Integrated in Nutch-trunk #1752 (See [https://builds.apach