[jira] [Created] (NUTCH-1702) Port HostNormalizer to 2.x

2014-01-15 Thread Tien Nguyen Manh (JIRA)
Tien Nguyen Manh created NUTCH-1702: --- Summary: Port HostNormalizer to 2.x Key: NUTCH-1702 URL: https://issues.apache.org/jira/browse/NUTCH-1702 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1702) Port HostNormalizer to 2.x

2014-01-15 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-1702: Attachment: NUTCH-1702.patch Port HostNormalizer to 2.x --

[jira] [Updated] (NUTCH-1702) Port HostNormalizer to 2.x

2014-01-15 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-1702: Fix Version/s: 2.3 Port HostNormalizer to 2.x --

[jira] [Updated] (NUTCH-1702) Port HostNormalizer to 2.x

2014-01-15 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-1702: Attachment: NUTCH-1702.patch Port HostNormalizer to 2.x --

[jira] [Updated] (NUTCH-1702) Port HostNormalizer to 2.x

2014-01-15 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-1702: Attachment: (was: NUTCH-1702.patch) Port HostNormalizer to 2.x

[jira] [Created] (NUTCH-1703) Nutch ignores alt text of images

2014-01-15 Thread Canan Girgin (JIRA)
Canan Girgin created NUTCH-1703: --- Summary: Nutch ignores alt text of images Key: NUTCH-1703 URL: https://issues.apache.org/jira/browse/NUTCH-1703 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1703) Nutch ignores alt text of images

2014-01-15 Thread Canan Girgin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Canan Girgin updated NUTCH-1703: Attachment: NUTCH_1703.patch Nutch ignores alt text of images

[jira] [Updated] (NUTCH-1703) Nutch ignores alt text of images

2014-01-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1703: - Fix Version/s: 1.8 Nutch ignores alt text of images

[jira] [Commented] (NUTCH-1703) Nutch ignores alt text of images

2014-01-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871904#comment-13871904 ] Markus Jelsma commented on NUTCH-1703: -- Can you provide a test for

[jira] [Resolved] (NUTCH-1568) port pluggable indexing architecture to 2.x

2014-01-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1568. - Resolution: Fixed Committed @revision 1558349 in 2.x [~talat], thank you for

[jira] [Updated] (NUTCH-1655) Indexer Plugin for Elastic Search

2014-01-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1655: Attachment: NUTCH-1655-v3.patch Updated patch to correct formatting in confi

[jira] [Commented] (NUTCH-1655) Indexer Plugin for Elastic Search

2014-01-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871998#comment-13871998 ] Markus Jelsma commented on NUTCH-1655: -- Hi, I haven't read the code but incorporating

[jira] [Commented] (NUTCH-1655) Indexer Plugin for Elastic Search

2014-01-15 Thread Talat UYARER (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872009#comment-13872009 ] Talat UYARER commented on NUTCH-1655: - Hi [~markus17], I have already included

[jira] [Commented] (NUTCH-1568) port pluggable indexing architecture to 2.x

2014-01-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872010#comment-13872010 ] Hudson commented on NUTCH-1568: --- SUCCESS: Integrated in Nutch-nutchgora #887 (See

[jira] [Commented] (NUTCH-1655) Indexer Plugin for Elastic Search

2014-01-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872015#comment-13872015 ] Markus Jelsma commented on NUTCH-1655: -- Nice :) Indexer Plugin for Elastic Search

[jira] [Comment Edited] (NUTCH-1703) Nutch ignores alt text of images

2014-01-15 Thread Canan Girgin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872106#comment-13872106 ] Canan Girgin edited comment on NUTCH-1703 at 1/15/14 2:18 PM: --

[jira] [Commented] (NUTCH-1703) Nutch ignores alt text of images

2014-01-15 Thread Canan Girgin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872106#comment-13872106 ] Canan Girgin commented on NUTCH-1703: - OK. A new patch had been added which

[jira] [Updated] (NUTCH-1703) Nutch ignores alt text of images

2014-01-15 Thread Canan Girgin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Canan Girgin updated NUTCH-1703: Attachment: NUTCH_1703_v2.patch Nutch ignores alt text of images

[jira] [Commented] (NUTCH-1703) Nutch ignores alt text of images

2014-01-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872116#comment-13872116 ] Markus Jelsma commented on NUTCH-1703: -- How is this patch made? I cannot patch the

[jira] [Commented] (NUTCH-1701) Make Solr Document Boost as an option

2014-01-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872137#comment-13872137 ] Lewis John McGibbney commented on NUTCH-1701: - Configurable sounds good. I'm

[jira] [Updated] (NUTCH-1704) Port DomainBlacklist urlfilter to 2.x

2014-01-15 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-1704: Attachment: NUTCH-1704.patch Port DomainBlacklist urlfilter to 2.x

[jira] [Updated] (NUTCH-1699) Tika Parser - Image Parse Bug

2014-01-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1699: Attachment: NUTCH-1699v2-2.x.patch Patch for 2.x Tika Parser - Image Parse Bug

[jira] [Updated] (NUTCH-1662) Indexer Plugin for Solr Cloud

2014-01-15 Thread Yasin Kılınç (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasin Kılınç updated NUTCH-1662: Attachment: NUTCH-1662.patch I create indexer plugin of SolrCloud. This patch can apply after

[jira] [Updated] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2014-01-15 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-1478: Attachment: NUTCH-1478-parse-v2.patch I port parse-metatags to 2.x, this patch support

[jira] [Commented] (NUTCH-1699) Tika Parser - Image Parse Bug

2014-01-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872198#comment-13872198 ] Hudson commented on NUTCH-1699: --- SUCCESS: Integrated in Nutch-nutchgora #888 (See

[jira] [Commented] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index

2014-01-15 Thread Alparslan Avcı (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872199#comment-13872199 ] Alparslan Avcı commented on NUTCH-1674: --- Hi [~memnoh], the patch is prepared for 2.x

[jira] [Commented] (NUTCH-1699) Tika Parser - Image Parse Bug

2014-01-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872205#comment-13872205 ] Hudson commented on NUTCH-1699: --- SUCCESS: Integrated in Nutch-trunk #2491 (See

[jira] [Updated] (NUTCH-1705) Make configuration option for HtmlParser TikaParser to extract text or title for noIndex page

2014-01-15 Thread Tien Nguyen Manh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tien Nguyen Manh updated NUTCH-1705: Attachment: NUTCH-1705.patch Make configuration option for HtmlParser TikaParser to

[jira] [Created] (NUTCH-1705) Make configuration option for HtmlParser TikaParser to extract text or title for noIndex page

2014-01-15 Thread Tien Nguyen Manh (JIRA)
Tien Nguyen Manh created NUTCH-1705: --- Summary: Make configuration option for HtmlParser TikaParser to extract text or title for noIndex page Key: NUTCH-1705 URL: