[jira] [Commented] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2013-10-21 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800676#comment-13800676 ] kiran commented on NUTCH-1478: -- This plugin is not up to date with the patch at NUTCH-1467

[jira] [Commented] (NUTCH-1560) index-metadata to add all values of multivalued metadata

2013-06-12 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681699#comment-13681699 ] kiran commented on NUTCH-1560: -- Sebastian, Do you want to commit this for 1.7, so we can

[jira] [Commented] (NUTCH-1579) NPE when using solr indexing

2013-05-31 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671677#comment-13671677 ] kiran commented on NUTCH-1579: -- Hi anandkumar, You should post any questions to

[jira] [Updated] (NUTCH-1561) improve usability of parse-metatags and index-metadata

2013-05-13 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1561: - Attachment: NUTCH-1561-v1.patch The patch fixes the two issues raised above i) The property metatags.names uses

[jira] [Commented] (NUTCH-1560) index-metadata to add all values of multivalued metadata

2013-05-12 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655618#comment-13655618 ] kiran commented on NUTCH-1560: -- My bad. My previous versions of the patches for NUTCH-1467

[jira] [Updated] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2013-05-12 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1467: - Attachment: NUTCH-1467-trunk-v3.patch The latest patch updated with the unit test provided by Sebastian and

[jira] [Commented] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2013-05-12 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655637#comment-13655637 ] kiran commented on NUTCH-1467: -- The latest version (v3) should be used along with NUTCH-1560

[jira] [Commented] (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed

2013-05-11 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655275#comment-13655275 ] kiran commented on NUTCH-585: - Hi Tomic, If you are using SVN, please see here for

[jira] [Commented] (NUTCH-1560) index-metadata to add all values of multivalued metadata

2013-04-17 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634486#comment-13634486 ] kiran commented on NUTCH-1560: -- Hi Sebastian, Thanks for the patch. Last time I used

[jira] [Commented] (NUTCH-1406) index-metadata plugin: conversion to Solr date format

2013-03-21 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609430#comment-13609430 ] kiran commented on NUTCH-1406: -- Hi Kristof, Are there any updates or tests for this patch?

[jira] [Commented] (NUTCH-1541) Indexer plugin to write CSV

2013-03-06 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595284#comment-13595284 ] kiran commented on NUTCH-1541: -- Great! I will give it a try sometime soon this week.

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2013-03-04 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592179#comment-13592179 ] kiran commented on NUTCH-961: - No Roland, not yet. I just switched to using 1.x series, but I

[jira] [Commented] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2013-03-03 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591860#comment-13591860 ] kiran commented on NUTCH-1467: -- Sebastian, I got what you mean. I am uploading the patch

[jira] [Updated] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2013-03-03 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1467: - Attachment: NUTCH-1467-trunk_v2.patch This patch is updated with Sebastian's suggestions above.

[jira] [Comment Edited] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2013-03-03 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591860#comment-13591860 ] kiran edited comment on NUTCH-1467 at 3/3/13 8:43 PM: -- Sebastian, I

[jira] [Commented] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2013-03-02 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591547#comment-13591547 ] kiran commented on NUTCH-1467: -- Sebastian, I am working on your suggestions and I have a

[jira] [Commented] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2013-03-01 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591123#comment-13591123 ] kiran commented on NUTCH-1467: -- Hi Sebastian, Thanks for reminding! I forgot about this

[jira] [Updated] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2013-03-01 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1467: - Attachment: NUTCH-1467-trunk_v1.patch The previous patch with HTMLMetaProcessor in tika modified.

[jira] [Commented] (NUTCH-1537) Legacy metadata package needs to take advantage of Apache Tika metadata package more.

2013-02-28 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589620#comment-13589620 ] kiran commented on NUTCH-1537: -- Hi Lewis, Do you mean we need to take advantage in defining

[jira] [Commented] (NUTCH-874) Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora

2013-02-28 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590246#comment-13590246 ] kiran commented on NUTCH-874: - The following plugins need to be ported for compatibility in 2.x

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2013-02-19 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581459#comment-13581459 ] kiran commented on NUTCH-961: - Markus, do you think this patch can also work for the 2.x series?

[jira] [Created] (NUTCH-1524) Internal links are not being saved even with change in parameter (db.ignore.internal.links)

2013-01-24 Thread kiran (JIRA)
kiran created NUTCH-1524: Summary: Internal links are not being saved even with change in parameter (db.ignore.internal.links) Key: NUTCH-1524 URL: https://issues.apache.org/jira/browse/NUTCH-1524 Project:

[jira] [Commented] (NUTCH-1511) Metadata in MYSQL updated with 'garbage'

2013-01-01 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541835#comment-13541835 ] kiran commented on NUTCH-1511: -- Hi, Just out of curiosity, did you try indexing the fields

[jira] [Updated] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-12-31 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1478: - Attachment: metadata_parseChecker_sites.png This is a screenshot of how my parsechecker is working after I

[jira] [Commented] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-12-31 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541428#comment-13541428 ] kiran commented on NUTCH-1478: -- Hi Jaap, Parsechecker should work if the field

[jira] [Comment Edited] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-12-31 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541428#comment-13541428 ] kiran edited comment on NUTCH-1478 at 12/31/12 5:39 PM: Hi Jaap,

[jira] [Comment Edited] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-12-31 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541428#comment-13541428 ] kiran edited comment on NUTCH-1478 at 12/31/12 5:41 PM: Hi Jaap,

[jira] [Comment Edited] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-12-31 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541428#comment-13541428 ] kiran edited comment on NUTCH-1478 at 12/31/12 5:43 PM: Hi Jaap,

[jira] [Commented] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-12-31 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541472#comment-13541472 ] kiran commented on NUTCH-1478: -- I think this is a problem with parsechecker in 2.x. Only the

[jira] [Updated] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-12-30 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1478: - Description: I have ported parse-metatags and index-metadata plugin to Nutch 2.x series. This will take

[jira] [Commented] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-12-30 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541243#comment-13541243 ] kiran commented on NUTCH-1478: -- Hi Gobel, I have updated the broken

[jira] [Commented] (NUTCH-1245) URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again

2012-12-06 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511539#comment-13511539 ] kiran commented on NUTCH-1245: -- Can the 2.x version be affected by the same issue? I am

[jira] [Created] (NUTCH-1487) Nutch parse fails first time for PDF files and works on reparse

2012-11-01 Thread kiran (JIRA)
kiran created NUTCH-1487: Summary: Nutch parse fails first time for PDF files and works on reparse Key: NUTCH-1487 URL: https://issues.apache.org/jira/browse/NUTCH-1487 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-10-23 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482668#comment-13482668 ] kiran commented on NUTCH-1467: -- Hi Sebastian, Thank you for the suggestions. I will look in

[jira] [Updated] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-10-19 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1478: - Attachment: Nutch1478.zip Parse-metatags and index-metadata plugin for Nutch 2.x series

[jira] [Created] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-10-18 Thread kiran (JIRA)
kiran created NUTCH-1478: Summary: Parse-metatags and index-metadata plugin for Nutch 2.x series Key: NUTCH-1478 URL: https://issues.apache.org/jira/browse/NUTCH-1478 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-10-18 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1478: - Attachment: Nutch1478.patch parseMetatags-2.x.zip unzip the zip folder in src/plugins in Nutch

[jira] [Updated] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-10-18 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1478: - Description: I have ported parse-metatags and index-metadata plugin to Nutch 2.x series. This will take

[jira] [Updated] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2012-10-18 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1478: - Attachment: (was: parseMetatags-2.x.zip) Parse-metatags and index-metadata plugin for Nutch 2.x series

[jira] [Commented] (NUTCH-1433) Upgrade to Tika 1.2

2012-10-15 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476557#comment-13476557 ] kiran commented on NUTCH-1433: -- Tika 2.x is not able to parse pdf files. 'hadoop.log' gives

[jira] [Commented] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-10-02 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467598#comment-13467598 ] kiran commented on NUTCH-1467: -- Thank you for the unified patch. I did not know much about

[jira] [Updated] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-10-01 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1467: - Attachment: Patch_MetaTagsParser.patch Patch_MetadataIndexer.patch

[jira] [Commented] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-09-13 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454847#comment-13454847 ] kiran commented on NUTCH-1467: -- Hi Julien, Thank you for your suggestions. I will look into

[jira] [Commented] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-09-12 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454296#comment-13454296 ] kiran commented on NUTCH-1467: -- Yes, it would be great to store the multiValues as array

[jira] [Resolved] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-09-11 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran resolved NUTCH-1467. -- Resolution: Implemented I have made a patch file (attached below) which will solve the above problem. I do

[jira] [Updated] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-09-11 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1467: - Attachment: patch.txt nutch 1.5.1 not able to parse mutliValued metatags

[jira] [Commented] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-09-11 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453417#comment-13453417 ] kiran commented on NUTCH-1467: -- Thank you for fixing it in the version 1.6

[jira] [Updated] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-09-11 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1467: - Description: Hi, I have been able to parse metatags in an html page using

[jira] [Updated] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-09-11 Thread kiran (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kiran updated NUTCH-1467: - Description: Hi, I have been able to parse metatags in an html page using

[jira] [Created] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-09-06 Thread kiran (JIRA)
kiran created NUTCH-1467: Summary: nutch 1.5.1 not able to parse mutliValued metatags Key: NUTCH-1467 URL: https://issues.apache.org/jira/browse/NUTCH-1467 Project: Nutch Issue Type: Bug Affects