[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800676#comment-13800676
]
kiran commented on NUTCH-1478:
--
This plugin is not up to date with the patch at NUTCH-1467
[
https://issues.apache.org/jira/browse/NUTCH-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681699#comment-13681699
]
kiran commented on NUTCH-1560:
--
Sebastian,
Do you want to commit this for 1.7, so we can
[
https://issues.apache.org/jira/browse/NUTCH-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671677#comment-13671677
]
kiran commented on NUTCH-1579:
--
Hi anandkumar,
You should post any questions to
[
https://issues.apache.org/jira/browse/NUTCH-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1561:
-
Attachment: NUTCH-1561-v1.patch
The patch fixes the two issues raised above
i) The property metatags.names uses
[
https://issues.apache.org/jira/browse/NUTCH-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655618#comment-13655618
]
kiran commented on NUTCH-1560:
--
My bad. My previous versions of the patches for NUTCH-1467
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1467:
-
Attachment: NUTCH-1467-trunk-v3.patch
The latest patch updated with the unit test provided by Sebastian and
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655637#comment-13655637
]
kiran commented on NUTCH-1467:
--
The latest version (v3) should be used along with NUTCH-1560
[
https://issues.apache.org/jira/browse/NUTCH-585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655275#comment-13655275
]
kiran commented on NUTCH-585:
-
Hi Tomic,
If you are using SVN, please see here for
[
https://issues.apache.org/jira/browse/NUTCH-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634486#comment-13634486
]
kiran commented on NUTCH-1560:
--
Hi Sebastian,
Thanks for the patch.
Last time I used
[
https://issues.apache.org/jira/browse/NUTCH-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13609430#comment-13609430
]
kiran commented on NUTCH-1406:
--
Hi Kristof,
Are there any updates or tests for this patch?
[
https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595284#comment-13595284
]
kiran commented on NUTCH-1541:
--
Great! I will give it a try sometime soon this week.
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592179#comment-13592179
]
kiran commented on NUTCH-961:
-
No Roland, not yet. I just switched to using the 1.x series, but I
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591860#comment-13591860
]
kiran commented on NUTCH-1467:
--
Sebastian,
I got what you mean. I am uploading the patch
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1467:
-
Attachment: NUTCH-1467-trunk_v2.patch
This patch is updated with Sebastian's suggestions above.
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591860#comment-13591860
]
kiran edited comment on NUTCH-1467 at 3/3/13 8:43 PM:
--
Sebastian,
I
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591547#comment-13591547
]
kiran commented on NUTCH-1467:
--
Sebastian,
I am working on your suggestions and I have a
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591123#comment-13591123
]
kiran commented on NUTCH-1467:
--
Hi Sebastian,
Thanks for reminding! I forgot about this
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1467:
-
Attachment: NUTCH-1467-trunk_v1.patch
The previous patch with HTMLMetaProcessor in Tika modified.
[
https://issues.apache.org/jira/browse/NUTCH-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589620#comment-13589620
]
kiran commented on NUTCH-1537:
--
Hi Lewis,
Do you mean we need to take advantage in defining
[
https://issues.apache.org/jira/browse/NUTCH-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13590246#comment-13590246
]
kiran commented on NUTCH-874:
-
The following plugins need to be ported for compatibility in 2.x
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13581459#comment-13581459
]
kiran commented on NUTCH-961:
-
Markus, do you think this patch can also work for the 2.x series?
kiran created NUTCH-1524:
Summary: Internal links are not being saved even with change in
parameter (db.ignore.internal.links)
Key: NUTCH-1524
URL: https://issues.apache.org/jira/browse/NUTCH-1524
Project:
[
https://issues.apache.org/jira/browse/NUTCH-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541835#comment-13541835
]
kiran commented on NUTCH-1511:
--
Hi,
Just out of curiosity, did you try indexing the fields
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1478:
-
Attachment: metadata_parseChecker_sites.png
This is a screenshot of how my parsechecker is working after I
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541428#comment-13541428
]
kiran commented on NUTCH-1478:
--
Hi Jaap,
Parsechecker should work if the field
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541428#comment-13541428
]
kiran edited comment on NUTCH-1478 at 12/31/12 5:39 PM:
Hi Jaap,
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541428#comment-13541428
]
kiran edited comment on NUTCH-1478 at 12/31/12 5:41 PM:
Hi Jaap,
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541428#comment-13541428
]
kiran edited comment on NUTCH-1478 at 12/31/12 5:43 PM:
Hi Jaap,
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541472#comment-13541472
]
kiran commented on NUTCH-1478:
--
I think this is a problem with parsechecker in 2.x. Only the
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1478:
-
Description:
I have ported parse-metatags and index-metadata plugin to Nutch 2.x series.
This will take
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541243#comment-13541243
]
kiran commented on NUTCH-1478:
--
Hi Gobel,
I have updated the broken
[
https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511539#comment-13511539
]
kiran commented on NUTCH-1245:
--
Can the 2.x version be affected by the same issue?
I am
kiran created NUTCH-1487:
Summary: Nutch parse fails first time for PDF files and works on
reparse
Key: NUTCH-1487
URL: https://issues.apache.org/jira/browse/NUTCH-1487
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482668#comment-13482668
]
kiran commented on NUTCH-1467:
--
Hi Sebastian,
Thank you for the suggestions. I will look in
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1478:
-
Attachment: Nutch1478.zip
Parse-metatags and index-metadata plugin for Nutch 2.x series
kiran created NUTCH-1478:
Summary: Parse-metatags and index-metadata plugin for Nutch 2.x
series
Key: NUTCH-1478
URL: https://issues.apache.org/jira/browse/NUTCH-1478
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1478:
-
Attachment: Nutch1478.patch
parseMetatags-2.x.zip
Unzip the zip folder in src/plugins in Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1478:
-
Description:
I have ported parse-metatags and index-metadata plugin to Nutch 2.x series.
This will take
[
https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1478:
-
Attachment: (was: parseMetatags-2.x.zip)
Parse-metatags and index-metadata plugin for Nutch 2.x series
[
https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476557#comment-13476557
]
kiran commented on NUTCH-1433:
--
Tika 2.x is not able to parse pdf files. 'hadoop.log' gives
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13467598#comment-13467598
]
kiran commented on NUTCH-1467:
--
Thank you for the unified patch. I did not know much about
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1467:
-
Attachment: Patch_MetaTagsParser.patch
Patch_MetadataIndexer.patch
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454847#comment-13454847
]
kiran commented on NUTCH-1467:
--
Hi Julien,
Thank you for your suggestions. I will look into
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454296#comment-13454296
]
kiran commented on NUTCH-1467:
--
Yes, it would be great to store the multiValues as array
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran resolved NUTCH-1467.
--
Resolution: Implemented
I have made a patch file (attached below) which will solve the above problem.
I do
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1467:
-
Attachment: patch.txt
nutch 1.5.1 not able to parse mutliValued metatags
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453417#comment-13453417
]
kiran commented on NUTCH-1467:
--
Thank you for fixing it in the version 1.6
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1467:
-
Description:
Hi,
I have been able to parse metatags in an html page using
[
https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kiran updated NUTCH-1467:
-
Description:
Hi,
I have been able to parse metatags in an html page using
kiran created NUTCH-1467:
Summary: nutch 1.5.1 not able to parse mutliValued metatags
Key: NUTCH-1467
URL: https://issues.apache.org/jira/browse/NUTCH-1467
Project: Nutch
Issue Type: Bug
Affects
50 matches