[
https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adnane B. updated NUTCH-:
-
Description:
This problem happens the second time I crawl a page
bin/nutch inject urls/
bin/nutch
[
https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adnane B. updated NUTCH-:
-
Description:
This problem happens the second time I crawl a page
bin/nutch inject urls/
bin/nutch
[
https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adnane B. updated NUTCH-:
-
Description:
This problem happens the second time I crawl a page
bin/nutch inject urls/
bin/nutch
[
https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adnane B. updated NUTCH-:
-
Description:
This problem happens the second time I crawl a page
bin/nutch inject urls/
bin/nutch
[
https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adnane B. updated NUTCH-:
-
Summary: fetch deletes all metadata except _csh_ and _rs_ (was: updatedb
deletes all metadata except
[
https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adnane B. updated NUTCH-:
-
Description:
This problem happens on the second update of a crawled page **that has
not changed**
Adnane B. created NUTCH-:
Summary: updatedb deletes all metadata except _csh_ and _rs_
Key: NUTCH-
URL: https://issues.apache.org/jira/browse/NUTCH-
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148758#comment-15148758
]
Hudson commented on NUTCH-961:
--
SUCCESS: Integrated in Nutch-trunk #3347 (See
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-961.
-
Resolution: Fixed
Committed to trunk in revision 1730694. Thanks everyone for contributions.
>
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961.patch
Updated patch. ExtractorRepository was missing.
> Expose Tika's
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Fix Version/s: 1.12
> Expose Tika's boilerpipe support
>
>
>
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Affects Version/s: 1.11
> Expose Tika's boilerpipe support
>
>
>
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148642#comment-15148642
]
Markus Jelsma commented on NUTCH-961:
-
Tests pass as expected and Boilerpipe as well. Will commit
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Description:
Tika 0.8 comes with the Boilerpipe content handler which can be used to extract
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Description:
Tika 0.8 comes with the Boilerpipe content handler which can be used to extract
[
https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148631#comment-15148631
]
Hudson commented on NUTCH-2210:
---
SUCCESS: Integrated in Nutch-trunk #3346 (See
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961.patch
Patch for trunk.
> Expose Tika's boilerpipe support
>
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148632#comment-15148632
]
Hudson commented on NUTCH-1233:
---
SUCCESS: Integrated in Nutch-trunk #3346 (See
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-1233.
--
Resolution: Fixed
Committed to trunk in revision 1730687.
> Rely on Tika for outlink
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Affects Version/s: 1.11
> Rely on Tika for outlink extraction
>
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Fix Version/s: 1.12
> Rely on Tika for outlink extraction
> ---
>
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Component/s: parser
> Rely on Tika for outlink extraction
> ---
>
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148590#comment-15148590
]
Markus Jelsma commented on NUTCH-1233:
--
Awesome! Everything works as expected since the Tika 1.12
[
https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2210.
--
Resolution: Fixed
Committed to trunk in revision 1730686.
> Upgrade to Tika 1.12
>
[
https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148572#comment-15148572
]
Markus Jelsma commented on NUTCH-2210:
--
Test passes, will commit shortly.
> Upgrade to Tika 1.12
>
[
https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2210:
-
Attachment: NUTCH-2210.patch
Patch for trunk.
> Upgrade to Tika 1.12
>
>
>
[
https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148489#comment-15148489
]
Markus Jelsma commented on NUTCH-2197:
--
Hello Arun - no, this is not applied to 2.3.1. The plugins