[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-02-28 Thread Yasin Kılınç (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915563#comment-13915563 ] Yasin Kılınç commented on NUTCH-1253: I checked and tested the patch file on 2.x

[jira] [Updated] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2014-02-28 Thread Talat UYARER (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Talat UYARER updated NUTCH-1478: Attachment: NUTCH-1478v5.patch I fixed several mistakes within the patch. This is final.

[jira] [Comment Edited] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

2014-02-28 Thread Talat UYARER (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915619#comment-13915619 ] Talat UYARER edited comment on NUTCH-1478 at 2/28/14 10:03 AM:

[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-02-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915701#comment-13915701 ] Lewis John McGibbney commented on NUTCH-1253: The version of nekohtml we are

[jira] [Updated] (NUTCH-1727) Configurable length for Tlds

2014-02-28 Thread Sertac TURKEL (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sertac TURKEL updated NUTCH-1727: Attachment: (was: NUTCH-1727.patch) Configurable length for Tlds

[jira] [Commented] (NUTCH-1253) Incompatible neko and xerces versions

2014-02-28 Thread Yasin Kılınç (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915730#comment-13915730 ] Yasin Kılınç commented on NUTCH-1253: Ok. But there is a line in target eclipse

[jira] [Updated] (NUTCH-1727) Configurable length for Tlds

2014-02-28 Thread Sertac TURKEL (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sertac TURKEL updated NUTCH-1727: Attachment: NUTCH-1727.patch Hi [~lewismc], there is a point that I missed. I found it and I

[jira] [Created] (NUTCH-1732) IndexerMapReduce to delete explicitly not indexable documents

2014-02-28 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-1732: Summary: IndexerMapReduce to delete explicitly not indexable documents Key: NUTCH-1732 URL: https://issues.apache.org/jira/browse/NUTCH-1732 Project: Nutch

[jira] [Commented] (NUTCH-1732) IndexerMapReduce to delete explicitly not indexable documents

2014-02-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915828#comment-13915828 ] Markus Jelsma commented on NUTCH-1732: We have an explicit

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915831#comment-13915831 ] Sebastian Nagel commented on NUTCH-1113: Results of tests: The number of documents

[jira] [Comment Edited] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908265#comment-13908265 ] Sebastian Nagel edited comment on NUTCH-1113 at 2/28/14 2:45 PM:

HTTP Post request

2014-02-28 Thread Zabini
Hi, I would like to be able to send an HTTP POST request for Nutch to crawl. I mean, if I ever wanted to crawl a search result, I could do http://www.example.com/search?q=mySearch. But if the server uses HTTP POST, I have not found a way to do it. So what I wanted to do is, from a conf file, retrieve

[jira] [Updated] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1113: Attachment: NUTCH-1113-trunk-junit-final.patch Final patch including the stuff mentioned by

[jira] [Commented] (NUTCH-1732) IndexerMapReduce to delete explicitly not indexable documents

2014-02-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915889#comment-13915889 ] Sebastian Nagel commented on NUTCH-1732: Hi [~markus17], looks like a partial

[jira] [Updated] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1113: Fix Version/s: (was: 1.9) 1.8 Merging segments causes URLs to vanish

[jira] [Resolved] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1113. Resolution: Fixed Assignee: Markus Jelsma Committed revision 1572975. Thanks all for

Re: Nutch roadmap and documentation

2014-02-28 Thread Lewis John Mcgibbney
Hi Mateusz, On Thu, Feb 27, 2014 at 10:35 AM, Mateusz Zakarczemny mateusz.zakarcze...@up2data.pl wrote: "Docs from 1 and 2 branch are mixed together." As far as I can see they are separate. The tutorials are clearly under different subsections, and the Nutch 2.x docs have their own section as

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915917#comment-13915917 ] Julien Nioche commented on NUTCH-1113: Well done, thanks guys! Merging segments

[jira] [Commented] (NUTCH-1706) IndexerMapReduce does not remove db_redir_temp etc

2014-02-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915919#comment-13915919 ] Sebastian Nagel commented on NUTCH-1706: Latest patch tested successfully (see

Build failed in Jenkins: Nutch-trunk #2545

2014-02-28 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/2545/changes Changes: [markus] NUTCH-1113 SegmentMerger can now be safely used to merge segments. If this damn thing breaks again -- [...truncated 3001 lines...] [ivy:resolve] :: loading settings :: file

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915969#comment-13915969 ] Hudson commented on NUTCH-1113: FAILURE: Integrated in Nutch-trunk #2545 (See

Build failed in Jenkins: Nutch-trunk #2546

2014-02-28 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/2546/ -- [...truncated 2159 lines...] copy-generated-lib: [copy] Copying 1 file to /home/hudson/jenkins-slave/workspace/Nutch-trunk/trunk/build/plugins/protocol-ftp init: [mkdir] Created dir: