Re: [jira] [Updated] (NUTCH-1644) Should have a parser that uses xpath

2014-11-04 Thread Albinscode
Hello Sebastian, I'll look at the xjb failure; I'm glad to see that it will be integrated into ivy! For the examples part, I normally add some commented tests in the tests folders. I'll also provide a conf if one doesn't already exist. I'll keep you posted. Thanks, Albin 2014-11-03

Re: Patch reviews for 2.X

2014-11-04 Thread Sebastian Nagel
Hi Lewis, NUTCH-1825 (protocol-http may hang for certain web pages) - I've been running tests in production for one week (with 1.x). I'll check for any regressions in detail and will commit in the next few days. I'm also in the process of committing NUTCH-1483 Can't crawl filesystem with

[jira] [Commented] (NUTCH-1870) Generic xsl parser plugin

2014-11-04 Thread Albinscode (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196706#comment-14196706 ] Albinscode commented on NUTCH-1870: --- @Sebastian the version of the jaxb implementation

[jira] [Updated] (NUTCH-1870) Generic xsl parser plugin

2014-11-04 Thread Albinscode (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albinscode updated NUTCH-1870: -- Attachment: nutch-site.xml Sample conf file. Generic xsl parser plugin -

[jira] [Comment Edited] (NUTCH-1870) Generic xsl parser plugin

2014-11-04 Thread Albinscode (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196706#comment-14196706 ] Albinscode edited comment on NUTCH-1870 at 11/4/14 7:56 PM:

[jira] [Comment Edited] (NUTCH-1870) Generic xsl parser plugin

2014-11-04 Thread Albinscode (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196706#comment-14196706 ] Albinscode edited comment on NUTCH-1870 at 11/4/14 7:55 PM:

[jira] [Updated] (NUTCH-1885) Protocol-file should treat symbolic links as redirects

2014-11-04 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1885: --- Attachment: NUTCH-1885-2x-v1.patch equivalent patch for 2.x Protocol-file should treat

[jira] [Resolved] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin

2014-11-04 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1483. Resolution: Fixed Fix Version/s: (was: 2.4) 2.3 Committed

[jira] [Resolved] (NUTCH-1879) Regex URL normalizer should remove multiple slashes after file: protocol

2014-11-04 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1879. Resolution: Fixed Fix Version/s: (was: 2.4) 2.3 Fixed, see

[jira] [Resolved] (NUTCH-1878) urlnormalizer-regex to keep third slash in file:///path/index.html

2014-11-04 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1878. Resolution: Won't Fix Solution from NUTCH-1879 is preferred because URL.toString() returns

[jira] [Resolved] (NUTCH-1880) URLUtil should not add additional slashes for file URLs

2014-11-04 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1880. Resolution: Fixed Fix Version/s: (was: 2.4) 2.3 Committed,

[jira] [Resolved] (NUTCH-1885) Protocol-file should treat symbolic links as redirects

2014-11-04 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1885. Resolution: Fixed Fix Version/s: (was: 2.4) 2.3 Committed to

[jira] [Commented] (NUTCH-1880) URLUtil should not add additional slashes for file URLs

2014-11-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196907#comment-14196907 ] Hudson commented on NUTCH-1880: --- SUCCESS: Integrated in Nutch-nutchgora #1219 (See

[jira] [Commented] (NUTCH-1885) Protocol-file should treat symbolic links as redirects

2014-11-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196906#comment-14196906 ] Hudson commented on NUTCH-1885: --- SUCCESS: Integrated in Nutch-nutchgora #1219 (See

[jira] [Commented] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin

2014-11-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196905#comment-14196905 ] Hudson commented on NUTCH-1483: --- SUCCESS: Integrated in Nutch-nutchgora #1219 (See

[jira] [Commented] (NUTCH-1879) Regex URL normalizer should remove multiple slashes after file: protocol

2014-11-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196904#comment-14196904 ] Hudson commented on NUTCH-1879: --- SUCCESS: Integrated in Nutch-nutchgora #1219 (See

[jira] [Commented] (NUTCH-1879) Regex URL normalizer should remove multiple slashes after file: protocol

2014-11-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196915#comment-14196915 ] Hudson commented on NUTCH-1879: --- SUCCESS: Integrated in Nutch-trunk #2848 (See

[jira] [Commented] (NUTCH-1880) URLUtil should not add additional slashes for file URLs

2014-11-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196918#comment-14196918 ] Hudson commented on NUTCH-1880: --- SUCCESS: Integrated in Nutch-trunk #2848 (See

[jira] [Commented] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin

2014-11-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196916#comment-14196916 ] Hudson commented on NUTCH-1483: --- SUCCESS: Integrated in Nutch-trunk #2848 (See

[jira] [Commented] (NUTCH-1885) Protocol-file should treat symbolic links as redirects

2014-11-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196917#comment-14196917 ] Hudson commented on NUTCH-1885: --- SUCCESS: Integrated in Nutch-trunk #2848 (See

Nutch 2.X question

2014-11-04 Thread amit sehas
I have a small question about the Nutch 2.X source code; I hope this is the right mailing list for that. I was unable to locate the following pieces in the code: a) where does the linkdb get generated, and which Java file contains the code for that? b) I see the WebPage class being utilized for