[jira] Commented: (NUTCH-949) Conflicting ANT jars in classpath
[ https://issues.apache.org/jira/browse/NUTCH-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974228#action_12974228 ] Julien Nioche commented on NUTCH-949: - branch 1.3 : Committed revision 1051932 Conflicting ANT jars in classpath -- Key: NUTCH-949 URL: https://issues.apache.org/jira/browse/NUTCH-949 Project: Nutch Issue Type: Bug Components: build Affects Versions: 1.3, 2.0 Reporter: Julien Nioche Assignee: Julien Nioche When the locally installed version of ANT 1.7.1 the test-plugins task crashes because of a conflict between the versions of the ANT jars that can be found in the classpath. This is due to Avro being referenced in the ivy.xml file despite the fact that it is not needed in 1.3 Will commit the change shortly; will also check whether this is an issue for 2.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (NUTCH-936) LanguageIdentifier should not set empty lang field on NutchDocument
[ https://issues.apache.org/jira/browse/NUTCH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-936. --- Resolution: Fixed Committed in trunk under revision 1051985. Thanks LanguageIdentifier should not set empty lang field on NutchDocument --- Key: NUTCH-936 URL: https://issues.apache.org/jira/browse/NUTCH-936 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.2 Reporter: Markus Jelsma Assignee: Markus Jelsma Priority: Minor Fix For: 1.3, 2.0 Attachments: NUTCH-936-v12-1.patch, NUTCH-936-v13-1.patch, NUTCH-936-v13-1.patch For some reason the language identifier plugin sometimes sets an empty value for the lang field. It is confirmed to occur in 1.2 when parsing a scanned PDF file which cannot be OCR'd to proper text, resulting in an empty content field. Anyway, whether it's a problem with the parser or not, the plugin itself should not add an empty value because the content field can always be empty. The plugin already checks for a null value and then sets the lang field to `unknown`, which is fine. But when the lang string is empty, it should also be set to `unknown`. This might break clients that have conditional logic on the empty value, but not on the `unknown` value because it may never have occurred in their set up and therefore they might not have added `unknown` to their logic. However, it might seem a little bit overkill to put this proposal behind a configuration option and let Nutch by default continue to behave as it currently does. Any thoughts on this one? Here's the troublesome URL : http://www.nrc.nl/redactie/binnenland/memo_buza_irak.pdf that returns an empty content field and an empty lang string in 1.2 and presumably in trunk and other versions as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Nutch-trunk #1345
See https://hudson.apache.org/hudson/job/Nutch-trunk/1345/changes Changes: [jnioche] NUTCH-936 LanguageIdentifier should not set empty lang field on NutchDocument (Markus Jelsma via jnioche) [jnioche] NUTCH-949 Conflicting ANT jars in classpath -- [...truncated 1016 lines...] A src/plugin/subcollection/build.xml A src/plugin/index-more A src/plugin/index-more/ivy.xml A src/plugin/index-more/src A src/plugin/index-more/src/test A src/plugin/index-more/src/test/org A src/plugin/index-more/src/test/org/apache A src/plugin/index-more/src/test/org/apache/nutch A src/plugin/index-more/src/test/org/apache/nutch/indexer A src/plugin/index-more/src/test/org/apache/nutch/indexer/more A src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java A src/plugin/index-more/src/java A src/plugin/index-more/src/java/org A src/plugin/index-more/src/java/org/apache A src/plugin/index-more/src/java/org/apache/nutch A src/plugin/index-more/src/java/org/apache/nutch/indexer A src/plugin/index-more/src/java/org/apache/nutch/indexer/more A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/package.html A src/plugin/index-more/plugin.xml A src/plugin/index-more/build.xml AUsrc/plugin/plugin.dtd A src/plugin/parse-ext A src/plugin/parse-ext/ivy.xml A src/plugin/parse-ext/src A src/plugin/parse-ext/src/test A src/plugin/parse-ext/src/test/org A src/plugin/parse-ext/src/test/org/apache A src/plugin/parse-ext/src/test/org/apache/nutch A src/plugin/parse-ext/src/test/org/apache/nutch/parse A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java A src/plugin/parse-ext/src/java A src/plugin/parse-ext/src/java/org A src/plugin/parse-ext/src/java/org/apache A src/plugin/parse-ext/src/java/org/apache/nutch A src/plugin/parse-ext/src/java/org/apache/nutch/parse A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java A src/plugin/parse-ext/plugin.xml A src/plugin/parse-ext/build.xml A src/plugin/parse-ext/command A src/plugin/urlnormalizer-pass A src/plugin/urlnormalizer-pass/ivy.xml A src/plugin/urlnormalizer-pass/src A src/plugin/urlnormalizer-pass/src/test A src/plugin/urlnormalizer-pass/src/test/org A src/plugin/urlnormalizer-pass/src/test/org/apache A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass AU src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass/TestPassURLNormalizer.java A src/plugin/urlnormalizer-pass/src/java A src/plugin/urlnormalizer-pass/src/java/org A src/plugin/urlnormalizer-pass/src/java/org/apache A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass AU src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.java AUsrc/plugin/urlnormalizer-pass/plugin.xml AUsrc/plugin/urlnormalizer-pass/build.xml A src/plugin/parse-html A src/plugin/parse-html/ivy.xml A src/plugin/parse-html/lib A src/plugin/parse-html/lib/tagsoup.LICENSE.txt A src/plugin/parse-html/src A src/plugin/parse-html/src/test A src/plugin/parse-html/src/test/org A src/plugin/parse-html/src/test/org/apache A src/plugin/parse-html/src/test/org/apache/nutch A src/plugin/parse-html/src/test/org/apache/nutch/parse A src/plugin/parse-html/src/test/org/apache/nutch/parse/html A src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestRobotsMetaProcessor.java A src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestDOMContentUtils.java A src/plugin/parse-html/src/java A src/plugin/parse-html/src/java/org A src/plugin/parse-html/src/java/org/apache A src/plugin/parse-html/src/java/org/apache/nutch A src/plugin/parse-html/src/java/org/apache/nutch/parse A