[jira] Commented: (NUTCH-949) Conflicting ANT jars in classpath

2010-12-22 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974228#action_12974228
 ] 

Julien Nioche commented on NUTCH-949:
-

branch 1.3 : Committed revision 1051932

 Conflicting ANT jars in classpath 
 --

 Key: NUTCH-949
 URL: https://issues.apache.org/jira/browse/NUTCH-949
 Project: Nutch
  Issue Type: Bug
  Components: build
Affects Versions: 1.3, 2.0
Reporter: Julien Nioche
Assignee: Julien Nioche

 When the locally installed version of ANT  1.7.1 the test-plugins task 
 crashes because of a conflict between the versions of the ANT jars that can 
 be found in the classpath. 
 This is due to Avro being referenced in the ivy.xml file even though it is 
 not needed in 1.3.
 I will commit the change shortly and will also check whether this is an 
 issue for 2.0.
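
 As a rough diagnostic for the kind of conflict described above, the sketch 
 below lists the classpath entries whose file name looks like an ANT jar. The 
 class name AntJarScan, the method findAntJars and the file-name pattern are 
 assumptions made for illustration; none of this is part of the Nutch build.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: finds ANT-looking jars on a classpath.
final class AntJarScan {

    // Returns the entries whose file name is ant.jar or ant-<something>.jar.
    static List<String> findAntJars(String classpath, String separator) {
        List<String> hits = new ArrayList<String>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            // Take the part after the last '/' or '\' as the file name.
            int slash = Math.max(entry.lastIndexOf('/'), entry.lastIndexOf('\\'));
            String name = entry.substring(slash + 1);
            if (name.matches("ant(-.*)?\\.jar")) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Scan the classpath of the running JVM.
        String cp = System.getProperty("java.class.path");
        for (String jar : findAntJars(cp, File.pathSeparator)) {
            System.out.println(jar);
        }
    }
}
```

 More than one hit coming from different directories would be a hint that two 
 ANT versions are visible at the same time.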

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (NUTCH-936) LanguageIdentifier should not set empty lang field on NutchDocument

2010-12-22 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche closed NUTCH-936.
---

Resolution: Fixed

Committed in trunk under revision 1051985.
Thanks

 LanguageIdentifier should not set empty lang field on NutchDocument
 ---

 Key: NUTCH-936
 URL: https://issues.apache.org/jira/browse/NUTCH-936
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.2
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
 Fix For: 1.3, 2.0

 Attachments: NUTCH-936-v12-1.patch, NUTCH-936-v13-1.patch, 
 NUTCH-936-v13-1.patch


 For some reason the language identifier plugin sometimes sets an empty value 
 for the lang field. It is confirmed to occur in 1.2 when parsing a scanned 
 PDF file which cannot be OCR'd to proper text, resulting in an empty content 
 field. Anyway, whether it's a problem with the parser or not, the plugin 
 itself should not add an empty value because the content field can always be 
 empty. The plugin already checks for a null value and then sets the lang 
 field to `unknown`, which is fine. But when the lang string is empty, it 
 should also be set to `unknown`.
 This might break clients that have conditional logic on the empty value but 
 not on the `unknown` value, because it may never have occurred in their setup 
 and they might therefore not have added `unknown` to their logic. However, it 
 might be overkill to put this proposal behind a configuration option and let 
 Nutch continue to behave by default as it currently does. Any thoughts on 
 this one?
 Here's the troublesome URL : 
 http://www.nrc.nl/redactie/binnenland/memo_buza_irak.pdf that returns an 
 empty content field and an empty lang string in 1.2 and presumably in trunk 
 and other versions as well.
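
 The behaviour proposed above can be sketched as follows. This is a minimal 
 illustration, not the plugin's actual code or the attached patch: the class 
 LangFieldSketch and the method normalizeLang are hypothetical names, and only 
 the `unknown` fallback value comes from the description.

```java
// Hypothetical sketch of the proposed check; not the LanguageIdentifier source.
final class LangFieldSketch {

    // Fallback value named in the issue description.
    static final String UNKNOWN = "unknown";

    // Treats an empty string the same way as null: both map to "unknown".
    static String normalizeLang(String lang) {
        if (lang == null || lang.isEmpty()) {
            return UNKNOWN;
        }
        return lang;
    }

    public static void main(String[] args) {
        System.out.println(normalizeLang(null)); // null is replaced by the fallback
        System.out.println(normalizeLang(""));   // empty is replaced as well
        System.out.println(normalizeLang("nl")); // a detected language is kept
    }
}
```

 With a check of this shape, a document with an empty content field would get 
 `unknown` in its lang field rather than an empty value.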




Build failed in Hudson: Nutch-trunk #1345

2010-12-22 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Nutch-trunk/1345/changes

Changes:

[jnioche] NUTCH-936 LanguageIdentifier should not set empty lang field on 
NutchDocument (Markus Jelsma via jnioche)

[jnioche] NUTCH-949 Conflicting ANT jars in classpath

--
[...truncated 1016 lines...]
A src/plugin/subcollection/build.xml
A src/plugin/index-more
A src/plugin/index-more/ivy.xml
A src/plugin/index-more/src
A src/plugin/index-more/src/test
A src/plugin/index-more/src/test/org
A src/plugin/index-more/src/test/org/apache
A src/plugin/index-more/src/test/org/apache/nutch
A src/plugin/index-more/src/test/org/apache/nutch/indexer
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
A src/plugin/index-more/src/java
A src/plugin/index-more/src/java/org
A src/plugin/index-more/src/java/org/apache
A src/plugin/index-more/src/java/org/apache/nutch
A src/plugin/index-more/src/java/org/apache/nutch/indexer
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/package.html
A src/plugin/index-more/plugin.xml
A src/plugin/index-more/build.xml
AU src/plugin/plugin.dtd
A src/plugin/parse-ext
A src/plugin/parse-ext/ivy.xml
A src/plugin/parse-ext/src
A src/plugin/parse-ext/src/test
A src/plugin/parse-ext/src/test/org
A src/plugin/parse-ext/src/test/org/apache
A src/plugin/parse-ext/src/test/org/apache/nutch
A src/plugin/parse-ext/src/test/org/apache/nutch/parse
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
A src/plugin/parse-ext/src/java
A src/plugin/parse-ext/src/java/org
A src/plugin/parse-ext/src/java/org/apache
A src/plugin/parse-ext/src/java/org/apache/nutch
A src/plugin/parse-ext/src/java/org/apache/nutch/parse
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
A src/plugin/parse-ext/plugin.xml
A src/plugin/parse-ext/build.xml
A src/plugin/parse-ext/command
A src/plugin/urlnormalizer-pass
A src/plugin/urlnormalizer-pass/ivy.xml
A src/plugin/urlnormalizer-pass/src
A src/plugin/urlnormalizer-pass/src/test
A src/plugin/urlnormalizer-pass/src/test/org
A src/plugin/urlnormalizer-pass/src/test/org/apache
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass/TestPassURLNormalizer.java
A src/plugin/urlnormalizer-pass/src/java
A src/plugin/urlnormalizer-pass/src/java/org
A src/plugin/urlnormalizer-pass/src/java/org/apache
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.java
AU src/plugin/urlnormalizer-pass/plugin.xml
AU src/plugin/urlnormalizer-pass/build.xml
A src/plugin/parse-html
A src/plugin/parse-html/ivy.xml
A src/plugin/parse-html/lib
A src/plugin/parse-html/lib/tagsoup.LICENSE.txt
A src/plugin/parse-html/src
A src/plugin/parse-html/src/test
A src/plugin/parse-html/src/test/org
A src/plugin/parse-html/src/test/org/apache
A src/plugin/parse-html/src/test/org/apache/nutch
A src/plugin/parse-html/src/test/org/apache/nutch/parse
A src/plugin/parse-html/src/test/org/apache/nutch/parse/html
A src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestRobotsMetaProcessor.java
A src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestDOMContentUtils.java
A src/plugin/parse-html/src/java
A src/plugin/parse-html/src/java/org
A src/plugin/parse-html/src/java/org/apache
A src/plugin/parse-html/src/java/org/apache/nutch
A src/plugin/parse-html/src/java/org/apache/nutch/parse
A