Hi,
I still can't get Nutch 2.1 to build with Tika 1.2 :(
I followed the copy steps mentioned in the NUTCH-1433 patch, then ran "ant runtime".
But the build fails with many errors!
========================================
... part of the output:
[javac] /home/bayu/Downloads/solr/apache-nutch-2.1/src/java/org/apache/nutch/util/PrefixStringMatcher.java:50: warning: [rawtypes] found raw type: Iterator
[javac] Iterator iter= prefixes.iterator();
[javac] ^
[javac] missing type arguments for generic class Iterator<E>
[javac] where E is a type-variable:
[javac] E extends Object declared in interface Iterator
[javac] /home/bayu/Downloads/solr/apache-nutch-2.1/src/java/org/apache/nutch/util/SuffixStringMatcher.java:44: warning: [rawtypes] found raw type: Collection
[javac] public SuffixStringMatcher(Collection suffixes) {
[javac] ^
[javac] missing type arguments for generic class Collection<E>
[javac] where E is a type-variable:
[javac] E extends Object declared in interface Collection
[javac] /home/bayu/Downloads/solr/apache-nutch-2.1/src/java/org/apache/nutch/util/SuffixStringMatcher.java:46: warning: [rawtypes] found raw type: Iterator
[javac] Iterator iter= suffixes.iterator();
[javac] ^
[javac] missing type arguments for generic class Iterator<E>
[javac] where E is a type-variable:
[javac] E extends Object declared in interface Iterator
[javac] /home/bayu/Downloads/solr/apache-nutch-2.1/src/java/org/apache/nutch/util/ToolUtil.java:48: warning: [unchecked] unchecked cast
[javac] Map<String,Object> jobs = (Map<String,Object>)results.get(Nutch.STAT_JOBS);
[javac] ^
[javac] required: Map<String,Object>
[javac] found: Object
[javac] 100 errors
[javac] 52 warnings
BUILD FAILED
/home/bayu/Downloads/solr/apache-nutch-2.1/build.xml:97: Compile failed; see the compiler error output for details.
Total time: 18 seconds
========================================
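Note that the excerpt above shows only warnings ([rawtypes], [unchecked]); the lines behind "100 errors" are not in it, and those are presumably what make the build fail. To pull them out of the full output, here is a minimal sketch. It assumes javac labels each error as "<file>:<line>: error: <message>", in the same style as the "warning:" lines above. The class name JavacErrorFilter is made up for illustration and is not part of Nutch or Ant.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: keeps only the javac error lines of a build log.
final class JavacErrorFilter {

    // Assumption: errors are printed as "<file>:<line>: error: <message>".
    private static final Pattern ERROR_LINE = Pattern.compile(":\\d+: error:");

    // Returns the lines that look like javac errors, in their original order.
    static List<String> errorLines(List<String> buildOutput) {
        List<String> errors = new ArrayList<String>();
        for (String line : buildOutput) {
            if (ERROR_LINE.matcher(line).find()) {
                errors.add(line);
            }
        }
        return errors;
    }

    // Reads build output from standard input and prints only the error lines.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<String>();
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        for (String error : errorLines(lines)) {
            System.out.println(error);
        }
    }
}
```

Once compiled, something like "ant runtime 2>&1 | java JavacErrorFilter" should print just the error lines, which say more about the failure than the warnings do.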
Can anyone give me a hint?
In parallel I switched to the Nutch 1.6 binary and it works well.
But I'm curious to use the latest Nutch 2.1.
Thanks in advance!
On Sun, Dec 30, 2012 at 1:46 PM, Bayu Widyasanyata <[email protected]> wrote:
> Hi,
>
> Thank you for the suggestions.
> I was also trying to upgrade Tika to 1.2 as mentioned on
> https://issues.apache.org/jira/browse/NUTCH-1433
>
> I will try your suggestions and/or upgrade tika.
>
> On Sun, Dec 30, 2012 at 6:07 AM, Dave Meikle <[email protected]> wrote:
> > Hi,
> >
> > Tika should parse those formats, so unless there is something peculiar
> > with all your files or setup, have you checked:
> >
> > - the size of the files, to see if they are over the configured limits
> > - individual files with the nutch parsechecker command
> >
> > Cheers,
> > Dave
> >
> > On 25 Dec 2012, at 01:34, Bayu Widyasanyata <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> ==Update==
> >>
> >> Checking hadoop.log, I found some interesting info: the parsing did
> >> not complete successfully.
> >>
> >> ...
> >> 2012-12-25 08:15:09,480 INFO parse.ParserJob - Parsing
> >> http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
> >> 2012-12-25 08:15:09,480 INFO parse.ParserFactory - The parsing
> >> plugins: [org.apache.nutch.parse.tika.TikaParser] are enabled via the
> >> plugin.includes system property, and all claim to support the content
> >> type application/vnd.oasis.opendocument.text, but they are not mapped
> >> to it in the parse-plugins.xml file
> >> 2012-12-25 08:15:09,517 WARN parse.ParseUtil - Unable to successfully
> >> parse content
> >> http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
> >> of type application/vnd.oasis.opendocument.text
> >> 2012-12-25 08:15:09,520 INFO parse.ParserJob - Parsing
> >> http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
> >> 2012-12-25 08:15:09,521 INFO parse.ParserFactory - The parsing
> >> plugins: [org.apache.nutch.parse.tika.TikaParser] are enabled via the
> >> plugin.includes system property, and all claim to support the content
> >> type application/pdf, but they are not mapped to it in the
> >> parse-plugins.xml file
> >> 2012-12-25 08:15:09,545 WARN parse.ParseUtil - Unable to successfully
> >> parse content
> >> http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
> >> of type application/pdf
> >> 2012-12-25 08:15:09,551 INFO parse.ParserJob - Parsing
> >> http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
> >> 2012-12-25 08:15:09,560 WARN parse.ParseUtil - Unable to successfully
> >> parse content
> >> http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
> >> of type application/vnd.oasis.opendocument.text
> >> 2012-12-25 08:15:09,563 INFO parse.ParserJob - Parsing
> >> http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
> >> 2012-12-25 08:15:09,590 WARN parse.ParseUtil - Unable to successfully
> >> parse content
> >> http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
> >> of type application/pdf
> >> 2012-12-25 08:15:09,597 INFO parse.ParserJob - Parsing
> >> http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
> >> 2012-12-25 08:15:09,652 WARN parse.ParseUtil - Unable to successfully
> >> parse content
> >> http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
> >> of type application/pdf
> >> ...
> >>
> >> I checked the parse-plugins.xml file and found no plugins mapped to the
> >> application/pdf and application/vnd.oasis.opendocument.text types.
> >> I know that parse-tika handles PDF files, so why do those errors still
> >> occur?
> >>
> >> Are there any documents/links that explain in a simple way how to install
> >> and activate the supported parsers mentioned at [1] in Nutch?
> >>
> >> [1] http://tika.apache.org/1.2/formats.html#Portable_Document_Format
> >>
> >> Thanks,
> >>
> >> --
> >> wassalam,
> >> [bayu]
>
>
>
> --
> wassalam,
> [bayu]
>
--
wassalam,
[bayu]