Hi,

Thank you for the suggestions.
I was also going to try upgrading Tika to 1.2, as mentioned in
https://issues.apache.org/jira/browse/NUTCH-1433

I will try your suggestions and/or upgrade Tika.
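
In case it helps anyone following the thread, here is a minimal Python sketch for summarizing which content types fail to parse, based on the WARN lines in the hadoop.log excerpt quoted further down. It assumes each warning is a single line in the real log file (the mail client wrapped the lines in the quote). The names failed_parses and summarize are my own, not part of Nutch.

```python
import re
from collections import Counter

# Shape of the WARN line as it appears in the quoted log excerpt.
# Assumption: in the actual hadoop.log each warning sits on one line.
FAILED = re.compile(
    r"WARN\s+parse\.ParseUtil - Unable to successfully parse content\s+"
    r"(?P<url>\S+)\s+of type\s+(?P<mime>\S+)"
)


def failed_parses(lines):
    """Yield (url, content_type) for every failed-parse warning."""
    for line in lines:
        match = FAILED.search(line)
        if match:
            yield match.group("url"), match.group("mime")


def summarize(lines):
    """Count failed parses per content type."""
    return Counter(mime for _, mime in failed_parses(lines))


# Two lines taken from the quoted log, re-joined onto single lines.
sample = [
    "2012-12-25 08:15:09,520 INFO  parse.ParserJob - Parsing "
    "http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf",
    "2012-12-25 08:15:09,545 WARN  parse.ParseUtil - Unable to successfully "
    "parse content "
    "http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf "
    "of type application/pdf",
]

for mime, count in summarize(sample).most_common():
    print(f"{count}\t{mime}")
```

To run it against a real log, pass an open file handle to summarize() instead of the sample list. The location of hadoop.log depends on the installation.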

On Sun, Dec 30, 2012 at 6:07 AM, Dave Meikle <[email protected]> wrote:
> Hi,
>
> Tika should parse those formats, so unless there is something peculiar
> with all your files or setup, have you checked:
>
> - the size of the files, to see if they are over the configured limits
> - individual files, using the nutch parsechecker command
>
> Cheers,
> Dave
>
> On 25 Dec 2012, at 01:34, Bayu Widyasanyata <[email protected]> wrote:
>
>> Hi,
>>
>> ==Update==
>>
>> Checking hadoop.log, I found some interesting info: the parsing did
>> not complete successfully.
>>
>> ...
>> 2012-12-25 08:15:09,480 INFO  parse.ParserJob - Parsing
>> http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
>> 2012-12-25 08:15:09,480 INFO  parse.ParserFactory - The parsing
>> plugins: [org.apache.nutch.parse.tika.TikaParser] are enabled via the
>> plugin.includes system property, and all claim to support the content
>> type application/vnd.oasis.opendocument.text, but they are not mapped
>> to it  in the parse-plugins.xml file
>> 2012-12-25 08:15:09,517 WARN  parse.ParseUtil - Unable to successfully
>> parse content 
>> http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
>> of type application/vnd.oasis.opendocument.text
>> 2012-12-25 08:15:09,520 INFO  parse.ParserJob - Parsing
>> http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
>> 2012-12-25 08:15:09,521 INFO  parse.ParserFactory - The parsing
>> plugins: [org.apache.nutch.parse.tika.TikaParser] are enabled via the
>> plugin.includes system property, and all claim to support the content
>> type application/pdf, but they are not mapped to it  in the
>> parse-plugins.xml file
>> 2012-12-25 08:15:09,545 WARN  parse.ParseUtil - Unable to successfully
>> parse content 
>> http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
>> of type application/pdf
>> 2012-12-25 08:15:09,551 INFO  parse.ParserJob - Parsing
>> http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
>> 2012-12-25 08:15:09,560 WARN  parse.ParseUtil - Unable to successfully
>> parse content http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
>> of type application/vnd.oasis.opendocument.text
>> 2012-12-25 08:15:09,563 INFO  parse.ParserJob - Parsing
>> http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
>> 2012-12-25 08:15:09,590 WARN  parse.ParseUtil - Unable to successfully
>> parse content 
>> http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
>> of type application/pdf
>> 2012-12-25 08:15:09,597 INFO  parse.ParserJob - Parsing
>> http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
>> 2012-12-25 08:15:09,652 WARN  parse.ParseUtil - Unable to successfully
>> parse content 
>> http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
>> of type application/pdf
>> ...
>>
>> I checked the parse-plugins.xml file and found no plugins handling the
>> types application/pdf and application/vnd.oasis.opendocument.text.
>> I know that parse-tika handles PDF files, so why do these errors still
>> occur?
>>
>> Are there any documents or links that explain, in a simple way, how to
>> install and activate the supported plugins mentioned at [1] for the
>> Nutch parser?
>>
>> [1] http://tika.apache.org/1.2/formats.html#Portable_Document_Format
>>
>> Thanks,
>>
>> --
>> wassalam,
>> [bayu]



-- 
wassalam,
[bayu]
