Hi,

Tika should parse those formats, so unless there is something peculiar about all your files or your setup, have you:

- checked the size of the files to see if they are over the configured limits
- used the nutch parsechecker command to test individual files

Cheers,
Dave

On 25 Dec 2012, at 01:34, Bayu Widyasanyata <[email protected]> wrote:

> Hi,
>
> ==Update==
>
> Checking hadoop.log turned up some interesting info: the parsing did
> not complete successfully.
>
> ...
> 2012-12-25 08:15:09,480 INFO parse.ParserJob - Parsing
> http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
> 2012-12-25 08:15:09,480 INFO parse.ParserFactory - The parsing
> plugins: [org.apache.nutch.parse.tika.TikaParser] are enabled via the
> plugin.includes system property, and all claim to support the content
> type application/vnd.oasis.opendocument.text, but they are not mapped
> to it in the parse-plugins.xml file
> 2012-12-25 08:15:09,517 WARN parse.ParseUtil - Unable to successfully
> parse content
> http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.odt
> of type application/vnd.oasis.opendocument.text
> 2012-12-25 08:15:09,520 INFO parse.ParserJob - Parsing
> http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
> 2012-12-25 08:15:09,521 INFO parse.ParserFactory - The parsing
> plugins: [org.apache.nutch.parse.tika.TikaParser] are enabled via the
> plugin.includes system property, and all claim to support the content
> type application/pdf, but they are not mapped to it in the
> parse-plugins.xml file
> 2012-12-25 08:15:09,545 WARN parse.ParseUtil - Unable to successfully
> parse content
> http://localhost/sapi/Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
> of type application/pdf
> 2012-12-25 08:15:09,551 INFO parse.ParserJob - Parsing
> http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
> 2012-12-25 08:15:09,560 WARN parse.ParseUtil - Unable to successfully
> parse content http://localhost/sapi/Akhirat_Lebih_Utama_Daripada_Dunia.odt
> of type application/vnd.oasis.opendocument.text
> 2012-12-25 08:15:09,563 INFO parse.ParserJob - Parsing
> http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
> 2012-12-25 08:15:09,590 WARN parse.ParseUtil - Unable to successfully
> parse content
> http://localhost/sapi/nospasi_Akhirat_Lebih_Utama_Daripada_Dunia.pdf
> of type application/pdf
> 2012-12-25 08:15:09,597 INFO parse.ParserJob - Parsing
> http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
> 2012-12-25 08:15:09,652 WARN parse.ParseUtil - Unable to successfully
> parse content
> http://localhost/sapi/spasi%20Akhirat%20Lebih%20Utama%20Daripada%20Dunia.pdf
> of type application/pdf
> ...
>
> I checked the parse-plugins.xml file and found no plugins mapped to
> the types application/pdf and application/vnd.oasis.opendocument.text.
> I know that parse-tika handles PDF files, so why do these errors
> still occur?
>
> Are there any documents or links that explain in a simple way how to
> install and activate the supported plugins mentioned at [1] for the
> Nutch parser?
>
> [1] http://tika.apache.org/1.2/formats.html#Portable_Document_Format
>
> Thanks,
>
> --
> wassalam,
> [bayu]
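
The file-size check suggested in the reply could be sketched roughly as below. This is only an illustration, not part of Nutch: the helper name files_over_limit and the 65536-byte figure are assumptions, and the real limit should be read from your own Nutch configuration (for HTTP fetches that is usually the http.content.limit property). Content longer than the limit is typically truncated at fetch time, which can leave a PDF or ODT file unparseable.

```python
import os
import sys

# Assumption: 65536 bytes is a placeholder. Replace it with the value from
# your own Nutch configuration (for HTTP fetches, usually http.content.limit).
CONTENT_LIMIT_BYTES = 65536


def files_over_limit(paths, limit=CONTENT_LIMIT_BYTES):
    """Return (path, size) pairs for files larger than `limit` bytes.

    A negative limit is treated as "no limit", so nothing is reported.
    """
    if limit < 0:
        return []
    oversized = []
    for path in paths:
        size = os.path.getsize(path)
        if size > limit:
            oversized.append((path, size))
    return oversized


if __name__ == "__main__":
    # Usage: python check_sizes.py file1.pdf file2.odt ...
    for path, size in files_over_limit(sys.argv[1:]):
        print(f"{path}: {size} bytes exceeds limit of {CONTENT_LIMIT_BYTES}")
```

Run it against the local copies of the documents served under http://localhost/sapi/. Any file it reports is a candidate for truncation, and is worth testing individually with the parsechecker command.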

