Thanks, Andrzej!
>-----Original Message-----
>From: Andrzej Bialecki [mailto:[email protected]]
>Sent: Friday, June 25, 2010 3:41 AM
>To: [email protected]
>Subject: Re: Parsing PostScript files
>
>On 2010-06-24 10:56, [email protected] wrote:
>> Hi,
>>
>> It looks like Tika does not include a PostScript parser, at least not in
>> the copy that comes with Nutch 1.1. Is this right? I just want to double
>> check because PostScript is a major file format. I get errors "Can't
>> retrieve Tika parser for mime-type application/postscript" in the log
>> when Nutch comes across a PostScript file. I've found a reference to
>> parser-pdf associated with PostScript, but it does not work any
>> better. It tries to treat PostScript files as PDF and fails, if I
>> correctly interpret its complaints.
>
>The PDF parser can't properly parse PostScript, sorry. On the other hand,
>PostScript parsers may be (and often are) able to parse PDFs.
>
>>
>> Could anyone help with parsing PostScript in Nutch, please? It is
>> hard to believe that this is not implemented.
>
>You can use Ghostscript via the parse-ext plugin - see the examples in
>the plugin.xml file there.
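
A minimal sketch of the Ghostscript side of this suggestion, in case it helps anyone following the thread. It is not taken from Nutch or from the parse-ext examples: the function names, the 30-second timeout, and the assumption that the local Ghostscript build provides the txtwrite device are all mine (older builds may lack that device). How parse-ext hands the document to the external command is not shown here; check plugin.xml for that. The timeout is there because PostScript is a full programming language, so a file is not guaranteed to finish.

```python
import subprocess


def build_gs_command(ps_path, gs_binary="gs"):
    """Return the argument list for a text-extraction Ghostscript run.

    Hypothetical helper for illustration; not part of Nutch or Tika.
    """
    return [
        gs_binary,
        "-q",                 # suppress the startup banner
        "-dNOPAUSE",          # do not pause between pages
        "-dBATCH",            # exit after the last input file
        "-dSAFER",            # restrict file access from PostScript code
        "-sDEVICE=txtwrite",  # text output device (assumed to be available)
        "-sOutputFile=-",     # write the extracted text to stdout
        ps_path,
    ]


def extract_text(ps_path, timeout_seconds=30):
    """Run Ghostscript on ps_path and return the extracted text.

    Raises subprocess.TimeoutExpired if the file runs too long and
    subprocess.CalledProcessError if Ghostscript exits with an error.
    """
    result = subprocess.run(
        build_gs_command(ps_path),
        capture_output=True,
        timeout=timeout_seconds,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("paper.ps") with Ghostscript installed should return whatever text the txtwrite device recovers; the quality depends on how the PostScript file encodes its fonts.
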
>
>(...and BTW, parsing PostScript is definitely not on the same level of
>complexity as parsing PDF - PostScript is a full programming language,
>whereas PDF is "just" a page description format).
>
>--
>Best regards,
>Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
>[__ || __|__/|__||\/| Information Retrieval, Semantic Web
>___|||__|| \| || | Embedded Unix, System Integration
>http://www.sigram.com Contact: info at sigram dot com