Hi,

I'm currently moving a system over to use ForkParser with Tika 2.3.0, but I'm running into a couple of issues.

First, I'd hoped to use the 'ForkParser(Path tikaBin, ParserFactoryFactory factoryFactory)' constructor to get better isolation, but I've run into the issue described in TIKA-3223, where it can't find an exception class (for example, when parsing an encrypted document). For now I've switched to using a 'legacy' constructor, but it would be nice to eventually move to the newer method.

Second, the handling of metadata from certain parsers seems to be incomplete when using ForkParser. For example, for OpenDocument ODP and ODS files and Microsoft Open XML formats, the document text is returned, but there is no metadata in either the returned Metadata object or the returned HTML head. The OpenDocument ODT format works as expected via ForkParser, though.

For an audio/mp4 file, the title is returned but the rest of the metadata is missing, although the values are present in the body of the returned HTML. For a video/mp4 file, metadata values are only present in the body of the HTML, and the Metadata object reports an incorrect content type of video/quicktime.

If it's possible to squeeze a fix for this second issue into the next version, it would be really helpful!

Thanks,
Stephen.
