Hello Christian,

catdoc cannot process the newer Office formats (Office 2007 and later).
As far as I know there are no plans to implement this in catdoc in the
foreseeable future. The same problem exists for xls2csv. In theory you
could call unoconv (https://github.com/unoconv/unoconv) before catdoc,
but that will probably have a big performance impact, since it launches
LibreOffice / OpenOffice for the conversion. If you do try this, I would
be interested in your results, since being limited to indexing only the
old Office formats is something we would like to overcome as well.
Alternatively, if you can find open source software that efficiently
extracts plain text from the current Office formats, it should be easy
to integrate into piler (basically a few lines in extract.c, as far as I
can tell). For Excel there are https://github.com/xevo/xls2csv and
https://github.com/nagirrab/xls2csv, which both claim to be capable of
processing xlsx files, but I haven't looked into them yet.
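
To give an idea of what such an extractor has to do: an xlsx file is a
ZIP archive of XML parts, and the cell text is normally stored in
xl/sharedStrings.xml. The following is only a minimal sketch in Python
using the standard library. The function name xlsx_shared_strings is my
own invention; it is not part of piler, catdoc or either xls2csv
project. It also ignores numeric cells and inline strings, so treat it
as an illustration rather than a finished converter.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the SpreadsheetML parts inside an .xlsx container.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def xlsx_shared_strings(path_or_file):
    """Return the text of every shared string in an .xlsx file.

    Hypothetical helper for illustration only. Accepts a file name or a
    binary file-like object and returns a list of strings in the order
    they appear in xl/sharedStrings.xml.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        # Workbooks without any text cells may have no sharedStrings part.
        if "xl/sharedStrings.xml" not in zf.namelist():
            return []
        root = ET.fromstring(zf.read("xl/sharedStrings.xml"))

    strings = []
    for si in root.iter(NS + "si"):
        # A string item holds its text in one or more <t> elements
        # (several when the cell uses rich-text runs); join them.
        strings.append("".join(t.text or "" for t in si.iter(NS + "t")))
    return strings
```

Calling xlsx_shared_strings("report.xlsx") (the file name is just an
example) and joining the result with newlines would give plain text
that could be handed to the indexer in the same way as catdoc output.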

Kind Regards
Martin

On 06.05.2019 at 06:45, Katterl Christian wrote:
>
> Hello,
>
>  
>
> Indexation of Excel files newer than Excel 2007 fails in my installation.
>
> I am using catdoc 0.95 and it tells:
>
>  
>
> This file looks like ZIP archive or Office 2007 or later file.
>
> Not supported by catdoc
>
>  
>
> The Excel-File has been created using Excel 2010.
>
>  
>
> BR, Christian
>
>
>
> *Christian Katterl*
> Teamleader Technical IT
>
>
>
> *Asamer Baustoffe AG*
> Unterthalham Straße 2
> 4694 Ohlsdorf
> Austria
> *tel * +43 50 799 - 2511
> *mobile * +43 664 811 54 99
> *email * c.katt...@asamer.at <mailto:c.katt...@asamer.at>
> *www.abag.at* <https://www.abag.at>
>
>
> This message is confidential. It may not be disclosed to, or used by,
> anyone other than the addressee. If you receive this message by
> mistake, please advise the sender.
> Firmenbuch: Landesgericht Wels, FN: 407726y, ATU 68646334
>
>
