Re: XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?

2020-10-13 Thread Nick Burch

On Sun, 11 Oct 2020, Andreas Beeker wrote:
>> Should we have WorkbookFactory spot this case, grab the OOXML out of the
>> POIFS and try to load that?
>
> Actually I've updated the factories to handle that case - it might not work
> ...
> We should have an example in our test corpus - Dominik/Tim, can you provide a
> sample file for .ppt(x) / .xls(x)?


Looks like you're right, I'd missed those commits! Support is all there in 
XSSFWorkbookFactory and friends.


I've added a unit test for this, based on the sample file from Apache Tika.

Thanks
Nick

-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



Re: XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?

2020-10-13 Thread Tim Allison
Does this meet the needs?

https://github.com/apache/tika/blob/main/tika-parser-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testPPT_oleWorkbook.ppt

On Sun, Oct 11, 2020 at 5:09 PM Andreas Beeker wrote:

> Hi Nick,
>
>  > Should we have WorkbookFactory spot this case, grab the OOXML out of
> the POIFS and try to load that?
>
> Actually I've updated the factories to handle that case - it might not
> work ...
> We should have an example in our test corpus - Dominik/Tim, can you
> provide a sample file for .ppt(x) / .xls(x)?
>
> Best wishes,
> Andi


Re: XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?

2020-10-11 Thread Andreas Beeker

Hi Nick,

> Should we have WorkbookFactory spot this case, grab the OOXML out of the
> POIFS and try to load that?

Actually I've updated the factories to handle that case - it might not work ...
We should have an example in our test corpus - Dominik/Tim, can you provide a 
sample file for .ppt(x) / .xls(x)?

Best wishes,
Andi





XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?

2020-10-09 Thread Nick Burch

Hi All

Over on Stack Overflow there's a user who was getting what they thought
was an embedded XLSX file out of a PPT, but finding it was an OLE2 wrapper
with CompObj and Package entries. The real XLSX was in the Package part.
Passing the outer OLE2 stream to WorkbookFactory didn't work.
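
To illustrate the gotcha: an OOXML file such as an XLSX is a ZIP package, while the wrapper the user got back is an OLE2 compound document, so the first bytes of the two streams differ. Below is a minimal sketch that tells the two apart by their well-known signatures. ContainerSniffer and its Kind enum are hypothetical names made up for this example and are not part of POI. The sketch only looks at the leading bytes. It does not open the OLE2 container or look for the Package entry.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Apache POI.
class ContainerSniffer {
    enum Kind { OLE2, ZIP, UNKNOWN }

    // Signature at the start of every OLE2 compound document.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature ("PK", 3, 4), as found at the
    // start of a non-empty OOXML package.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Classify a buffer holding the leading bytes of a stream.
    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    // Reads up to 8 bytes from the stream and classifies them.
    // The bytes are consumed, so the caller must reopen or reset the stream.
    static Kind sniff(InputStream in) throws IOException {
        byte[] header = new byte[8];
        int n = 0;
        while (n < header.length) {
            int r = in.read(header, n, header.length - n);
            if (r < 0) break; // end of stream before 8 bytes
            n += r;
        }
        return sniff(Arrays.copyOf(header, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the outer wrapper and the inner package.
        byte[] wrapper = Arrays.copyOf(OLE2_MAGIC, 16);
        byte[] inner = Arrays.copyOf(ZIP_MAGIC, 16);
        System.out.println(sniff(new ByteArrayInputStream(wrapper))); // prints OLE2
        System.out.println(sniff(new ByteArrayInputStream(inner)));   // prints ZIP
    }
}
```

Run against the stream the user extracted from the PPT, a check like this would report OLE2 rather than ZIP, which fits the symptom described above: the XLSX is one level further down, in the Package entry.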


What do people think here? Should we have WorkbookFactory spot this case, 
grab the OOXML out of the POIFS and try to load that? Update HSLF to 
optionally extract the OOXML out of the OLE2? Record the gotcha in the 
docs somewhere? Something else?


Cheers
Nick
