Hi Roland,

I get the same issue when I parse a Microsoft Office document.
I have the poi-3.6 jar and the Tika 0.6 jar in my project.

We get the following exception:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 221433
at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:45)
at org.apache.poi.hwpf.model.ListLevel.<init>(ListLevel.java:120)
at org.apache.poi.hwpf.model.ListFormatOverrideLevel.<init>(ListFormatOverrideLevel.java:48)
at org.apache.poi.hwpf.model.ListTables.<init>(ListTables.java:88)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:267)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:157)
at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:62)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:87)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)

I tried opening tika-parsers-0.6.jar in WinRAR, found the pom.xml inside the jar,
and edited it as per your suggestion.
The relevant part of the edited pom.xml is below:
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
      <version>3.6</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-scratchpad</artifactId>
      <version>3.6</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>3.6</version>
      <exclusions>
        <exclusion>
          <groupId>stax</groupId>
          <artifactId>stax-api</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
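
For comparison, the same override could presumably be declared in the project's own pom.xml rather than in the copy inside the jar, since Maven resolves a directly declared dependency ahead of a transitive one. A minimal sketch, assuming a Maven build; the tika-parsers coordinates and the 3.7 versions are assumptions here, and whether Tika 0.6 actually works with POI 3.7 is untested:

```xml
<dependencies>
    <!-- Assumed coordinates for the Tika parsers module used in the project -->
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>0.6</version>
    </dependency>
    <!-- Declared directly, so these versions take precedence over the
         POI versions that tika-parsers brings in transitively -->
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
      <version>3.7</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-scratchpad</artifactId>
      <version>3.7</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>3.7</version>
    </dependency>
</dependencies>
```

Running `mvn dependency:tree` afterwards should show which POI version ends up on the classpath. If the project is not built with Maven, the POI jars in the project's lib folder would have to be swapped by hand instead.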

Can you tell me exactly how you arrived at the solution?

Can you help me solve this issue?

How do I modify the POM in order to use POI 3.7 with Tika?

Thanks
Yatin Baraiya


