I was updating my .NET wrapper around Tika and noticed that you now support
.chm parsing as of version 0.10. I added a test to see whether extraction from
.chm files works. I tried .chm files generated by two different sources,
with no luck. Tika returns metadata about the file (content-length,
mimetype, resourceName) but no text output.

Here is how my code calls Tika:

https://github.com/KevM/tikaondotnet/blob/tika1-1/TikaOnDotnet/TextExtractor.cs

Here is my test:

        [Test]
        public void should_extract_from_chm()
        {
            var textExtractionResult = new TextExtractor().Extract("docs.chm");

            textExtractionResult.Metadata["resourceName"].ShouldContain("docs.chm");

            // FAILS HERE
            textExtractionResult.Text.Trim().ShouldNotBeEmpty();
        }



Shouldn't the .CHM parser be returning text extracted from the help files?
