I was updating my .NET wrapper around Tika and noticed that you now support .chm parsing as of version 0.10. I added a test to see whether text extraction from .chm files works. I tried .chm files generated by two different sources, with no luck. Tika returns metadata about the file (content-length, mimetype, resourceName) but no text output.
Here is how my code calls Tika:
https://github.com/KevM/tikaondotnet/blob/tika1-1/TikaOnDotnet/TextExtractor.cs

Here is my test:

    [Test]
    public void should_extract_from_chm()
    {
        var textExtractionResult = new TextExtractor().Extract("docs.chm");

        textExtractionResult.Metadata["resourceName"].ShouldContain("docs.chm");

        // FAILS HERE
        textExtractionResult.Text.Trim().ShouldNotBeEmpty();
    }

Shouldn't the .chm parser be returning text extracted from the help files?
