The tika-app jar correctly identifies the misnamed file, so it's either a classpath issue or an implementation issue.
I've checked the classpath again and verified that all the jars are present and accounted for, with no duplicates or version conflicts. Also, ContainerAwareDetector does not seem to exist in 1.0... this leads me to think that that part was abstracted away for ease of use and the docs are now outdated(?). But should I then be wrapping the InputStream in a TikaInputStream? I also tried the detection after creating a Spring bean for the Tika class, in the hope that it might wake up a hidden 'inner self' :)

On Tue, Mar 13, 2012 at 6:34 PM, Jon Gorrono wrote:
> Greetings... I am new to Tika and I am trying to detect the
> internal doc format of an OOXML container/file.
>
> When I call detect(InputStream, String) on a new Tika() instance, it
> appears I can fool the detector(s) by changing the file extension of a
> docx file to xlsx... the detection returns
> application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>
> Since the code comments use the word 'hint' to describe the use of
> resource names during detection, I was hoping that the hint itself was
> taken lightly: advisory.
>
> Our application accepts a very limited set of file extensions, and we
> have to expect that some users will solve any conundrums about file
> formats by renaming their files to meet the requirements.
>
> I think I've piled all the jars (including transitive deps) onto the
> classpath so that the more rigorous detection can take place... I've
> gone through the list of jars in the 1.0 gettingstarted.html doc twice to
> make sure they are all listed in the Eclipse classpath... I just
> don't know if what I am seeing is consistent with missing jars or not.
>
> I've done some debugging and see a very long list of Magics, but, again,
> don't know if that is core or not... should I see a long list of
> detectors as well?
>
> Any help offered would be appreciated.
>
> --
> Jon Gorrono
> PGP Key: 0x5434509D - http://pgp.mit.edu:11371/pks/lookup?search=0x5434509D&op=index
> GSWoT Introducer - {GSWoT:US75 5434509D Jon P. Gorrono <jpgorrono - www.gswot.org>}
> http://middleware.ucdavis.edu
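For illustration only, here is a minimal sketch in plain Java (standard library only, no Tika classes) of what "container-aware" detection means for the docx/xlsx case: the decision comes from the parts inside the zip, and the file name is never consulted. The class and method names (`OoxmlSniffer`, `detect`, `zipWithEntry`) are hypothetical, and the check assumes the conventional main-part names that Office writes (`word/document.xml`, `xl/workbook.xml`, `ppt/presentation.xml`). This is not how Tika's own detectors are implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper (not part of Tika): picks the OOXML flavour from zip contents, never from the name. */
final class OoxmlSniffer {

    static final String DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    static final String XLSX = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    static final String PPTX = "application/vnd.openxmlformats-officedocument.presentationml.presentation";
    static final String ZIP = "application/zip";
    static final String OCTET = "application/octet-stream";

    /** Returns a MIME type based only on the bytes of the stream. */
    static String detect(InputStream in) throws IOException {
        // Buffer everything so the magic check and the zip walk both see the full stream.
        byte[] data = in.readAllBytes();
        // A non-empty zip written the usual way starts with a local file header: "PK\3\4".
        if (data.length < 4 || data[0] != 'P' || data[1] != 'K' || data[2] != 3 || data[3] != 4) {
            return OCTET;
        }
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumption: conventional part names; a stricter check would follow _rels/.rels.
                switch (entry.getName()) {
                    case "word/document.xml": return DOCX;
                    case "xl/workbook.xml": return XLSX;
                    case "ppt/presentation.xml": return PPTX;
                    default: break;
                }
            }
        }
        return ZIP; // a zip, but not a recognised OOXML package
    }

    /** Builds an in-memory zip with one small entry, standing in for a real uploaded file. */
    static byte[] zipWithEntry(String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray(); // read only after close() has written the central directory
    }

    public static void main(String[] args) throws IOException {
        // A Word-style package, even if a user had saved it as "report.xlsx".
        byte[] renamedDocx = zipWithEntry("word/document.xml");
        System.out.println(detect(new ByteArrayInputStream(renamedDocx)));
        // prints application/vnd.openxmlformats-officedocument.wordprocessingml.document
    }
}
```

Because no name is passed in at all, renaming a docx to xlsx cannot change the answer; the trade-off is that the whole stream is buffered in memory, which a real detector working on large uploads would want to avoid.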
