Hi, thanks for all the help. I really appreciate it.

I tried your test and it worked for me too, so I went through the Maven
dependency tree for my project looking for conflicts. My project depends on
jaxen-1.1.1, which in turn depends on xercesImpl-2.6.2. Excluding the
xercesImpl dependency fixed my problem with detecting XML.

For example, in my pom.xml I added:

<dependency>
    <groupId>jaxen</groupId>
    <artifactId>jaxen</artifactId>
    <version>1.1.1</version>
    <exclusions>
        <exclusion>
            <groupId>xerces</groupId>
            <artifactId>xercesImpl</artifactId>
        </exclusion>
    </exclusions>
</dependency>

I guess the service entries in xercesImpl (its META-INF/services files) were
overriding the XML parser built into the JDK (I'm using Java 6)?
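
One way to check that guess is to ask JAXP which factory classes it actually
hands out and where they were loaded from. The following is only a minimal
sketch: the class name JaxpCheck and the helper describe() are made up for
illustration, and it only reports which parser is in effect, not why Tika's
detection failed.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper class, not part of Tika or Xerces.
class JaxpCheck {

    // Returns the implementation class name and the location it was loaded from.
    static String describe(Object factory) {
        Class<?> impl = factory.getClass();
        CodeSource source = impl.getProtectionDomain().getCodeSource();
        // A null code source normally means the class came from the JDK itself.
        String origin = (source == null || source.getLocation() == null)
                ? "the JDK runtime"
                : source.getLocation().toString();
        return impl.getName() + " from " + origin;
    }

    public static void main(String[] args) {
        // newInstance() goes through the normal JAXP lookup, so a jar on the
        // classpath that registers its own factory shows up here.
        System.out.println("SAX: " + describe(SAXParserFactory.newInstance()));
        System.out.println("DOM: " + describe(DocumentBuilderFactory.newInstance()));
    }
}
```

Run with the same classpath as the application. If xercesImpl is being picked
up, I would expect class names under org.apache.xerces.jaxp and a location
pointing at the xercesImpl jar, rather than the JDK's internal
com.sun.org.apache.xerces.internal classes.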

Regards,
Wade



On Tue, Apr 17, 2012 at 3:27 PM, Nick Burch <[email protected]> wrote:

> On Tue, 17 Apr 2012, Taylor, Wade wrote:
>
>> Since I couldn't get that to work I went back to basics and tried a
>> simple XML string:
>>
>> new Tika().detect(new ByteArrayInputStream(
>>   "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><child>text</child></root>".getBytes()));
>>
>> but this gets detected as "text/plain" too and I can't figure out why it's
>> not coming back as "application/xml".
>>
>
> I've just tried this with a very simple test class:
>
> import org.apache.tika.*;
> import java.io.*;
> public class Test {
>   public static void main(String[] a) throws Exception {
>      System.out.println(
>        new Tika().detect(new ByteArrayInputStream(
>          "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><child>text</child></root>".getBytes()))
>      );
>   }
> }
>
> When I run it, it works fine:
>   java -classpath tika-core-1.2-SNAPSHOT.jar:. Test
>   application/xml
>
> Looks to me like you've managed to miss some key parts of Tika out when
> you added it to your application. I'm not sure which bits you missed, and
> how it hasn't blown up complaining, but it does seem to me that it's your
> environment that's stuffed...
>
> Nick
>
