I want to process a Maven pom.xml with special code.
I added the following to the existing XML file of MIME types:
<mime-type type="application/maven-pom">
<glob pattern="pom.xml" />
</mime-type>
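As far as I understand, a glob rule like this is matched against the file name only, never the content. The following is a minimal sketch of that idea using only the JDK's PathMatcher. It is not Tika's own matcher, and GlobSketch and matchesPomGlob are hypothetical names made up for illustration:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Illustration only: JDK glob matching, assumed to behave similarly
// to a name-based <glob pattern="pom.xml"/> rule.
class GlobSketch {

    // Hypothetical helper: true if the last path segment matches "pom.xml".
    static boolean matchesPomGlob(String resourceName) {
        if (resourceName == null) {
            return false; // a bare stream has no name to match against
        }
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:pom.xml");
        Path fileName = Paths.get(resourceName).getFileName();
        return fileName != null && matcher.matches(fileName);
    }

    public static void main(String[] args) {
        System.out.println(matchesPomGlob("project/pom.xml")); // true
        System.out.println(matchesPomGlob("build.xml"));       // false
        System.out.println(matchesPomGlob(null));              // false
    }
}
```

If that assumption holds, a plain InputStream carries no name, so a name-based rule would have nothing to match on for the stream case.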
I used MimeTypesFactory to load the file:
MimeTypes mt = MimeTypesFactory.create( new FileInputStream( f ) );
Using MediaType.parse works:
MediaType pomType = MediaType.parse( "application/maven-pom; charset=UTF-8");
System.out.println( "type: " + pomType.getType());
System.out.println( "subtype: " + pomType.getSubtype());
This results in:
type: application
subtype: maven-pom
Then I created a Tika object with:
Tika tika = new Tika( mt );
File pom = new File( "~somewhere~/pom.xml");
String pomTypeString = tika.detect( pom);
System.out.println( "Tika thinks a pom is a " + pomTypeString);
String pomStreamTypeString = tika.detect( new FileInputStream( pom ) );
System.out.println( "Tika thinks pom stream is a " + pomStreamTypeString );
This produces:
Tika thinks a pom is a text/plain
Tika thinks pom stream is a text/plain
If I create a default Tika with no arguments instead, I get:
Tika thinks a pom is a application/xml
Tika thinks pom stream is a text/plain
What have I missed?
Thanks.
Dave P