Tika's default handling of XML is to extract the text and ignore the 
element names and attributes, IIRC.  So, if that's the behavior you want, and 
your XBRL files are well-formed XML, you'll be good to go.
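
To make "text only" concrete, here is a minimal sketch using only the JDK's SAX parser. It is not Tika's code, just an illustration of the kind of output to expect: the character content survives, the element names and attributes do not. The class name TextOnlyXml and the sample element names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika: keeps character data only.
final class TextOnlyXml {

    /** Returns the text content of well-formed XML, without element names or attributes. */
    static String extractText(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Separate the text of adjacent elements with a space.
                text.append(' ');
            }
        };
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Input that is not well-formed XML ends up here.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
        // Collapse runs of whitespace so the result is easy to read.
        return text.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String xbrl = "<xbrl>"
                + "<Assets contextRef=\"c1\">1000</Assets>"
                + "<Liabilities contextRef=\"c1\">400</Liabilities>"
                + "</xbrl>";
        System.out.println(extractText(xbrl)); // prints: 1000 400
    }
}
```

Note that the output no longer says which number was Assets and which was Liabilities, which is usually the deciding factor for XBRL.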

If they're non-standard XML or if you want the node names and attributes, you 
may have to add your own parser, which should be straightforward[1].
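
As a rough idea of the logic such a parser would need, here is a JDK-only sketch that keeps each element's name and attributes next to its text. It deliberately leaves out the Tika side (how to register a parser is covered in the guide at [1]); the class name XmlFacts and the "name attr=value -> text" output format are assumptions made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika: reports element name, attributes and text.
final class XmlFacts {

    /** Returns one line per element that directly contains non-blank text. */
    static List<String> describe(String xml) {
        List<String> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final Deque<String> open = new ArrayDeque<>();
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Build a label such as "Assets contextRef=c1".
                StringBuilder label = new StringBuilder(qName);
                for (int i = 0; i < attrs.getLength(); i++) {
                    label.append(' ').append(attrs.getQName(i)).append('=').append(attrs.getValue(i));
                }
                open.push(label.toString());
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String label = open.pop();
                String value = text.toString().trim();
                if (!value.isEmpty()) {
                    lines.add(label + " -> " + value);
                }
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xbrl = "<xbrl><Assets contextRef=\"c1\">1000</Assets></xbrl>";
        describe(xbrl).forEach(System.out::println); // prints: Assets contextRef=c1 -> 1000
    }
}
```

This simple version only handles elements whose text is not interleaved with child elements, which is enough to show the idea.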

The best way to see what Tika will do is to download tika-app[2], start up the 
GUI and drop in a file to see what you get.

[1] https://tika.apache.org/1.17/parser_guide.html
[2] http://www.apache.org/dyn/closer.cgi/tika/tika-app-1.17.jar

From: Johnson, Jaya [mailto:[email protected]]
Sent: Tuesday, March 13, 2018 5:06 PM
To: [email protected]
Subject: XBRL documents.

Can Tika parse XBRL documents? XBRL is a variation of XML.

Thanks.