Hello,

I have a small XML document that I want to parse with Tika, and I expect to
get SAX events for each element in the input XML file. However, I only get
events for html, head, meta, title, body and p; I don't get events for the
elements in the XML file itself. The code for the ContentHandler is further
below. Please advise.

**** Output *****

StartDocument
StartElement html
StartElement head
StartElement meta
EndElement meta
StartElement title
EndElement title
EndElement head
StartElement body
StartElement p
EndElement p
EndElement body
EndElement html
EndDocument


***** sample.xml ******

<transformation>
  <info>
    <name>sample_normalize</name>
    <description/>
    <parameters>
       <parameter>
            <name>AS_OF_DATE</name>
            <default_value>2012-06-01</default_value>
            <description/>
        </parameter>
    </parameters>
  </info>
</transformation>


***************** XYZContentHandler ****************

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Prints one line for each SAX document/element event it receives.
public class XYZContentHandler extends DefaultHandler {

    public XYZContentHandler() {
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) throws SAXException {
        System.out.println("StartElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        System.out.println("EndElement " + qName);
    }

    @Override
    public void startDocument() throws SAXException {
        System.out.println("StartDocument");
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("EndDocument");
    }
}


****** Actual Code *******

            stream = new FileInputStream(new File(filename));
            Metadata metadata = new Metadata();
            metadata.set(Metadata.CONTENT_TYPE, "application/xml");

            XYZContentHandler handler = new XYZContentHandler();
            ParseContext context = new ParseContext();

            //Parser parser = new AutoDetectParser();
            Parser parser = new XMLParser();
            parser.parse(stream, handler, metadata, context);
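
For comparison, the JDK's built-in SAX parser, used directly without Tika,
does report every element of the input. Tika's parsers are documented as
delivering their output as XHTML SAX events, which may be why only
html/head/body/p show up above. The following is only a minimal sketch of the
plain-SAX route; the class name PlainSaxDemo and the events() helper are made
up for illustration and are not part of Tika or of my actual code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: parses XML with the JDK SAX parser only (no Tika).
class PlainSaxDemo {

    // Returns the start/end element events in the order the parser fires them.
    static List<String> events(String xml) throws Exception {
        final List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                out.add("StartElement " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.add("EndElement " + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of sample.xml, inlined to keep the sketch self-contained.
        String xml = "<transformation><info><name>sample_normalize</name>"
                + "<description/></info></transformation>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```

With this approach the handler sees transformation, info, name and
description directly, at the cost of losing Tika's type detection and
metadata handling.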

On Mon, Feb 10, 2014 at 3:30 PM, Nick Burch <[email protected]> wrote:

> On Mon, 10 Feb 2014, Rupak Khurana wrote:
>
>> I am trying to parse JIL (Job Information Language) scripts that happen
>> to have Name:Value pairs. Perhaps Tika is overkill, but I wanted to use its
>> parsing ability and SAX event firing to make life easier.
>>
>
> Sounds like you'll want to define / identify a suitable mimetype for
> these, add some mime magic so they get detected, then write your own parser
> that spots these name/value pairs and emits suitable SAX events for you to
> consume.
>
> See http://tika.apache.org/1.4/parser_guide.html for a guide as to how to
> do all of that.
>
> Nick
>
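
As a rough illustration of the last step described in the quoted reply
(spotting name/value pairs and emitting SAX events), here is a minimal sketch
that uses only the JDK's SAX classes. It is not a Tika Parser: the mimetype
registration and the Parser wrapper from the parser guide are left out. The
class name NameValueSaxEmitter, the element names "pairs" and "pair", and the
assumption of one "Name: Value" pair per line are all made up for
illustration; real JIL files may be laid out differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: turns "Name: Value" lines into SAX events.
class NameValueSaxEmitter {

    // Assumes one pair per line; element names are illustrative only.
    static void emit(String text, ContentHandler handler)
            throws IOException, SAXException {
        handler.startDocument();
        handler.startElement("", "pairs", "pairs", new AttributesImpl());
        BufferedReader reader = new BufferedReader(new StringReader(text));
        String line;
        while ((line = reader.readLine()) != null) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // no colon, or colon in the first column: not a pair
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            AttributesImpl attrs = new AttributesImpl();
            attrs.addAttribute("", "name", "name", "CDATA", name);
            // One <pair name="..."> element per line, value as character data.
            handler.startElement("", "pair", "pair", attrs);
            handler.characters(value.toCharArray(), 0, value.length());
            handler.endElement("", "pair", "pair");
        }
        handler.endElement("", "pairs", "pairs");
        handler.endDocument();
    }

    public static void main(String[] args) throws Exception {
        // Made-up input lines, only to exercise the emitter.
        emit("job_type: c\nowner: root\n", new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                System.out.println("StartElement " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                System.out.println("EndElement " + qName);
            }
        });
    }
}
```

Any ContentHandler, including the XYZContentHandler above, could be passed to
emit() to consume the events.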
