Ignore the previous request for ContentHandler examples. I looked at the
interface spec, but I guess what matters more is whether Tika can
really parse a "Name" and fire an event containing its "Value". If the entire
text content of the input file is simply output under the <body> tag of the
XHTML, then it does not serve my purpose.
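
To make the question concrete, here is a rough sketch of the behavior I am
after. It does not use Tika at all, only the org.xml.sax classes that ship
with the JDK, so it says nothing about what Tika's own text parser does. The
class names (NameValueSaxDemo, PairCollector), the "pair" element with its
"name" attribute, and the sample input are all made up for illustration, and
it assumes one Name:Value pair per line, split at the first colon.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class NameValueSaxDemo {

    /**
     * Reads "Name: Value" lines and fires one hypothetical
     * <pair name="Name">Value</pair> element per line.
     * Assumption: one pair per line, split at the first colon.
     */
    static void parse(BufferedReader in, ContentHandler handler)
            throws IOException, SAXException {
        handler.startDocument();
        String line;
        while ((line = in.readLine()) != null) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // no colon, or nothing before it: not a pair
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (name.isEmpty()) {
                continue; // only whitespace before the colon
            }
            AttributesImpl attrs = new AttributesImpl();
            attrs.addAttribute("", "name", "name", "CDATA", name);
            handler.startElement("", "pair", "pair", attrs);
            handler.characters(value.toCharArray(), 0, value.length());
            handler.endElement("", "pair", "pair");
        }
        handler.endDocument();
    }

    /** Example handler that collects the pairs into an ordered map. */
    static class PairCollector extends DefaultHandler {
        final Map<String, String> pairs = new LinkedHashMap<>();
        private String currentName;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("pair".equals(qName)) {
                currentName = atts.getValue("name");
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("pair".equals(qName) && currentName != null) {
                pairs.put(currentName, text.toString());
                currentName = null;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input in the Name:Value style described above.
        String jil = "insert_job: nightly_load\n"
                   + "job_type: c\n"
                   + "command: /opt/etl/run.sh\n";
        PairCollector collector = new PairCollector();
        parse(new BufferedReader(new StringReader(jil)), collector);
        collector.pairs.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

If Tika cannot produce events like these on its own, something along these
lines would do the job without Tika, since the handler only depends on the
standard SAX interfaces.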


On Mon, Feb 10, 2014 at 2:35 PM, Rupak Khurana <[email protected]> wrote:

> I am trying to parse JIL (Job Information Language) scripts that
> happen to have Name:Value pairs. Perhaps Tika is overkill, but I wanted to
> use its parsing ability and SAX event firing to make life easier. Could you
> please point me to some examples of a custom ContentHandler, if you happen
> to know of any?
>
> thanks
>
>
> On Mon, Feb 10, 2014 at 2:27 PM, Ken Krugler
> <[email protected]> wrote:
>
>>
>> On Feb 10, 2014, at 11:22am, Rupak Khurana <[email protected]>
>> wrote:
>>
>> Hello
>>
>> I have a plain text file that has several "Name : Value" pairs that I
>> want to parse out. Note this is not an XML or HTML file. I am hoping that a
>> startElement SAX event is fired whenever a "Name" element is encountered.
>> Is there any ContentHandler that can do this? Currently, with
>> BodyContentHandler, I just get <body> All Name:Value pairs </body>. I am
>> not sure if ElementMappingContentHandler can do the trick, or how to use
>> it. Any pointers, please?
>>
>>
>> If it's just plain text, then why do you want to deal with SAX events? Is
>> it that the file is too big?
>>
>> In any case, I imagine you could get the desired behavior by implementing
>> your own ContentHandler.
>>
>> -- Ken
>>
>>
>>    --------------------------
>>     Ken Krugler
>> +1 530-210-6378
>> http://www.scaleunlimited.com
>> custom big data solutions & training
>> Hadoop, Cascading, Cassandra & Solr
>>
