Hello Markus,

Thanks for replying.

I was hoping not to have to buffer entire media files, because of their size. Is there a way to get the content segment as a stream? A stream's internal buffering might be more efficient and less prone to memory spikes.

Java is not my native tongue. I've been able to hack through other API challenges while doing this project. Googling has given me some suspicions but not a clear answer.

Cheers.

On Wed, Feb 15, 2017 at 3:26 PM, Markus Jelsma <[email protected]> wrote:

> Hello - I don't know if media files even produce SAX events, but if they
> do you can catch them in your startElement, characters, and endElement
> methods. I would start collecting element names (qName and/or attribute
> values) and the text passed to the characters method, and append those to a
> StringBuilder.
>
> In the endDocument method you have collected every piece of information
> the ContentHandler methods receive. From there on you just call
> toString().hashCode() or whatever hashing algorithm you like on the
> contents accumulated in your StringBuilder.
>
> Regards,
> Markus
>
> -----Original message-----
> > From:Wshrdryr Corp <[email protected]>
> > Sent: Wednesday 15th February 2017 23:22
> > To: [email protected]
> > Subject: CRC ContentHandler
> >
> > Hello all,
> >
> > I need to write a Tika ContentHandler which will return a CRC and/or
> > hash of the non-metadata part of media files.
> >
> > Can anyone point me in the right direction?
> >
> > I'm new to Tika so please forgive me if this is an obvious question.
> >
> > TIA for any help.
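
One way to avoid the StringBuilder buffering discussed in this thread is to feed each characters() event straight into a running checksum, so memory use stays constant regardless of file size. The following is only a minimal sketch under assumptions: the class name StreamingDigestHandler and its getter methods are hypothetical, the choice of CRC32 plus SHA-256 is illustrative, and text is hashed as big-endian UTF-16 code units so that the result does not depend on how the parser splits the text into chunks. It uses only the JDK's SAX and digest classes, and it only hashes character events (element names and attributes are ignored). Whether a given media parser emits any character events at all is the open question raised in the quoted reply.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler: updates a CRC32 and a SHA-256 digest incrementally
 * from SAX character events instead of accumulating them in memory.
 */
class StreamingDigestHandler extends DefaultHandler {
    private final CRC32 crc = new CRC32();
    private final MessageDigest sha256;
    private final byte[] buf = new byte[8192]; // reusable scratch buffer (even size)
    private byte[] digest;                     // set once endDocument() has run

    StreamingDigestHandler() {
        try {
            sha256 = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void startDocument() {
        crc.reset();
        sha256.reset();
        digest = null;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        int n = 0;
        for (int i = start; i < start + length; i++) {
            // Two bytes per UTF-16 code unit, high byte first.
            buf[n++] = (byte) (ch[i] >>> 8);
            buf[n++] = (byte) ch[i];
            if (n == buf.length) {
                flush(n);
                n = 0;
            }
        }
        if (n > 0) {
            flush(n);
        }
    }

    private void flush(int n) {
        crc.update(buf, 0, n);
        sha256.update(buf, 0, n);
    }

    @Override
    public void endDocument() {
        digest = sha256.digest();
    }

    /** CRC32 of all character data seen so far. */
    long getCrc32() {
        return crc.getValue();
    }

    /** SHA-256 as lowercase hex; only available after endDocument(). */
    String getSha256Hex() {
        if (digest == null) {
            throw new IllegalStateException("endDocument() has not been called yet");
        }
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

With Tika, an instance of such a handler would be passed as the ContentHandler argument to Parser.parse(stream, handler, metadata, context), and the checksum read back after the call returns. Element names or attribute values could be folded in the same way from startElement if they should count towards the hash.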
