Thanks Nick. I had set an AutoDetectParser on the ParseContext, and that was causing text to be extracted recursively from the embedded objects. Once I removed it, I got the text of just the parent file.
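For anyone who finds this thread later, here is a rough, dependency-free sketch of the mechanism as I understand it. None of the classes below are Tika classes: `Doc`, `DocParser`, `Context` and `SimpleParser` are hypothetical stand-ins I made up for illustration. In real Tika code the equivalent step is the `context.set(Parser.class, parser)` call on the ParseContext; leaving that call out is what keeps extraction limited to the parent.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EmbeddedRecursionSketch {

    /** Hypothetical stand-in for a document: its own text plus attachments. */
    static final class Doc {
        final String text;
        final List<Doc> attachments;

        Doc(String text, Doc... attachments) {
            this.text = text;
            this.attachments = Arrays.asList(attachments);
        }
    }

    /** Hypothetical stand-in for a parser (not Tika's Parser interface). */
    interface DocParser {
        void parse(Doc doc, StringBuilder out, Context context);
    }

    /** Hypothetical type-keyed context, loosely modeled on the idea of a ParseContext. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            // Returns null when nothing was registered under this key.
            return key.cast(entries.get(key));
        }
    }

    /** Emits the document's own text, then recurses only if the context holds a parser. */
    static final class SimpleParser implements DocParser {
        @Override
        public void parse(Doc doc, StringBuilder out, Context context) {
            out.append(doc.text).append('\n');
            DocParser childParser = context.get(DocParser.class);
            if (childParser == null) {
                return; // no parser registered: attachments are skipped
            }
            for (Doc attachment : doc.attachments) {
                childParser.parse(attachment, out, context);
            }
        }
    }

    /** Extracts text; a parser is registered on the context only when recurse is true. */
    static String extract(Doc doc, boolean recurse) {
        DocParser parser = new SimpleParser();
        Context context = new Context();
        if (recurse) {
            context.set(DocParser.class, parser); // the step I had to remove
        }
        StringBuilder out = new StringBuilder();
        parser.parse(doc, out, context);
        return out.toString();
    }

    public static void main(String[] args) {
        Doc parent = new Doc("parent text",
                new Doc("child A"),
                new Doc("child B", new Doc("grandchild")));
        System.out.print(extract(parent, false)); // parent text only
        System.out.print(extract(parent, true));  // parent text, then every attachment
    }
}
```

The point of the sketch is only that the parser registered on the context is what handles the child objects, so whether one is present decides whether attachments get parsed at all.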
Regards,
Shiv

On Fri, Nov 29, 2013 at 4:16 PM, Nick Burch <[email protected]> wrote:

> On Fri, 29 Nov 2013, Shiv Kenche wrote:
>
>> I have a Parent doc file with many attachments(children) into it. I need
>> to extract text content of Parent doc file but do not need text extract of
>> its children.
>>
>
> Tika does not recurse into embedded documents by default. To enable
> recursion, you need to set a Parser object onto the ParseContext, to be
> used to handle the child objects. Without one, Tika will process the outer
> (parent) document only
>
> Nick
>
