Hi Jukka,

You could create a prototype by implementing one or two parsers. That would let us test the design before implementing all of the parsers. If you need any help, please don't hesitate to ask.
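
To make the prototype idea concrete, here is a rough sketch of what a first parser could look like under the two-method design from your proposal. All names here (PrototypeParser, PlainTextParser, the Metadata methods, the "Content-Type" key) are placeholders I made up for illustration, not existing Lius or Tika classes, and the UTF-8 decoding is just an assumption to keep the example short. The point is only that content is streamed to the SAX handler in chunks instead of being held in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical metadata container; the thread does not define its API.
class Metadata {
    private final Map<String, String> values = new HashMap<>();

    public void set(String name, String value) { values.put(name, value); }

    public String get(String name) { return values.get(name); }
}

// Hypothetical stand-in for "SomeTikaInterface" in the proposal.
interface PrototypeParser {
    void extractMetadata(InputStream stream, Metadata metadata) throws IOException;

    void extractContent(InputStream stream, ContentHandler handler, Metadata metadata)
            throws IOException, SAXException;
}

// Prototype parser for plain text, assumed to be UTF-8 encoded.
class PlainTextParser implements PrototypeParser {
    @Override
    public void extractMetadata(InputStream stream, Metadata metadata) throws IOException {
        // Keep any hint the caller supplied; otherwise record a default type.
        if (metadata.get("Content-Type") == null) {
            metadata.set("Content-Type", "text/plain");
        }
    }

    @Override
    public void extractContent(InputStream stream, ContentHandler handler, Metadata metadata)
            throws IOException, SAXException {
        extractMetadata(stream, metadata); // metadata as a side effect
        Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8);
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        char[] buffer = new char[4096];
        int n;
        // Forward the text chunk by chunk; the whole document is never buffered.
        while ((n = reader.read(buffer)) != -1) {
            handler.characters(buffer, 0, n);
        }
        handler.endElement("", "p", "p");
        handler.endDocument();
    }
}
```

A caller would pass any SAX ContentHandler (for example a DefaultHandler subclass that collects characters) and could pre-fill the Metadata object with hints such as the file name before calling extractContent.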

Best Regards.

On 8/24/07, Jukka Zitting <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> On 8/24/07, Rida Benjelloun <[EMAIL PROTECTED]> wrote:
> > I agree with your use case.
>
> Thanks!
>
> I've been thinking about this a bit more. The main thing I'm concerned
> about the current Parser classes from Lius Lite is that they always
> parse the entire document into an in-memory data structure. This can
> easily become a scalability issue and I'd like to avoid that already
> on the design level.
>
> Also, it seems to me that the current regexp and xpath features from
> Lius would work better as a layer on top of the parser code instead of
> as an integral part of it.
>
> As for my design proposal itself, I think I have a more workable
> approach to use cases 1 (extract structured content) and 3 (extract
> metadata). It looks like this:
>
> Extract metadata:
>
>     InputStream stream = ...;
>     Metadata metadata = new Metadata();
>     SomeTikaInterface parser = new SomeTikaClass();
>     parser.extractMetadata(stream, metadata);
>
> Extract structured content (and metadata as a side-effect):
>
>     InputStream stream = ...;
>     ContentHandler handler = ...; // SAX event handler
>     Metadata metadata = new Metadata();
>     SomeTikaInterface parser = new SomeTikaClass();
>     parser.extractContent(stream, handler, metadata);
>
> In both cases it would be possible to feed existing metadata hints
> (like the file name, Content-Type header, or some other similar
> information) to the parser through the metadata argument.
>
> WDYT? I'd like to start going forward with some code along these
> lines, most likely by adapting/refactoring the Lius classes we already
> have.
>
> BR,
>
> Jukka Zitting
>

--
---------------------------------------------------------
Rida Benjelloun
Doculibre inc.
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Cel: 418-262-3222
Tel: 418-353-3390
Site Web : http://www.doculibre.com
---------------------------------------------------------
