Jukka Zitting wrote:
Hi,
On 8/19/06, Sami Siren [EMAIL PROTECTED] wrote:
So far Nutch has been built to deal mainly with text-type documents.
There is, however, also a need to deal with non-textual objects, e.g. images,
movies, and sound, which will provide content only in the form of metadata (ok,
perhaps
Jukka Zitting wrote:
The Parser interface is also bound to the ideas of fetching content
from the network and indexing it using a standard content model
through the Content and Parse dependencies. For the Tika project I'd
like to look for ways to generalize this, as neither of these ideas
apply
Sami Siren wrote:
The original motivation for this was HTTP headers and meta tags, which
can have multiple values. Another case is language
identification, where the same key may have multiple values coming
from different sources. Additionally, MapWritable supports any
Writable, which is
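The multi-valued metadata idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual Nutch MapWritable/Metadata implementation; the class and method names here are hypothetical:

```java
import java.util.*;

// Hypothetical sketch: the same key (e.g. an HTTP header or a language
// guess) may carry several values coming from different sources.
public class MultiValuedMetadata {
    private final Map<String, List<String>> data = new HashMap<>();

    // Append a value under a key instead of overwriting it.
    public void add(String key, String value) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Return all values recorded for a key (empty list if none).
    public List<String> getValues(String key) {
        return data.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        MultiValuedMetadata meta = new MultiValuedMetadata();
        meta.add("Content-Language", "en"); // e.g. from an HTTP header
        meta.add("Content-Language", "fi"); // e.g. from a meta tag
        System.out.println(meta.getValues("Content-Language")); // prints [en, fi]
    }
}
```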
Hi,
I have some questions about the dependencies of the Parser interface,
especially from the perspective of generalizing it to the potential
Tika project. The current dependencies are:
* Configurable - depends on the Hadoop configuration system
* Pluggable - depends on the Nutch plugin
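The dependency shape being questioned above can be sketched like this. The stub types below are simplified assumptions standing in for the real Nutch/Hadoop classes of that era, not the actual interfaces:

```java
// Simplified stubs (assumed shapes, not the real Nutch/Hadoop code):
interface Configurable { void setConf(Object conf); }   // Hadoop configuration hook
interface Pluggable { }                                  // Nutch plugin extension-point marker
class Content { /* fetched bytes plus protocol metadata */ }
class Parse   { /* extracted text plus parse metadata */ }

// A Parser in this design inherits both framework dependencies and is
// tied to the fetch-time Content model and the index-time Parse model --
// the coupling that generalizing for Tika would need to loosen.
interface Parser extends Configurable, Pluggable {
    Parse getParse(Content content);
}
```

A standalone Tika-style parser would presumably drop the Configurable and Pluggable supertypes and replace Content/Parse with framework-neutral input and output types.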