I'll give it a shot this week for the 1.2 branch, and for trunk if it isn't 
too different. It shouldn't be too hard, and Julien's explanation of how to 
read the configuration makes a lot of sense.
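
To make the behaviour under discussion concrete, here is a minimal sketch of 
the split with an on/off switch. It is plain Java and not the actual Nutch 
code: the class name, the method name and the boolean flag are all 
hypothetical, and in a real patch the flag would presumably come from the 
Nutch configuration rather than a method argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Nutch.
public class MimeTypeSplitSketch {

    // Returns the values that would be added to the type field for one MIME type.
    // split == false: only the full type, e.g. [text/plain]
    // split == true:  the full type plus its primary and sub type
    static List<String> typeValues(String mimeType, boolean split) {
        List<String> values = new ArrayList<>();
        values.add(mimeType);
        if (!split) {
            return values;
        }
        int slash = mimeType.indexOf('/');
        // Only split well-formed "primary/sub" strings.
        if (slash > 0 && slash < mimeType.length() - 1) {
            values.add(mimeType.substring(0, slash));   // primary type
            values.add(mimeType.substring(slash + 1));  // sub type
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(typeValues("text/plain", true));   // [text/plain, text, plain]
        System.out.println(typeValues("text/plain", false));  // [text/plain]
    }
}
```

With the split disabled, faceting on the type field would only see full 
types such as text/plain. The three-field variant would write the same three 
values to separate fields instead of one multi-valued field.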


On Wednesday 08 September 2010 16:37:29 Mattmann, Chris A (388J) wrote:
> Hi Markus,
> 
> > Interesting! But can the mime extractor return more than one type for a
> > given file in Nutch?
> 
> Sure, Nutch metadata is a named Field->multi-value structure so a file (or
> piece of content) can certainly have more than 1 type.
> 
> > I see, but in that case it would be helpful if the canonical, top and sub
> > types have their own field which would also give more meaning to the
> > whole. The way it works now results in a real nasty mess when faceting on
> > the type field.
> 
> I hear ya! Though I guess it's a mess from your perspective. From mine, it
> is nice to be able to see things like:
> 
> Mime Type:
>   text (720)
>   plain (77)
>   text/plain (250)
>   xml (235)
> ...
> 
> Faceting using the primary and sub types works fine for me.
> 
> > What would be a good (configurable) improvement? Just adding the option
> > to disable the split? Or also add an option that spits out up to three
> > distinct fields?
> 
> I think that both of your suggestions are great improvements and we can
> include a patch to make each configurable.
> 
> Cheers,
> Chris
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.mattm...@jpl.nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
