Hi,

I need to index hierarchical data, but as far as I can tell Nutch/Solr do not
have a concept like hierarchy; the index seems to be flat.

Now I have a problem that I would solve using some sort of hierarchy, and I
would like to know how you would solve it.

Let's assume I index a set of pages that contain information about persons,
several persons per page. Each person has some properties that I can parse in
my plugin, because the information has a certain structure. Therefore my index
contains fields like firstname, lastname, email, ..., each of them multiValued
because there are many persons on a page. For example, each person has one or
more email addresses associated with it.

Now I would like to formulate queries like "return all fields of all persons
that have lastname XXX" or "return all email addresses of person XXX". Since
the fields are multiValued, how can I solve this? I see no way to associate an
entry in firstname with the corresponding lastname, because both fields are
multiValued. Note that I have no unique id or anything similar that I could use.
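To make the problem concrete, here is a minimal plain-Java sketch. It uses no Nutch or Solr classes: maps stand in for index documents, and the field values and the matchesAll helper are hypothetical. It models one page as a single document with multiValued fields and shows that an AND across two fields can match values that belong to different persons. It also shows that with one or more emails per person, positions in the lists no longer line up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlatPageDemo {
    // Hypothetical stand-in for one indexed page: every field is multiValued,
    // so each field is just a list of values with no link between the lists.
    static Map<String, List<String>> pageDoc() {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("firstname", List.of("Anna", "Bob"));
        doc.put("lastname", List.of("Smith", "Jones"));
        // Bob has two emails, so position i in "email" does not
        // correspond to position i in "firstname" or "lastname".
        doc.put("email", List.of("anna@example.org", "bob@example.org", "bob@work.example.org"));
        return doc;
    }

    // Mimics an AND over two multiValued fields: the document matches if each
    // field contains the value anywhere, regardless of which person it came from.
    static boolean matchesAll(Map<String, List<String>> doc,
                              String field1, String value1,
                              String field2, String value2) {
        return doc.getOrDefault(field1, List.of()).contains(value1)
            && doc.getOrDefault(field2, List.of()).contains(value2);
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = pageDoc();
        // "Anna Jones" is not on the page, yet the flat document matches.
        System.out.println(matchesAll(doc, "firstname", "Anna", "lastname", "Jones"));
        // "Emails of Smith" can only be answered with the whole email list.
        System.out.println(doc.get("email"));
    }
}
```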

Or is the trick here to treat the persons as separate documents and index
them separately? That is, when parsing I would split the page and index each
person on its own, so that each person has a separate entry in the index?
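A minimal sketch of that splitting idea, again in plain Java with maps standing in for index documents. The Person record, the toDocuments and emailsOf helpers, and the synthesized id and url fields are all hypothetical; they are not Nutch or Solr API. Each person becomes its own document, so a lastname match returns exactly that person's emails. The url field keeps the link back to the source page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PersonDocs {
    // Hypothetical record produced by the parse plugin for one person.
    record Person(String firstname, String lastname, List<String> emails) {}

    // Turns one page into one document per person. Assumption: since there is
    // no natural unique key, the id is the page URL plus the person's position.
    static List<Map<String, List<String>>> toDocuments(String pageUrl, List<Person> persons) {
        List<Map<String, List<String>>> docs = new ArrayList<>();
        for (int i = 0; i < persons.size(); i++) {
            Person p = persons.get(i);
            Map<String, List<String>> doc = new LinkedHashMap<>();
            doc.put("id", List.of(pageUrl + "#person-" + i));
            doc.put("url", List.of(pageUrl));          // link back to the source page
            doc.put("firstname", List.of(p.firstname()));
            doc.put("lastname", List.of(p.lastname()));
            doc.put("email", p.emails());              // still multiValued, but scoped to one person
            docs.add(doc);
        }
        return docs;
    }

    // Collects the emails of every person document whose lastname matches.
    static List<String> emailsOf(List<Map<String, List<String>>> docs, String lastname) {
        List<String> result = new ArrayList<>();
        for (Map<String, List<String>> doc : docs) {
            if (doc.get("lastname").contains(lastname)) {
                result.addAll(doc.get("email"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Person> persons = List.of(
            new Person("Anna", "Smith", List.of("anna@example.org")),
            new Person("Bob", "Jones", List.of("bob@example.org", "bob@work.example.org")));
        List<Map<String, List<String>>> docs = toDocuments("http://example.org/staff", persons);
        System.out.println(docs.size());
        System.out.println(emailsOf(docs, "Jones"));
    }
}
```

The sketch does not show how to emit several documents from one fetched page in a Nutch plugin. That wiring is left open here.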

Is there any source code or plugin I could have a look at?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-hierarchical-data-schema-design-tp3052894p3052894.html
Sent from the Nutch - User mailing list archive at Nabble.com.