Re: Hierarchical document

2003-10-21 Thread Peter Keegan
One way to implement hierarchical documents is through the use of
predefined phrases. Consider the 2 hierarchies:

1. Kids_and_Teens/Computers/Software/Games
2. Computers/Software/Freeware

When indexing a document belonging to (1), add these terms in consecutive
order (position increment of 1): dir:Top dir:Kids_and_Teens dir:Computers
dir:Software dir:Games dir:Bottom

For documents belonging to (2), add: dir:Top dir:Computers dir:Software
dir:Freeware dir:Bottom
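To make the indexing step concrete, here is a minimal sketch that only builds the ordered term list for one category path. It is plain Java with no Lucene classes. The class and method names (DirTerms, forPath) are hypothetical. In a real index the terms would presumably come from a custom analyzer/TokenStream that emits each one with a position increment of 1.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only (plain Java, no Lucene classes): builds the ordered "dir:" terms
// for one category path. In a real index each term would be emitted by a
// custom TokenStream with a position increment of 1, so the terms occupy
// consecutive positions in the field.
final class DirTerms {
    static List<String> forPath(String path) {
        List<String> terms = new ArrayList<>();
        terms.add("dir:Top");                      // marks the root end of the path
        for (String part : path.split("/")) {
            if (!part.isEmpty()) {
                terms.add("dir:" + part);          // one term per level, in order
            }
        }
        terms.add("dir:Bottom");                   // marks the leaf end of the path
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(forPath("Kids_and_Teens/Computers/Software/Games"));
        System.out.println(forPath("Computers/Software/Freeware"));
    }
}
```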

The terms dir:Top and dir:Bottom can be used to anchor a query
to a specific portion of the hierarchy.

Thus, a query containing the phrase dir:Computers dir:Software would
match documents in both (1) and (2) (and perhaps others), but a query for
dir:Top dir:Kids_and_Teens dir:Computers dir:Software would target only
'Computers/Software' documents under the 'Kids_and_Teens' top-level directory.
(The PhraseQuery slop factor would be set to 0.)
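The effect of a slop of 0 can be illustrated without an index: the query terms must appear at consecutive positions. This sketch (hypothetical ExactPhrase class, plain Java) approximates that check with Collections.indexOfSubList over each document's term list. Lucene itself matches against the term positions stored in the index, so treat this as an illustration rather than its implementation.

```java
import java.util.Collections;
import java.util.List;

// Sketch of the check a zero-slop phrase query performs, re-implemented over a
// plain list of terms (list index = token position). Not Lucene code: Lucene
// does this against the term positions stored in the index.
final class ExactPhrase {
    /** True if the phrase terms occur at consecutive positions in the document. */
    static boolean matches(List<String> docTerms, List<String> phrase) {
        return !phrase.isEmpty() && Collections.indexOfSubList(docTerms, phrase) >= 0;
    }

    public static void main(String[] args) {
        List<String> doc1 = List.of("dir:Top", "dir:Kids_and_Teens", "dir:Computers",
                "dir:Software", "dir:Games", "dir:Bottom");
        List<String> doc2 = List.of("dir:Top", "dir:Computers", "dir:Software",
                "dir:Freeware", "dir:Bottom");

        List<String> anywhere = List.of("dir:Computers", "dir:Software");
        List<String> kidsOnly = List.of("dir:Top", "dir:Kids_and_Teens",
                "dir:Computers", "dir:Software");

        System.out.println(matches(doc1, anywhere) + " " + matches(doc2, anywhere)); // true true
        System.out.println(matches(doc1, kidsOnly) + " " + matches(doc2, kidsOnly)); // true false
        System.out.println(matches(doc1, List.of("dir:Software", "dir:Bottom")));    // false
    }
}
```

A phrase ending in dir:Bottom, such as dir:Software dir:Bottom, only matches documents whose path ends at that node, which is the anchoring role the end markers play.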

Peter

- Original Message - 
From: Tatu Saloranta [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, October 20, 2003 8:24 PM
Subject: Re: Hierarchical document


 On Monday 20 October 2003 10:31, Erik Hatcher wrote:
  On Monday, October 20, 2003, at 11:06  AM, Tom Howe wrote:
  There is not a more lucene way to do this - it's really up to you to
  be creative with this.  I'm sure there are folks that have implemented
  something along these lines on top of Lucene.  In fact, I have a
  particular interest in doing so at some point myself.  This is very
  similar to the object-relational issues surrounding relational
  databases - turning a pretty flat structure into an object graph.
  There are several ideas that could be explored by playing tricks with
  fields, such as giving them a hierarchical naming structure and
  querying at the level you like (think Field.Keyword and PrefixQuery,
  for example), and using a field to indicate type and narrowing queries
  to documents of the desired type.
 
  I'm interested to see what others have done in this area, or what ideas
  emerge about how to accomplish this.

 I'm planning to do something similar. In my case the problem is a bit
 simpler: documents have associated products, and products form a hierarchy.
 Searches should match not only directly (searching for a product matches
 articles associated with that product) but also indirectly via membership
 (a product is a member of a product group, so searching for the group
 matches it). The product hierarchy also has variable depth.

 To support searches on non-leaf hierarchy items (groups), all product
 items/groups associated with documents are expanded to full ids at
 indexing time (i.e. each contains the path from the root up to and
 including the node, with every node on the path having its own unique id).
 Thus, when searching for an intermediate node (a product grouping), a
 match occurs because that node's id is part of the path to every product
 in the group (whether a direct member or a member of a sub-group).
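A minimal sketch of that expansion, assuming the hierarchy is available as a child-to-parent map and forms a tree (no cycles). The class, method names, and ids (g1, g2, p7, p3) are made up for illustration; the resulting id set would be indexed as tokens in a single field of the document.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of indexing-time path expansion: each product/group
// attached to a document is replaced by the ids on its path from the root.
// Ids and the parent map are invented; the hierarchy is assumed to be a tree.
final class PathExpansion {
    private final Map<String, String> parentOf = new HashMap<>(); // child id -> parent id

    void addChild(String parentId, String childId) {
        parentOf.put(childId, parentId);
    }

    /** Ids from the root down to and including {@code node}. */
    List<String> pathFromRoot(String node) {
        Deque<String> path = new ArrayDeque<>();
        for (String cur = node; cur != null; cur = parentOf.get(cur)) {
            path.addFirst(cur);                    // walk upwards, prepend each ancestor
        }
        return new ArrayList<>(path);
    }

    /** All ids to index for a document associated with the given nodes. */
    Set<String> idsToIndex(Collection<String> associatedNodes) {
        Set<String> ids = new LinkedHashSet<>();   // de-duplicates shared ancestors
        for (String node : associatedNodes) {
            ids.addAll(pathFromRoot(node));
        }
        return ids;
    }

    public static void main(String[] args) {
        PathExpansion h = new PathExpansion();
        h.addChild("g1", "g2");   // group g2 is a sub-group of g1
        h.addChild("g2", "p7");   // product p7 belongs to g2
        h.addChild("g1", "p3");   // product p3 belongs directly to g1
        System.out.println(h.idsToIndex(List.of("p7", "p3"))); // [g1, g2, p7, p3]
    }
}
```

A search for g2 then hits every document whose expanded ids include g2 (here, documents tied to p7) without touching documents tied only to p3.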

 Since no such path is stored (directly) in the database, this also lets me
 run queries that would be impossible in the database (though I could of
 course add similar path/full-id fields there for search purposes). Thus,
 the Lucene index is optimized for searching, and the database structure
 for editing and retrieving data.

 Another thing to keep in mind is that, at least for metadata, it may make
 sense to use a specialized analyzer that tokenizes ids as separate tokens,
 instead of a standard plain-text analyzer. This makes it possible to
 distinguish ids from ordinary words (by using prefixes, for example @1253
 or #13945), which allows accurate matching based on the identity of the
 associated metadata selections.
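As an illustration of the tokenizing rule (not a Lucene Analyzer; the IdAwareTokenizer name and the regular expression are assumptions), this sketch keeps @/# prefixed ids as single tokens and lowercases everything else, so the id @1253 and a bare number 1253 in the text end up as different terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (plain Java, not a Lucene Analyzer) of id-aware tokenizing:
// prefixed ids such as "@1253" or "#13945" stay single tokens with their prefix,
// so they never collide with ordinary words or bare numbers in the text.
final class IdAwareTokenizer {
    // Either a prefixed id (@ or # followed by ASCII digits) or a run of letters/digits.
    private static final Pattern TOKEN = Pattern.compile("[@#]\\d+|[\\p{L}\\p{N}]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Lowercasing leaves ids unchanged, since they are only a prefix plus digits.
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Widget @1253 in group #13945, see 1253"));
        // [widget, @1253, in, group, #13945, see, 1253]
    }
}
```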

 -+ Tatu +-


 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]




Hierarchical document

2003-10-20 Thread Tom Howe
Hi, 
I have a very hierarchical document structure where each level of the
hierarchy contains indexable information.  It looks like this:  

Study -> Section -> DataFile -> Variable.

The goal is to create a situation where a user can execute a search at
any level and the search would include all of the information below it
in the hierarchy and retrieve the proper aggregated document.  In other
words, someone could search for a Study using a word that appears in
several DataFiles in the study and a single study document could be
returned.  At the same time, someone could search for a DataFile and
each of the matching DataFile documents would be returned.  Is there a
good way to do this other than using multiple indexes? 

Thanks,
Tom



Re: Hierarchical document

2003-10-20 Thread Erik Hatcher
On Monday, October 20, 2003, at 11:06  AM, Tom Howe wrote:
 contain Section and Study information and then, if a user wants a set of
 Study documents, just aggregate them after the search by hand or is
 there a more lucene way of doing this?  I'm trying to avoid storing
 too much redundant information to implement this kind of hierarchical
 structure, but that may not be possible.  I hope I'm being somewhat
 clear with my question.

There is not a more lucene way to do this - it's really up to you to 
be creative with this.  I'm sure there are folks that have implemented 
something along these lines on top of Lucene.  In fact, I have a 
particular interest in doing so at some point myself.  This is very 
similar to the object-relational issues surrounding relational 
databases - turning a pretty flat structure into an object graph.  
There are several ideas that could be explored by playing tricks with 
fields, such as giving them a hierarchical naming structure and 
querying at the level you like (think Field.Keyword and PrefixQuery, 
for example), and using a field to indicate type and narrowing queries 
to documents of the desired type.
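The field-naming idea can be sketched without Lucene: each document carries an untokenized path (the role Field.Keyword would play) and a type, a subtree is selected by path prefix (the role of PrefixQuery), and the type field narrows the result. The PathIndex class, path layout, and type names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (not Lucene) of "hierarchical naming + type field": every
// document stores an untokenized path plus a type. A query selects a subtree
// by path prefix and keeps only documents of the requested type.
final class PathIndex {
    private static final class Doc {
        final String path;
        final String type;

        Doc(String path, String type) {
            this.path = path;
            this.type = type;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void add(String path, String type) {
        docs.add(new Doc(path, type));
    }

    /** Paths of documents of {@code type} at or below {@code subtree}. */
    List<String> find(String subtree, String type) {
        // Matching on "subtree/" keeps "/S1" from also selecting "/S10/...".
        String prefix = subtree.endsWith("/") ? subtree : subtree + "/";
        List<String> out = new ArrayList<>();
        for (Doc d : docs) {
            boolean inSubtree = d.path.equals(subtree) || d.path.startsWith(prefix);
            if (inSubtree && d.type.equals(type)) {
                out.add(d.path);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PathIndex idx = new PathIndex();
        idx.add("/S1", "Study");
        idx.add("/S1/secA", "Section");
        idx.add("/S1/secA/df1", "DataFile");
        idx.add("/S1/secB/df2", "DataFile");
        idx.add("/S10/secA/df9", "DataFile");
        System.out.println(idx.find("/S1", "DataFile"));      // [/S1/secA/df1, /S1/secB/df2]
        System.out.println(idx.find("/S1/secA", "DataFile")); // [/S1/secA/df1]
    }
}
```

The trailing "/" matters: a literal prefix of "/S1" would also pick up "/S10/...", and the same caveat would apply to a real prefix query over keyword paths.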

I'm interested to see what others have done in this area, or what ideas 
emerge about how to accomplish this.

	Erik


Re: Hierarchical document

2003-10-20 Thread Tatu Saloranta
On Monday 20 October 2003 16:41, Erik Hatcher wrote:
 One more thought related to this subject - once a nice scheme for
 representing hierarchies within a Lucene index emerges, having XPath as
 a query language would rock!  Has anyone implemented O/R or XPath-like
 query expressions on top of Lucene?

Not me... but at some point I think I briefly mentioned that someone with 
extra time might want to write a very simple JDBC driver to be used with
Lucene. Obviously it would be very minimal for queries (and might need
to invent new SQL operators for some searches), but it could also expose
metadata about the index. Should be an interesting exercise at least. :-)
Plus, if done properly, tools like DBVis could be used for simple Lucene
testing as well.

If so, who knows; perhaps that would make it even easier to prototype
replacing apps' home-grown, SQL-bound search functionality with Lucene.

Most of the above would just be a nice little hack, though. :-)

-+ Tatu +-


