Re: Specialized XML handling in Lucene

2008-03-12 Thread Eran Sevi
Indeed, it seems like a problematic approach. I would also have a problem searching for documents with more than one value. If the query is something simple like "value1 AND value2", I would expect to get all XML docs with both values, but if I use the doc=element method, I won't get any result because
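The concern above can be illustrated with a toy inverted index in plain Python. This is only a sketch and does not use Lucene: the file names, the values, and the helpers `build_index` and `search_and` are hypothetical. It assumes the failure arises because, with doc=element, each value lives in its own index document, so no single document matches both terms.

```python
from collections import defaultdict

# Hypothetical data: two XML files, each holding a few element values.
xml_files = {
    "file1.xml": ["value1", "value2"],
    "file2.xml": ["value1", "value3"],
}

def build_index(docs):
    """Map each term to the set of index-document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def search_and(index, *terms):
    """Return the ids of index documents that contain every term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

# Scheme 1: one index document per XML file.
file_index = build_index(xml_files)

# Scheme 2 (doc = element): one index document per element,
# keyed by (file name, element position).
element_docs = {
    (name, position): [value]
    for name, values in xml_files.items()
    for position, value in enumerate(values)
}
element_index = build_index(element_docs)

file_hits = search_and(file_index, "value1", "value2")
element_hits = search_and(element_index, "value1", "value2")
```

In this toy model, `file_hits` contains `file1.xml`, while `element_hits` is empty even though `file1.xml` holds both values.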

RE: Specialized XML handling in Lucene

2008-03-11 Thread Steven A Rowe
On 03/11/2008 at 11:48 AM, Steven A Rowe wrote:
> 5 billion docs is within the range that Lucene can handle. I
> think you should try doc = element and see how well it works.
Sorry, Eran, I was dead wrong about this assertion. See this thread for more information:

RE: Specialized XML handling in Lucene

2008-03-11 Thread Steven A Rowe
Hi Eran,
On 03/11/2008 at 12:26 PM, Eran Sevi wrote:
> If I query this index structure and get results from several
> xml docs, is there a better way to group results by doc id,
> other than iterating on all results, get original document
> and check the value of xml_doc_id field?
Perhaps a Sort

Re: Specialized XML handling in Lucene

2008-03-11 Thread Eran Sevi
Thanks, Steve, for the quick reply. Another question regarding this solution: if I query this index structure and get results from several XML docs, is there a better way to group results by doc id, other than iterating over all results, getting the original document, and checking the value of the xml_doc_id field?
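For reference, the baseline described in the question (walk every hit and read its xml_doc_id) can be sketched in plain Python. This does not use Lucene: the `hits` list, its tuple layout, and the helper `group_by_xml_doc` are hypothetical stand-ins for search results whose stored fields include xml_doc_id.

```python
from collections import defaultdict

# Hypothetical hits: (element document id, score, stored fields).
hits = [
    (0, 0.9, {"xml_doc_id": "file1.xml"}),
    (1, 0.7, {"xml_doc_id": "file2.xml"}),
    (2, 0.4, {"xml_doc_id": "file1.xml"}),
]

def group_by_xml_doc(hits):
    """Group element-level hits under the XML file they came from."""
    groups = defaultdict(list)
    for element_id, score, fields in hits:
        # One pass over all hits; each hit is filed under its xml_doc_id.
        groups[fields["xml_doc_id"]].append((element_id, score))
    return dict(groups)

grouped = group_by_xml_doc(hits)
```

The cost is one stored-field lookup per hit, which is the overhead the question hopes to avoid on large result sets.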

RE: Specialized XML handling in Lucene

2008-03-11 Thread Steven A Rowe
Hi Eran, see my comments below inline:
On 03/11/2008 at 9:23 AM, Eran Sevi wrote:
> I would like to ask for suggestions of the best design for
> the following scenario:
>
> I have a very large number of XML files (around 1M).
> Each file contains several sections. Each section contains
> many ele