[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057538#comment-13057538
 ] 

Michael McCandless commented on LUCENE-2454:


bq. Do you think there are any efficiencies to be gained on the document retrieve 
side of things if you know that the documents commonly being retrieved are 
physically nearby

Good question!  I think OS level caching should mostly solve this?

 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LUCENE-2454.patch, 
 LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-22 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053142#comment-13053142
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. Could that work for your use case?

Sounds like it, that's great :)
Do you think there are any efficiencies to be gained on the document retrieval 
side of things if you know that the documents commonly being retrieved are 
physically nearby? For example, an app will often retrieve a parent's fields and 
then those from its child docs, which are required to be physically adjacent to 
the parent's data. Would existing lower-level caching in Directory or the OS 
mean there's already a good chance of finding child data in cached blocks, or 
could a change to file structures and/or doc retrieve APIs radically boost 
parent-plus-child retrieve performance?






[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-22 Thread Srinivas Raj (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053663#comment-13053663
 ] 

Srinivas Raj commented on LUCENE-2454:
--

This is exactly what I am looking for; I hope this becomes part of core.

How can I make this work with Lucene 3.2? I downloaded the zip file and was able 
to run the test with Lucene 3.0, but I would like to use the addDocuments() 
method added in Lucene 3.2. The patches seem to be specific to Lucene 4.0.




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052409#comment-13052409
 ] 

Paul Elschot commented on LUCENE-2454:
--

This overlaps with the BlockJoinQuery of LUCENE-3171; this issue might even be 
closed as a duplicate of that one. Which one is preferred?

On using prev/nextSetBit in a safe range, this safe range starts with the 
parent and ends with the largest known child. A variant of prevSetBit could 
take this largest known child as an argument to limit its search, and then from 
the return value one has either a new parent, or one is certain that the 
current parent is the right one. This would also limit the worst case number of 
inspected bits for the group to the group size.

With or without that variant, I think it would be good to add a remark in the 
javadocs about the possible inefficiency of the use of OpenBitSet for larger 
group sizes. When the typical group size gets a lot bigger than the number of 
bits in a long, another implementation might be faster. This remark in the 
javadocs would allow us to wait for someone to come along with bigger group 
sizes and a real performance problem here.
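
To make the bounded variant a bit more concrete, here is a minimal sketch. It assumes the 2454 layout (each parent doc is followed by its children) and uses java.util.BitSet instead of OpenBitSet so that it runs standalone. The class BoundedParentLookup and the method prevSetBitBounded are names invented for this illustration; they are not in either patch.

```java
import java.util.BitSet;

// Illustrative only: BoundedParentLookup and its method names are made up for
// this sketch and are not part of either patch.
final class BoundedParentLookup {
    private final BitSet parents;          // set bits mark parent docs
    private int currentParent = -1;        // parent of the last doc seen
    private int largestKnownChild = -1;    // last doc already attributed

    BoundedParentLookup(BitSet parents) {
        this.parents = parents;
    }

    // Scans backwards from 'from' but stops above 'floor' (exclusive).
    // Returns the index of a set bit in (floor, from], or -1 if there is none.
    static int prevSetBitBounded(BitSet bits, int from, int floor) {
        for (int i = from; i > floor; i--) {
            if (bits.get(i)) {
                return i;
            }
        }
        return -1;
    }

    // Docs must be passed in increasing order, as a scorer would visit them.
    int parentFor(int doc) {
        int found = prevSetBitBounded(parents, doc, largestKnownChild);
        if (found >= 0) {
            currentParent = found;   // crossed into a new group
        }
        largestKnownChild = doc;     // extend the safe range
        return currentParent;
    }
}
```

Because the scan never goes below the largest known child, no bit is inspected twice, so the work per group stays within the group size.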






[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-21 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052436#comment-13052436
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. This overlaps with the BlockJoinQuery of LUCENE-3171, this issue might even 
be closed as duplicate of that one. Which one is preferred?

We need to look at the likely use cases. 2454 was created to service a use case 
which I expect to be a very common pattern and I'm not sure if LUCENE-3171 
satisfies this need. Apps commonly need to return a selection of both matching 
and non-matching children along with the best parents. Why? - it's a very 
similar rationale to the way that highlighting returns a summary of text - it 
doesn't just return the matched words, it also returns surrounding text as 
useful context when displaying results to users. However, some texts can be 
very large and there's a need to limit what context is brought back.
If we apply this logic to 2454 we can see that for the top parents it is common 
to also want some non-matching children (e.g. for a resume, return a person's 
employment history, not just the employments that matched the original search), 
but it is also necessary to summarize some parents' histories (e.g. the 
contractor who listed a gazillion positions in his employment history needs 
summarising). A common pattern is for solutions to ask for the best 11 children 
for the best parents and display only 10. That way the app knows that for 
certain parents there is more data available (i.e. those with 11 matches) and 
can offer a "more" button to retrieve the extra children for parents of 
interest. 2454 satisfies this use case as follows:
# Use a NestedDocumentQuery to get the best parents, with the child criteria 
expressed as a MUST clause
# Use a PerParentLimitedQuery to get a selection of children per top parent, 
where each child MUST belong to a top parent (tested using the primary key) and 
the child criteria are used again, this time as a SHOULD clause, to 
relevance-rank the selection of children returned

It's worth considering this sort of use case carefully before making any code 
decisions.
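
The "ask for 11, display 10" pattern mentioned above can be sketched independently of Lucene. The class OverFetch and its Page holder below are hypothetical names used only for this illustration; the list of fetched children stands in for whatever the second query returns for one parent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the patch: trims an over-fetched child
// list to the display limit and records whether more children exist.
final class OverFetch {
    static final class Page<T> {
        final List<T> shown;     // at most displayLimit children
        final boolean hasMore;   // true if a "more" button should be offered

        Page(List<T> shown, boolean hasMore) {
            this.shown = shown;
            this.hasMore = hasMore;
        }
    }

    // 'fetched' is expected to hold up to displayLimit + 1 children.
    static <T> Page<T> trim(List<T> fetched, int displayLimit) {
        boolean hasMore = fetched.size() > displayLimit;
        List<T> shown = new ArrayList<>(
            hasMore ? fetched.subList(0, displayLimit) : fetched);
        return new Page<>(Collections.unmodifiableList(shown), hasMore);
    }
}
```

An app would request displayLimit + 1 children per parent, call trim, render Page.shown, and show the "more" button only when Page.hasMore is true.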






[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052459#comment-13052459
 ] 

Michael McCandless commented on LUCENE-2454:


{quote}
bq. It uses 2 passes if you also want to collect child docs per parent

I tend to work with distributed indexes so it involves a 2 pass op anyway - one 
to understand best parents across the multiple shards first then the 
perparentlimitedquery to ensure we only pay the retrieve costs for those 
parents that make the final cut.
{quote}

The distributed case can still be done single pass, using LUCENE-3171,
ie each shard returns the top groups and then they are merged in the
front.  This should be substantially faster than doing a 2nd pass out
to all shards.

Also, we now have TopDocs.merge/TopGroups.merge to support this use
case.
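
As a rough illustration of the merge-in-front idea (this is not the TopDocs.merge implementation, just a standalone sketch of the same technique), each shard returns its hits sorted by descending score and the front end does a k-way merge. ShardMerge, Hit and mergeTopN are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Standalone sketch of merging per-shard results; all names are hypothetical.
final class ShardMerge {
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    // Each inner list must already be sorted by descending score.
    static List<Hit> mergeTopN(List<List<Hit>> shardHits, int n) {
        // Heap entries are {shardIndex, position}; the entry pointing at the
        // highest-scoring unmerged hit is polled first.
        PriorityQueue<int[]> heads = new PriorityQueue<>((a, b) -> Float.compare(
            shardHits.get(b[0]).get(b[1]).score,
            shardHits.get(a[0]).get(a[1]).score));
        for (int s = 0; s < shardHits.size(); s++) {
            if (!shardHits.get(s).isEmpty()) {
                heads.add(new int[] {s, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < n && !heads.isEmpty()) {
            int[] head = heads.poll();
            List<Hit> hits = shardHits.get(head[0]);
            merged.add(hits.get(head[1]));
            if (head[1] + 1 < hits.size()) {
                heads.add(new int[] {head[0], head[1] + 1});  // advance that shard
            }
        }
        return merged;
    }
}
```

The same shape applies to groups: each shard sends its top groups once, and no second round trip to the shards is needed to assemble the global top N.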

bq. This overlaps with the BlockJoinQuery of LUCENE-3171, this issue might even 
be closed as duplicate of that one. Which one is preferred?

I think they are likely dups of one another and I agree we need to
make sure all important use cases are covered.

bq. Apps commonly need to return a selection of both matching and non-matching 
children along with the best parents.

LUCENE-3171 can do this as well, with the same approach as here, ie
doing 2 passes with two different child queries.

However, I think for both this issue and for LUCENE-3171, this means
each child doc must have the parent's PK indexed against it, right?
Ie, for that 2nd query you need some way to return all child docs
under any of the top parents, so the child query is parentID MUST be
in XX, YY, ZZ and childDoc SHOULD XYZ.

In fact, we could make this a single pass capability with LUCENE-3171
and without requiring each child doc to index its parent PK, ie also
pull & sort all other non-matching children under any top parent,
because collection within each parent is done when you retrieve the
TopGroups, but this can be a later enhancement.





[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052644#comment-13052644
 ] 

Michael McCandless commented on LUCENE-2454:


bq. A variant of prevSetBit could take this largest known child as an argument 
to limit its search,

I think we should not require the app to know the max number of children per 
parent?  (Ie, we should just grow buffers, etc., on demand as we collect).

I mean, if this information is easily available we could optimize for that 
case, but for some apps it's a good amount of work to record this and update it 
so I don't think it should be a required arg when creating the 
query/collectors, even though it's tempting ;)




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052648#comment-13052648
 ] 

Michael McCandless commented on LUCENE-2454:


bq. A common pattern is for solutions to ask for the best 11 children for the 
best parents and display only 10 - that way the app knows that for certain 
parents there is more data available (i.e. those with 11 matches) and can offer 
a more button to retrieve the extra children for parents of interest

With LUCENE-3171, you should be able to just ask for 10 here, and then check if 
the TopDocs.totalHits is > 10 to decide whether to offer the "more" button.




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052696#comment-13052696
 ] 

Michael McCandless commented on LUCENE-2454:


bq. I think the only thing 3171 may be missing from my original use cases then 
is that I can use multiple PerParentLimitedQueries in one query to get a limit 
of children of different types e.g. for each parent resume, max 10 results from 
employment detail children and max 10 results from education background 
children.

I think LUCENE-3171 can handle this, or something very similar: the
collector tracks all of the BlockJoinQuerys involved in the top query.

So, you'd have 1 BJQ matching employment detail child docs and
another matching education bg child docs.  The BJC collects the
top parent docs, then you can retrieve separate TopGroups for each
BJQ.

In the end you have a TopGroups for the employment detail child docs
and another TopGroups for the education bg child docs.

Could that work for your use case?





[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052194#comment-13052194
 ] 

Michael McCandless commented on LUCENE-2454:


bq. Would modules/grouping meanwhile be a better place for this than 
lucene/contrib/queries?

I think modules/join is the right place?  When we factor out Solr's
generic join impl it can go there too...

I have some concerns about the current approach here (this is why I
opened LUCENE-3171):

  * prevSetBit is called for each child doc, which is an O(N^2) cost
(N = number of child docs for one parent) I think?  Admittedly,
typically N is probably small...

  * It uses 2 passes if you also want to collect child docs per
parent

  * PerParentLimitedQuery is also O(N^2) cost, both on insert of a new
child and on popping the child docs per group: I think it should
use a PQ to find the lowest child to evict per parent doc?

  * I think typically an app will want to collect the top N groups
(parent docs and their children), so it's more efficient to gather
those top N and only in the end sort each set of children
per-parent?  (This is similar to how the 2nd pass grouping collector
works).

  * PerParentLimitedQuery only supports relevance sort w/in each
parent.

  * You don't get the parent/child structure back, from
PerParentLimitedQuery (but now we have TopGroups which is a great
match for representing each parent and its children).

If you always only use PerParentLimitedQuery on the top parents from
the first pass, eg you AND/filter it against those parent docs, then
the O(N^2) cost is less severe since it'll have a small constant in
front, but since it's a Query I imagine users will use it w/o that
filter, which is bad... I think using a TopN Collector is a better match
here.
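
For the PQ point above, a minimal sketch of keeping the best k children of one parent with a bounded min-heap, so that each insert costs O(log k) rather than a scan over the kept children. It uses java.util.PriorityQueue rather than Lucene's own PriorityQueue, and TopChildren, ChildHit and topK are hypothetical names for this illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Sketch only: a bounded min-heap that keeps the k best children of a parent.
final class TopChildren {
    static final class ChildHit {
        final int doc;
        final float score;

        ChildHit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    static List<ChildHit> topK(Iterable<ChildHit> children, int k) {
        if (k <= 0) {
            return Collections.emptyList();
        }
        // Min-heap on score: the weakest kept child sits on top and is the
        // one to evict when a better child arrives.
        PriorityQueue<ChildHit> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (ChildHit child : children) {
            if (heap.size() < k) {
                heap.add(child);
            } else if (child.score > heap.peek().score) {
                heap.poll();
                heap.add(child);
            }
        }
        // Sort once at the end, best first, instead of on every insert.
        List<ChildHit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }
}
```

This sketch allocates a fresh heap per call; a real collector would likely reuse the heap and its entries across parents to keep GC pressure down.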





[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-20 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052223#comment-13052223
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. prevSetBit is called for each child doc

You could call nextSetBit on the first child to know the safe range of child 
docs attributable to the same parent but you would be taking a gamble that this 
was worth the call i.e. there were many possible children per parent to be 
tested.
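
A small sketch of that gamble, assuming parents precede their children and using java.util.BitSet in place of OpenBitSet so it runs on its own. ParentRangeCache and its fields are illustrative names, and the lookups counter exists only to show how many bit set searches were paid for.

```java
import java.util.BitSet;

// Illustrative only: caches the doc range that belongs to the current parent
// so that later children in the same group need no bit set search.
final class ParentRangeCache {
    private final BitSet parents;   // set bits mark parent docs
    private int currentParent = -1;
    private int rangeEnd = -1;      // exclusive: position of the next parent
    int lookups = 0;                // number of groups that needed a search

    ParentRangeCache(BitSet parents) {
        this.parents = parents;
    }

    int parentFor(int doc) {
        if (doc > currentParent && doc < rangeEnd) {
            return currentParent;   // still inside the safe range
        }
        lookups++;
        currentParent = parents.previousSetBit(doc);
        int next = parents.nextSetBit(doc + 1);
        rangeEnd = (next < 0) ? Integer.MAX_VALUE : next;
        return currentParent;
    }
}
```

The extra nextSetBit call pays off only when several children of the same parent are tested; with one matching child per parent it is pure overhead.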

bq. It uses 2 passes if you also want to collect child docs per parent

I tend to work with distributed indexes, so it involves a 2-pass operation 
anyway: one pass to find the best parents across the multiple shards, then the 
PerParentLimitedQuery to ensure we only pay the retrieve costs for those 
parents that make the final cut.

bq. I think it should use a PQ to find the lowest child to evict per parent doc?

Careful object reuse would need to be factored in to avoid excessive GC - each 
parent would fill a PQ full of child-match object instances that could/should 
be reused in assessing the next parent






[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-19 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051662#comment-13051662
 ] 

Paul Elschot commented on LUCENE-2454:
--

With these rewrite and createWeight methods TestNestedDocumentQuery passes:

{code}
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    Query rewrittenChildQuery = childQuery.rewrite(reader);
+    return (rewrittenChildQuery == childQuery) ? this
+      : new NestedDocumentQuery(rewrittenChildQuery, parentsFilter, scoreMode);
+  }
+
+  @Override
+  public Weight createWeight(IndexSearcher searcher) throws IOException {
+    return new NestedDocumentQueryWeight(childQuery.createWeight(searcher));
+  }
+
{code}

I'll continue adding the use of prevSetBit.

Would modules/grouping meanwhile be a better place for this than 
lucene/contrib/queries?




 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LUCENE-2454.patch, LuceneNestedDocumentSupport.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-19 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051673#comment-13051673
 ] 

Paul Elschot commented on LUCENE-2454:
--

The assert on the parent was an IllegalArgumentException in the previous patch.
Such an unconditional exception would probably be better than an assert, 
because when asserts are switched off a mistake in the parent filter would 
not be detected early.
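
To illustrate the difference: a Java assert is skipped unless the JVM runs with -ea, while an explicit check always fires. The sketch below shows the unconditional form. ParentFilterCheck and requireParent are hypothetical names, and java.util.BitSet stands in for the bit set obtained from the parent filter.

```java
import java.util.BitSet;

// Illustrative only: an unconditional check that a doc is really a parent.
final class ParentFilterCheck {
    // Returns parentDoc if the filter marks it as a parent; otherwise fails
    // fast, whether or not assertions are enabled.
    static int requireParent(BitSet parentBits, int parentDoc) {
        if (parentDoc < 0 || !parentBits.get(parentDoc)) {
            throw new IllegalArgumentException(
                "doc " + parentDoc + " is not marked as a parent by the parent filter");
        }
        return parentDoc;
    }
}
```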




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051484#comment-13051484
 ] 

Paul Elschot commented on LUCENE-2454:
--

Tried the current patch here to make use of prevSetBit, but ran into a problem 
with the query weight that could be related to LUCENE-3208.

When fixing the patch here so that NestedDocumentQuery.java looks like this:
{code}
  public Weight createWeight(IndexSearcher searcher) throws IOException {
    return new NestedDocumentQueryWeight(childQuery.createWeight(searcher));
  }
{code}

the TestNestedDocumentQuery from the patch here fails with an 
UnsupportedOperationException.

After adding the class name to the exception constructed in Query.java, the 
test fails with:

UnsupportedOperationException: org.apache.lucene.search.NumericRangeQuery

That probably means the above fix to the patch is wrong.
Any comments on how to continue?






[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051486#comment-13051486
 ] 

Michael McCandless commented on LUCENE-2454:


I suspect the NestedDocumentQuery must impl rewrite, and rewrite the 
childQuery.  I hit this on LUCENE-3171, too.
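
The reasoning can be shown with a toy model that does not use the Lucene classes at all. Some queries cannot create a weight until they have been rewritten into a primitive form, so a wrapping query has to pass rewrite() down to its child and wrap the result. Every class name below (RewriteSketch, SketchQuery, RangeLike, Primitive, Wrapper) is invented for this illustration, and the String "weight" is just a placeholder.

```java
// Toy model of the rewrite contract; none of these are Lucene classes.
final class RewriteSketch {
    abstract static class SketchQuery {
        // Default: already primitive, nothing to rewrite.
        SketchQuery rewrite() {
            return this;
        }

        // Default: weights cannot be created for unrewritten queries.
        String createWeight() {
            throw new UnsupportedOperationException(getClass().getSimpleName());
        }
    }

    // Stands in for a query that must be rewritten before use.
    static final class RangeLike extends SketchQuery {
        @Override
        SketchQuery rewrite() {
            return new Primitive();
        }
    }

    static final class Primitive extends SketchQuery {
        @Override
        String createWeight() {
            return "primitive-weight";
        }
    }

    // Stands in for a wrapping query that delegates to a child query.
    static final class Wrapper extends SketchQuery {
        final SketchQuery child;

        Wrapper(SketchQuery child) {
            this.child = child;
        }

        @Override
        SketchQuery rewrite() {
            SketchQuery rewritten = child.rewrite();
            // Returning 'this' when nothing changed signals a fixed point.
            return (rewritten == child) ? this : new Wrapper(rewritten);
        }

        @Override
        String createWeight() {
            return "wrapper(" + child.createWeight() + ")";
        }
    }
}
```

In this model, a Wrapper whose rewrite() simply returned this would leave the RangeLike child untouched, and createWeight() would then throw the UnsupportedOperationException.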




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051495#comment-13051495
 ] 

Paul Elschot commented on LUCENE-2454:
--

NestedDocumentQuery already implements rewrite() by returning *this*, just as 
in 3171.

This is a more complete traceback of the exception:

{noformat}
[junit] java.lang.UnsupportedOperationException: org.apache.lucene.search.NumericRangeQuery
[junit] at org.apache.lucene.search.Query.createWeight(Query.java:91)
[junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:177)
[junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358)
[junit] at org.apache.lucene.search.nested.NestedDocumentQuery.createWeight(NestedDocumentQuery.java:65)
[junit] at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:177)
[junit] at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:358)
[junit] at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:676)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:292)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
[junit] at org.apache.lucene.search.TestNestedDocumentQuery.testSimple(TestNestedDocumentQuery.java:92)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1414)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1332)
{noformat}

Could BooleanWeight be the offender?






[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051502#comment-13051502
 ] 

Paul Elschot commented on LUCENE-2454:
--

One of the nocommits in the patch is about the use of a Filter for the parent 
filter.
NestedDocumentQuery uses an OpenBitSet from this Filter for next() and advance() 
just like a Filter and also as a parent filter.

So how about adding something like this:

{code}
public abstract class ParentFilter {
  public abstract ParentDISI getParentDISI(IndexReader reader);
}

public abstract class ParentDISI extends DocIdSetIterator {
  // To be used only after next() or advance() returned something other than
  // NO_MORE_DOCS.
  public abstract int getParent();
}

{code}

together with another constructor for NestedDocumentIterator with a 
ParentFilter argument?





[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051611#comment-13051611
 ] 

Paul Elschot commented on LUCENE-2454:
--

At Query, the javadocs of both createWeight() and rewrite() start with a word 
of warning.
I'll probably need at least a few days to wrap my head around it, so in case 
anyone meanwhile can provide more help...




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-07 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045314#comment-13045314
 ] 

Paul Elschot commented on LUCENE-2454:
--

That is very nicely readable XML.

The problem might occur when a document with an optional term occurs before a 
document in the same group with a required term.
So the second question is the one for which the problem might occur.
The score value for Grant's resume should then be higher than the score value for 
Sean's.
Testing only for the set of expected results is not enough for this particular 
query.

The problem might occur in another disguise when requiring both terms and then 
the set of expected results is enough to test,
but this is not as easily tested because one does not know beforehand the order 
in which the terms are going to be advance()d.
The case with an optional term is simpler to test because the optional term is 
certain to be advance()d to compute the score value after the required term 
determines that there is a match (see ReqOptSumScorer.score()), and then to be 
certain of the correct advance() on the optional term one needs to test the 
score value.




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-07 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045319#comment-13045319
 ] 

Paul Elschot commented on LUCENE-2454:
--

Looking at the structure of the BooleanQuery, I would expect this to work 
correctly.  The ParentsFilter on the unfiltered scorer of the required term 
(mahout) should return the docId of the parent (resume) when the unfiltered 
scorer is at the document containing the required term.





[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-07 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045334#comment-13045334
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. Looking at the structure of the BooleanQuery, I would expect this to work 
correctly.

I've found it to be robust so far - you just need to be clear about whether you 
are directing criteria at only one child or at potentially different children. 
The main challenge in using this functionality is allowing users to articulate 
the nuances of such queries, and LUCENE-3133 is a holding place for this.

Under the covers, using the same cached filter for parent filters certainly 
helps with performance, and I typically wrap the ParentFilter tag in the XML 
queries with a CachedFilter tag to achieve this.
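
The difference between aiming both clauses at one child and letting them land on different children can be illustrated with plain Java, independent of Lucene. This is only a toy model of the resume data from the XML test script in this thread: the class NestedCriteriaSketch, its methods, and the 0/1/2 "scores" are assumptions made for illustration and do not reflect how Lucene actually scores.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only; none of these names exist in Lucene or in the patch.
final class NestedCriteriaSketch {
    // One child (employment) document: employer plus whitespace-separated skills.
    static final class Job {
        final String employer;
        final String skills;

        Job(String employer, String skills) {
            this.employer = employer;
            this.skills = skills;
        }

        boolean hasSkill(String skill) {
            return Arrays.asList(skills.split(" ")).contains(skill);
        }
    }

    // Both clauses aimed at the same child.
    // 0 = no match, 1 = required clause only, 2 = required and optional on one child.
    static int sameChild(List<Job> jobs, String skill, String employer) {
        int best = 0;
        for (Job job : jobs) {
            if (job.hasSkill(skill)) {
                best = Math.max(best, job.employer.equals(employer) ? 2 : 1);
            }
        }
        return best;
    }

    // Clauses may be satisfied by different children of the same parent.
    static int anyChild(List<Job> jobs, String skill, String employer) {
        boolean required = jobs.stream().anyMatch(j -> j.hasSkill(skill));
        boolean optional = jobs.stream().anyMatch(j -> j.employer.equals(employer));
        return required ? (optional ? 2 : 1) : 0;
    }
}
```

For Grant (Lucid with java/lucene, then another employer with mahout) the same-child form gives no bonus for Lucid, while the any-child form does; Sean gets no bonus either way.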




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-07 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045561#comment-13045561
 ] 

Paul Elschot commented on LUCENE-2454:
--

So one concern that is left is performance for parent testing.
I'll open an issue for OpenBitSet.prevSetBit().
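
For illustration, a word-at-a-time "previous set bit" over a long[] could look roughly like the sketch below. This is an assumption about the general technique, not the OpenBitSet implementation; PrevSetBitSketch and its method are hypothetical names.

```java
// Hypothetical helper working on a raw long[]; OpenBitSet itself is not used here.
final class PrevSetBitSketch {
    // Returns the index of the highest set bit at or below 'index', or -1 if there is none.
    static int prevSetBit(long[] words, int index) {
        if (index < 0) {
            return -1;
        }
        int i = index >> 6;
        if (i >= words.length) { // index is past the end: start from the last bit we have
            i = words.length - 1;
            index = (words.length << 6) - 1;
        }
        if (i < 0) {
            return -1; // empty array
        }
        // Shift out the bits above 'index' so only bits 0..(index % 64) of this word remain.
        long word = words[i] << (63 - (index & 63));
        if (word != 0) {
            return index - Long.numberOfLeadingZeros(word);
        }
        // Scan the lower words one at a time.
        while (--i >= 0) {
            word = words[i];
            if (word != 0) {
                return (i << 6) + 63 - Long.numberOfLeadingZeros(word);
            }
        }
        return -1;
    }
}
```

The inner work is one shift and one leading-zero count per 64 docs, which is why a dedicated method should beat stepping back one bit at a time when blocks are large.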




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-06 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044828#comment-13044828
 ] 

Mark Harwood commented on LUCENE-2454:
--

Below are 2 example tests searching employment resumes, both using the same 
optional and mandatory clauses but in subtly different ways.
Question 1 is "who has Mahout skills and preferably used them at Lucid?" while 
the other question is "who has Mahout skills and preferably has been employed 
by Lucid?". The questions and the answers are different. Below is the XML test 
script I used to illustrate the data/queries used, define expected results and 
run as an executable test. 
Hopefully you can make sense of this:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
<Test description="NestedQuery tests">
  <Data>
    <Index name="ResumeIndex">
      <Analyzers class="org.apache.lucene.analysis.WhitespaceAnalyzer">
      </Analyzers>
      <Shard name="shard1">
        <!-- === -->
        <Document pk="1">
          <Field name="name">grant</Field>
          <Field name="docType">resume</Field>
        </Document>
        <!-- === -->
        <Document pk="2">
          <Field name="employer">lucid</Field>
          <Field name="docType">employment</Field>
          <Field name="skills">java lucene</Field>
        </Document>
        <!-- === -->
        <Document pk="3">
          <Field name="employer">somewhere else</Field>
          <Field name="docType">employment</Field>
          <Field name="skills">mahout and more mahout</Field>
        </Document>
        <!-- === -->
        <Document pk="4">
          <Field name="name">sean</Field>
          <Field name="docType">resume</Field>
        </Document>
        <!-- === -->
        <Document pk="5">
          <Field name="employer">foo bar</Field>
          <Field name="docType">employment</Field>
          <Field name="skills">java</Field>
        </Document>
        <!-- === -->
        <Document pk="6">
          <Field name="employer">some co</Field>
          <Field name="docType">employment</Field>
          <Field name="skills">mahout mahout and more mahout</Field>
        </Document>
      </Shard>
    </Index>
  </Data>
  <Tests>
    <Test description="Who knows Mahout and preferably used it *while employed at Lucid*?">
      <Query>
        <NestedQuery>
          <!-- testing properties of individual child employment docs -->
          <Query>
            <BooleanQuery>
              <Clause occurs="must">
                <TermsQuery fieldName="skills">mahout</TermsQuery>
              </Clause>
              <Clause occurs="should">
                <TermsQuery fieldName="employer">lucid</TermsQuery>
              </Clause>
            </BooleanQuery>
          </Query>
          <ParentsFilter>
            <TermsFilter fieldName="docType">resume</TermsFilter>
          </ParentsFilter>
        </NestedQuery>
      </Query>
      <ExpectedResults why="Grant's tenure at Lucid is

[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-05 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044608#comment-13044608
 ] 

Paul Elschot commented on LUCENE-2454:
--

I finally had some time to start taking a look at the grouping module and again 
at the patch here.
There is too much code there for me to come up with a test case soon.
So please don't wait for me to commit this.

An easy way to test this would be to have a boolean query with a required term 
and an optional term,
with the optional term occurring in a document group, in a document before 
(i.e. with a lower docId than)
a document in the same group with the required term.

In case I run into this I'll open a separate issue.





[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-06-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044326#comment-13044326
 ] 

Michael McCandless commented on LUCENE-2454:


OK I opened LUCENE-3171 to explore the single-pass approach.




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-05-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13039623#comment-13039623
 ] 

Michael McCandless commented on LUCENE-2454:


bq. I'll need to check LUCENE-3129 for equivalence with PerParentLimitQuery. 
It's certainly a central part of what I typically deploy for nested queries - 
pass 1 is usually a NestedDocumentQuery to get the best parents and pass 2 uses 
PerParentLimitQuery to get the best children for these best parents.

Hmm, so I wonder if we could do this in one pass?  Ie, like grouping,
if you indexed your docs as blocks, you can use the faster single-pass
collector; but if you didn't, you can use the more general but slower
and more-RAM-consuming two pass collector.

It seems like we should be able to do something similar with joins,
somehow... ie Solr's join impl is a start at the fully general
two-pass solution.

But I agree the join child to parent and then grouping of child
docs go hand in hand for searching...

What do you do for facet counting in these apps...?  Post-grouping
faceting also ties in here.

bq. Of course some apps can simply fetch ALL children for the top parents but 
in some cases summarising children is required

Right...

bq.  (note: this is potentially a great solution for performance issues on 
highlighting big docs e.g. entire books).

I think it'd be compelling to index book/articles with each
page/section/chapter being a new doc, and then group them under their
book/article.

bq. I haven't benchmarked nextSetBit vs the existing rewind implementation 
but I imagine it may be quicker.

I think it should be much faster -- obs.nextSetBit looks heavily
optimized, since it can operate a word at a time.  Though, if the
groups are smallish, so that nextSetBit is often maybe 2 or 3 bits
away, I'm not sure it'd be faster...
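
As a small illustration of why nextSetBit fits a parent-last layout, here is a sketch using java.util.BitSet in place of OpenBitSet. It assumes every block is indexed as children followed by their parent; ParentLastSketch and its methods are hypothetical names, not part of any patch.

```java
import java.util.BitSet;

// Hypothetical helper; assumes each block is indexed as children first, parent last.
final class ParentLastSketch {
    // The parent of any doc is the next set bit at or after it; -1 if there is none.
    static int parentOf(BitSet parents, int docId) {
        return parents.nextSetBit(docId);
    }

    // The block of 'parent' starts right after the previous parent (or at doc 0).
    static int firstDocOfBlock(BitSet parents, int parent) {
        return parents.previousSetBit(parent - 1) + 1;
    }
}
```

With blocks laid out as docs 0-2 and 3-5 (parents at 2 and 5), parentOf for doc 4 is 5 and the block of parent 5 starts at doc 3.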

bq. Parent-followed-by-children seems more natural from a user's point of view 
however.

But is it really so bad to ask the app to put parent doc last?

I mean, the docs have to be indexed w/ the new doc block APIs in IW
anyway, which will often be eg a List<Document>, at which point
putting parent last seems a minor imposition?

Since this is an expert API I think it's OK to put [minor] impositions
on its usage if this can simplify the impl / make it faster / less
risky.  That said, I'm not yet sure on the impl (single pass query +
collector vs generic two-pass join that solr now has), so it's
probably premature to worry about this...

bq. I guess you could always keep the parent-then-child insertion order but 
flip the bitset (then cache) for query execution if that was faster.

True but this adds some hair into the impl (we must also flip coming
back from nextSetBit)...

bq. Benchmarking rewind vs nextSetBit vs flip then nextSetBit would reveal all.

True, though it'd be best to do this in the context of the actual join impl...





[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-05-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13039641#comment-13039641
 ] 

Paul Elschot commented on LUCENE-2454:
--

I see no test cases for required terms in a nested document.
This may be non trivial in that advance() should advance into the first doc of 
the nested doc.
For example, assume the parents p1 and p2 are the first docs in the nested 
docs, and that the query
requires a and b to be present:
{noformat}
docId
0   p1
1   a
2   b
3   p2
4   b
5   a
{noformat}
In this situation, p2 may be missed when advance() on a required scorer for b 
is given docId 5 (containing a)
as a target. It should be given target docId 3 to advance into the nested doc 
p2 containing a.

I quickly read the code here, but I could not easily determine whether this is 
done correctly or not.
Shall I add a test case here, or would it be better to open another issue after 
this one is closed, or can someone reassure me that this is not an issue?
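
The scenario in the table above can be replayed with plain arrays. This is only a sketch of the concern, not of the patch's code: postings are modelled as sorted int arrays, java.util.BitSet stands in for the parent bitset, and AdvanceTargetSketch with its methods is a hypothetical name.

```java
import java.util.BitSet;

// Hypothetical model of the concern; not code from the patch.
final class AdvanceTargetSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // First docId >= target in a sorted postings array, mimicking advance(target).
    static int advance(int[] postings, int target) {
        for (int doc : postings) {
            if (doc >= target) {
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }

    // Parents come first in each block, so a block starts at the previous set bit.
    static int blockStart(BitSet parents, int docId) {
        return parents.previousSetBit(docId);
    }
}
```

With b occurring in docs 2 and 4, advancing b to target 5 runs off the end, whereas advancing to the block start of doc 5 (docId 3) lands on doc 4, inside p2.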






[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-05-24 Thread Thomas Guttesen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038415#comment-13038415
 ] 

Thomas Guttesen commented on LUCENE-2454:
-

Hi.

Great feature...
I have some difficulties understanding the semantics/flow of document creation.
Do you have to add the parent and child levels in a particular sequence? Or can 
you insert all parents and then insert all child levels later?
The reason I ask is that in my case I am looking for a one-to-many relation 
style of insertion. I had hoped that I could add more child levels later.

Cheers




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-05-24 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038460#comment-13038460
 ] 

Mark Harwood commented on LUCENE-2454:
--

Thanks for the patch work, Mike. I'll need to check LUCENE-3129 for equivalence 
with PerParentLimitQuery. It's certainly a central part of what I typically 
deploy for nested queries - pass 1 is usually a NestedDocumentQuery to get the 
best parents and pass 2 uses PerParentLimitQuery to get the best children for 
these best parents. Of course some apps can simply fetch ALL children for the 
top parents but in some cases summarising children is required (note: this is 
potentially a great solution for performance issues on highlighting big docs 
e.g. entire books).

I haven't benchmarked nextSetBit vs the existing rewind implementation but I 
imagine it may be quicker. Parent-followed-by-children seems more natural from 
a user's point of view however. I guess you could always keep the 
parent-then-child insertion order but flip the bitset (then cache) for query 
execution if that was faster. Benchmarking rewind vs nextSetBit vs flip then 
nextSetBit would reveal all.
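
One possible reading of "flip the bitset" is to mirror it, so that a backwards search for the preceding parent becomes a forwards nextSetBit on the mirrored copy. The sketch below assumes that reading; it uses java.util.BitSet rather than OpenBitSet, and FlippedParentsSketch and its methods are hypothetical names.

```java
import java.util.BitSet;

// Hypothetical sketch of the "flip" idea; assumes parents precede their children.
final class FlippedParentsSketch {
    // Mirror the bitset so that bit i becomes bit (maxDoc - 1 - i). Done once, then cached.
    static BitSet flip(BitSet parents, int maxDoc) {
        BitSet flipped = new BitSet(maxDoc);
        for (int i = parents.nextSetBit(0); i >= 0; i = parents.nextSetBit(i + 1)) {
            flipped.set(maxDoc - 1 - i);
        }
        return flipped;
    }

    // Parent of docId, found with nextSetBit on the mirrored bitset only.
    // Returns -1 if no parent precedes docId.
    static int parentOf(BitSet flipped, int maxDoc, int docId) {
        int f = flipped.nextSetBit(maxDoc - 1 - docId);
        return f < 0 ? -1 : maxDoc - 1 - f; // translate the mirrored index back
    }
}
```

Every lookup has to translate the docId on the way in and on the way out, which is the extra bookkeeping a benchmark would need to weigh against the faster scan.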

Thomas - maintaining a strict order of parent/child docs is important and the 
recently-committed LUCENE-3112 should help with this.




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-05-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034702#comment-13034702
 ] 

Michael McCandless commented on LUCENE-2454:


I think this is a very important addition to Lucene, so let's get this
done!

I just opened LUCENE-3112, to add IW.add/updateDocuments, which would
atomically add the Documents produced by an iterator, and ensure they all
wind up in the same segment.  I think this is the only core change
necessary for this feature?  Ie, all else can be built on top of Lucene
once LUCENE-3112 is committed?





[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-05-17 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034726#comment-13034726
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq.  I think this is the only core change necessary for this feature?

Yup. A same-segment indexing guarantee is all that is required.




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-03-29 Thread RynekMedyczny.pl (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012501#comment-13012501
 ] 

RynekMedyczny.pl commented on LUCENE-2454:
--

{quote}
Code like this ends up in trunk when there is consensus so your support is 
welcome.
{quote}

Of course! How can we help you?

{quote}
While core Lucene adoption is a relatively simple technical task
{quote}

We are eagerly waiting for incorporating your work into Lucene Core!




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-03-23 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010110#comment-13010110
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. I have not looked at this patch so this comment may be off base.

The slideshare deck gives a good overview: 
http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

As a simple Lucene-focused addition I'd prefer not to explore all the possible 
implications for Solr adoption here. The affected areas in Solr are extensive 
and would include schema definitions, query syntax, facets/filter caching, 
result-fetching, DIH etc etc. Probably best discussed elsewhere.






[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-03-22 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009985#comment-13009985
 ] 

Ryan McKinley commented on LUCENE-2454:
---

bq. Solr, however does introduce a schema and much more that assumes a flat 
model.

In SOLR-1566 we could add a DocList as a field within a SolrDocument -- this 
would at least allow the output format to return a nested structure.

I have not looked at this patch so this comment may be off base.




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-03-21 Thread RynekMedyczny.pl (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009071#comment-13009071
 ] 

RynekMedyczny.pl commented on LUCENE-2454:
--

Mark, do you have any plans for including this feature into the Lucene trunk?
I think that this is a must-have feature since tree structures are so common!
Thank you in advance.




Re: [jira] [Commented] (LUCENE-2454) Nested Document query support

2011-03-21 Thread Andrzej Bialecki

On 3/21/11 10:51 AM, Dawid Weiss wrote:

Is it just me, or was that last e-mail sent with the header:

From: RynekMedyczny.pl (JIRA) <j...@apache.org>


JIRA comment notifications put username in front of JIRA's own address. 
Apparently someone uses RynekMedyczny.pl as their username.




This is weird :)


I, for one, welcome RynekMedyczny.pl as a Solr user :)


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com





Re: [jira] [Commented] (LUCENE-2454) Nested Document query support

2011-03-21 Thread Dawid Weiss
Oh, in this case I also welcome RynekMedyczny.pl as a Solr user ;)

Dawid

P.S. RynekMedyczny ~= HealthCareMarket

On Mon, Mar 21, 2011 at 11:07 AM, Andrzej Bialecki a...@getopt.org wrote:
 On 3/21/11 10:51 AM, Dawid Weiss wrote:

 Is it just me, or was that last e-mail sent with the header:

 From: RynekMedyczny.pl (JIRA) <j...@apache.org>

 JIRA comment notifications put username in front of JIRA's own address.
 Apparently someone uses RynekMedyczny.pl as their username.


 This is weird :)

 I, for one, welcome RynekMedyczny.pl as a Solr user :)





[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-03-21 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009111#comment-13009111
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. Mark, do you have any plans for including this feature into the Lucene 
trunk?

That is my intention in providing it here. I had to work hard to convince my 
employer to let me release this as open source in the interests of seeing it 
updated/tested as core Lucene APIs change - and hopefully receive some improved 
support in IndexWriter flush control. 
Unfortunately it seems not everyone shares the pain when it comes to modelling 
richer data structures, and many seem content with the flat model we have in 
Lucene today. Code like this ends up in trunk when there is consensus, so your 
support is welcome.

While core Lucene adoption is a relatively simple technical task, Solr adoption 
feels like a much more disruptive change.






[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-03-21 Thread Jamal Natour (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009141#comment-13009141
 ] 

Jamal Natour commented on LUCENE-2454:
--

Mark,

For my project this is a must-have feature that could decide the adoption of 
SOLR.  What do you think is the best way to help ensure this gets incorporated 
into SOLR?

Thank you,
Jamal




[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-03-21 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009163#comment-13009163
 ] 

Mark Harwood commented on LUCENE-2454:
--

Lucene does not dictate a schema and so using this approach to index 
design/querying is not a problem with base Lucene.

Solr, however, does introduce a schema and much more that assumes a flat 
model.  In the opening chapters of the Solr 1.4 Enterprise Search Server book 
the authors take the time to discuss the modelling limitations of this flat 
model and acknowledge this as an issue. The impact of adopting nested 
documents in Solr at this stage would be very large.
There may be ways you can overcome some of your issues without requiring nested 
documents (using phrase/span queries or combining tokens from multiple fields 
in Solr)  but in my experience these are often poor alternatives if richer 
structures are important.

Cheers
Mark




[jira] Commented: (LUCENE-2454) Nested Document query support

2011-02-28 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000239#comment-13000239
 ] 

Mark Harwood commented on LUCENE-2454:
--

Hi Paul,
I'm not sure I currently have an issue with merges as they just concatenate 
established segments without interleaving their documents. This operation 
should retain the order that is crucial to maintaining the 
parent/child/grandchild relationships (unless something has changed in merge 
logic which would certainly be an issue!). My main cause for concern is robust 
control over flushes so parent/child docs don't end up being separated into 
different segments at the point of arbitrary flushes.

I think your proposal here is related to a new (to me) use case where clients 
can add a single new child document and the index automagically reorganises 
to assemble all prior related documents back into a structure where they are 
grouped as contiguous documents held in the same segment? Please correct me if 
I am wrong.
Previously I have always seen this need for reorganisation as an application's 
responsibility and a single child document addition required the app to delete 
the associated parent and all old child docs, then add a new batch of documents 
representing the parent, old children plus the new child addition. Given the 
implied deletes and inserts required to maintain relationship integrity that 
seems like an operation that needs to be done under the control of Lucene's 
transaction management APIs rather than some form of special MergePolicy which 
are really intended for background efficiency tidy-ups not integrity 
maintenance.
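
The application-side update described above (delete the parent and all old children, then re-add the whole family as one batch) can be sketched with a toy in-memory list standing in for index order. This is only an illustration under stated assumptions: `BlockUpdateSketch`, `replaceFamily` and the `{familyId, label}` layout are hypothetical names, not Lucene APIs, and a real index marks documents as deleted rather than removing them from a list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BlockUpdateSketch {
    // Each entry is {familyId, docLabel}; list order stands in for index order.
    private final List<String[]> docs = new ArrayList<>();

    /** Deletes every doc of the family, then re-adds the family as one contiguous batch. */
    public void replaceFamily(String familyId, List<String> newBlock) {
        // Step 1: delete the parent and all of its old child docs.
        for (Iterator<String[]> it = docs.iterator(); it.hasNext();) {
            if (it.next()[0].equals(familyId)) {
                it.remove();
            }
        }
        // Step 2: append parent, old children and the new child together.
        for (String label : newBlock) {
            docs.add(new String[] {familyId, label});
        }
    }

    /** Returns the doc labels in index order. */
    public List<String> labelsInOrder() {
        List<String> labels = new ArrayList<>();
        for (String[] doc : docs) {
            labels.add(doc[1]);
        }
        return labels;
    }

    public static void main(String[] args) {
        BlockUpdateSketch index = new BlockUpdateSketch();
        index.replaceFamily("r1", java.util.Arrays.asList("r1-parent", "r1-job1"));
        index.replaceFamily("r1", java.util.Arrays.asList("r1-parent", "r1-job1", "r1-job2"));
        System.out.println(index.labelsInOrder());
    }
}
```

The point of the sketch is only that the family stays contiguous after an update; in a real index the delete and the re-add would need to be committed together to keep the relationship intact.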

As for the fields you outline for merging, generally speaking in applications 
using NestedDocumentQuery and PerParentLimitedQuery I have found that for 
searching purposes I already need to store: 
1) A globally unique ID as an indexed primary key field on the top-level 
container document
2) An indexed field with the same unique ID held in a different foreign key 
field on child documents
3) An indexed field indicating the document type, e.g. root or resume, and 
level1Child or employmentRecord


I could be a little confused about your intentions - maybe we should start with 
what problem we are trying to solve before addressing how we achieve it?

Cheers
Mark




[jira] Commented: (LUCENE-2454) Nested Document query support

2011-02-28 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000444#comment-13000444
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. The intention is quite simple: allow a set of documents to be used to 
provide a single score value during query searching

That's what the existing NestedDocumentQuery code attached to this issue 
already provides. As far as I am concerned the search side works fine and I 
have it installed in several live installations (along with a bug fix for 
skip that I must remember to upload here). Parent filters as you suggest 
benefit from caching and I typically use the XMLQueryParser with a 
CachedFilter tag to take care of  that (I need to upload the XMLQueryParser 
extensions for this Nested stuff too).

The new intention that I think you added in your last post was more complex 
and is related to indexing, not searching and introduced the idea that adding a 
new child doc on its own should somehow trigger some automated repair of the 
index contents. This repair would involve ensuring that related documents from 
previous adds would be reorganised such that all related documents still 
remained physically next to each other in the same segment. 
I don't think a custom choice of MergePolicy is the class to perform this 
operation - they are simply consulted as an advisor to pick which segments are 
ripe for a background merge operation conducted elsewhere. The more complex 
merge task you need to be performed here requires selective deletes of related 
docs from existing segments and addition of the same documents back into a new 
segment. This is a task I have always considered something the application code 
should do rather than relying on Lucene to second-guess what index 
reorganisation may be required. We could try to make core Lucene understand 
and support parent/child relationships more fully but I'd settle for this 
existing approach with some added app-control over flushing as a first step.








[jira] Commented: (LUCENE-2454) Nested Document query support

2011-02-28 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000503#comment-13000503
 ] 

Paul Elschot commented on LUCENE-2454:
--

So the missing basic operation is a copy/append of a range of existing index 
docs.
After that operation, the original docs can be deleted, but that is trivial.

I'll have a look at IndexWriter for this over the coming days. Any quick hints?




[jira] Commented: (LUCENE-2454) Nested Document query support

2011-02-28 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000541#comment-13000541
 ] 

Mark Harwood commented on LUCENE-2454:
--

I'm not sure the auto-repair is that trivial.
Let's say the parent/child docs are resumes and nested docs for employment 
positions (as in the attached example).
An update may not just be adding a new employment position doc but editing an 
existing one, deleting an old one etc.
Your auto-updater is going to need to do a lot of figuring out to work out 
which existing docs need copying over from earlier segments and patching in to 
the new segment with the updated parts of the resume. This gets worse if we 
start to consider multiple levels to the hierarchy.
It all feels like a lot of work for the IndexWriter to take on?





[jira] Commented: (LUCENE-2454) Nested Document query support

2011-02-27 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1300#comment-1300
 ] 

Paul Elschot commented on LUCENE-2454:
--

How about an implementation for strict hierarchies that uses two fields per 
document, in the following way:

The two fields each contain a single (indexed) token that indicates the node in 
the nesting hierarchy, one field meaning that the document is a child of that 
node, and the other that the document is the representative of that node. Any 
number of levels could be allowed, but no cycles of course.
These fields are then used by a merge policy to keep the documents ordered 
postorder, that is the children immediately followed by the representative for 
each node.
Collecting scores at any node in the hierarchy could then be done by using term 
filters, one for each involved scorer, to provide the representative for the 
current doc by advancing.


For example, in index order:

userDocId nodeMemberField nodeReprField

doc1 nodeA1 .
doc2 nodeA1 .
doc3 nodeA nodeA1
doc4 nodeA2 .
doc5 nodeA2 .
doc6 nodeA nodeA2

The node representatives for scoring could then be obtained by a term filter 
for nodeA.
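
A minimal sketch of the lookup implied by this postorder layout: if the term filter for nodeA is held as a bit set, the representative of any child is the first set bit at or after the child's position. The sketch uses `java.util.BitSet` rather than any Lucene class, zero-based positions (doc1 is position 0), and summing as the score combination; all of these, and the class and method names, are assumptions made for illustration.

```java
import java.util.BitSet;

public class PostorderParentSketch {
    /** Returns the representative for a doc: the first filter bit at or after docId, or -1. */
    public static int representativeFor(BitSet representatives, int docId) {
        return representatives.nextSetBit(docId);
    }

    /** Adds each non-zero child score to the slot of that child's representative. */
    public static float[] rollUp(BitSet representatives, float[] childScores) {
        float[] rolled = new float[childScores.length];
        for (int doc = 0; doc < childScores.length; doc++) {
            if (childScores[doc] == 0f) {
                continue; // this doc did not match the child query
            }
            int rep = representativeFor(representatives, doc);
            if (rep >= 0) {
                rolled[rep] += childScores[doc];
            }
        }
        return rolled;
    }

    public static void main(String[] args) {
        // doc3 and doc6 from the example are the nodeA members, i.e. positions 2 and 5.
        BitSet reps = new BitSet(6);
        reps.set(2);
        reps.set(5);
        float[] rolled = rollUp(reps, new float[] {1.0f, 0.5f, 0f, 2.0f, 0f, 0f});
        System.out.println(rolled[2] + " " + rolled[5]);
    }
}
```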


I think this could work for the scoring part, basically along the lines of the 
code already posted here.

Could someone with more experience in segment merge policies comment on this? 
This is quite restrictive for merging as the only freedom that is left in the 
document order is the order of the children for each node.

For example, adding a leaf document doc7 for nodeA1 could result in the 
following index order:

doc4 nodeA2 .
doc5 nodeA2 .
doc6 nodeA nodeA2
doc7 nodeA1 .
doc1 nodeA1 .
doc2 nodeA1 .
doc3 nodeA nodeA1







[jira] Commented: (LUCENE-2454) Nested Document query support

2010-07-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889088#action_12889088
 ] 

Michael McCandless commented on LUCENE-2454:


Maybe we should add an addDocuments call to IW?  To add more than one document, 
atomically, so that any flush must happen before or after them?
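
To make the proposed guarantee concrete, here is a toy buffer in which a flush can happen before or after a block of documents but never inside one. It is a sketch of the idea only: `BlockBufferSketch`, its `maxBufferedDocs` threshold and the string "documents" are hypothetical and do not reflect how IndexWriter is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockBufferSketch {
    private final int maxBufferedDocs;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> segments = new ArrayList<>();

    public BlockBufferSketch(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Adds a whole block; any flush is taken before or after it, never in the middle. */
    public void addDocuments(List<String> block) {
        // Flush first if the block would not fit alongside what is already buffered.
        if (!buffer.isEmpty() && buffer.size() + block.size() > maxBufferedDocs) {
            flush();
        }
        buffer.addAll(block);
        // Flush afterwards if the threshold has been reached.
        if (buffer.size() >= maxBufferedDocs) {
            flush();
        }
    }

    /** Writes the buffered docs out as one new segment. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        segments.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    public List<List<String>> segments() {
        return segments;
    }

    public static void main(String[] args) {
        BlockBufferSketch writer = new BlockBufferSketch(3);
        writer.addDocuments(Arrays.asList("p1", "p1-child1"));
        writer.addDocuments(Arrays.asList("p2", "p2-child1", "p2-child2"));
        System.out.println(writer.segments());
    }
}
```

A block larger than the threshold simply becomes its own oversized segment in this sketch, which keeps the family together at the cost of exceeding the limit.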

 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LuceneNestedDocumentSupport-1.zip, 
 TestNestedDocumentQueryWithMultiSegments.java


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] Commented: (LUCENE-2454) Nested Document query support

2010-07-16 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889104#action_12889104
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. Maybe we should add an addDocuments call to IW? To add more than one 
document, atomically, so that any flush must happen before or after them? 

That would be nice. 
Another way of modelling this would be to introduce Document.add(Document 
childDoc) but I think that is a more fundamental and wide-reaching change.




[jira] Commented: (LUCENE-2454) Nested Document query support

2010-07-16 Thread Buddika Gajapala (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12889215#action_12889215
 ] 

Buddika Gajapala commented on LUCENE-2454:
--

Mark, that was fast :)

BTW, another scenario: when there is a lot of data, there is a possibility of 
having a parent document and its matching child document in two different 
segments, causing some matches to be missed. I made a minor modification to 
your approach by making it do a forward scan instead of a reverse scan for 
parent docs, with the parent document inserted AFTER the child docs. In case 
the parent doc is located outside the scope of the current doc, its docid is 
preserved at the Weight object level, and nextDoc() is modified to check for 
that on the very first nextDoc() call to the new segment.




[jira] Commented: (LUCENE-2454) Nested Document query support

2010-07-15 Thread Buddika Gajapala (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1296#action_1296
 ] 

Buddika Gajapala commented on LUCENE-2454:
--

I tried this solution and it works perfectly for smaller indexes (either a 
small number of documents or small documents). However, for larger indexes that 
span multiple segments, it only matches the parent document accurately 
for the 1st segment. I think this is due to the way the parent docs are marked 
using a bit array for the ENTIRE index, while the actual traversal for matching 
criteria done by the Scorer is segment-by-segment (i.e. in the nextDoc() and 
advance() methods).  Have you considered this situation?




[jira] Commented: (LUCENE-2454) Nested Document query support

2010-07-15 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888908#action_12888908
 ] 

Mark Harwood commented on LUCENE-2454:
--

The 2nd comment above talks about this and the need for Lucene to offer more 
control over flush policy.

bq. it only matches the parent document accurately for the 1st segment. I 
think this is due to the way the parent docs are marked using a bit array for 
the ENTIRE index

But aren't filters held and evaluated within the context of each sub 
reader? Are you sure the issue isn't limited to a parent/child combo that is 
split across segments? 




[jira] Commented: (LUCENE-2454) Nested Document query support

2010-06-26 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882899#action_12882899
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. Can this help in searching over multiple child/nested documents?

Yes, a typical use case is to use NestedDocumentQuery to fetch the top 10 
parents then do a second query to fetch the children using a mandatory clause 
which lists the primary keys of the selected parents (assuming the children 
have an indexed field with the parent primary key).
The PerParentLimitedQuery can be used to limit the number of child docs 
returned per parent if there are many e.g. pages in a book. Both these classes 
are in the zipped attachment to this issue.
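
The second step of that use case can be sketched in plain Java: given the selected parent keys and a ranked list of child hits, keep only children of those parents and cap how many are kept per parent. This is not the PerParentLimitedQuery implementation from the attachment; `PerParentLimitSketch`, `limitPerParent` and the `{parentKey, childId}` pair layout are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PerParentLimitSketch {
    /**
     * Groups ranked child hits by parent key, keeping at most limit children per parent.
     * Each hit is a {parentKey, childId} pair; hits for unselected parents are dropped.
     */
    public static Map<String, List<String>> limitPerParent(
            List<String[]> rankedChildHits, Set<String> selectedParents, int limit) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] hit : rankedChildHits) {
            String parentKey = hit[0];
            if (!selectedParents.contains(parentKey)) {
                continue; // plays the role of the mandatory parent-key clause
            }
            List<String> kept = grouped.get(parentKey);
            if (kept == null) {
                kept = new ArrayList<>();
                grouped.put(parentKey, kept);
            }
            if (kept.size() < limit) {
                kept.add(hit[1]);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> hits = new ArrayList<>();
        hits.add(new String[] {"book1", "page1"});
        hits.add(new String[] {"book1", "page2"});
        hits.add(new String[] {"book1", "page3"});
        Set<String> parents = new HashSet<>(Arrays.asList("book1"));
        System.out.println(limitPerParent(hits, parents, 2));
    }
}
```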




[jira] Commented: (LUCENE-2454) Nested Document query support

2010-06-14 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878617#action_12878617
 ] 

Mark Harwood commented on LUCENE-2454:
--

Yep, I can see an app with a thousand cached filters would have a problem with 
this impl as it stands. 

Maintaining parallel indexes always feels a little flaky to me, not least 
because of the loss of  transactional integrity you can get from using a single 
index.

Is another approach to make your cached filters document-type-specific?   I.e. 
they only hold numbers in the range of zero to number-of-docs-of-this-type.
To use a cached doc ID in such a filter you would need to make use of mapping 
arrays to project the type-specific doc id numbers into global doc-id 
references and back.
Let's imagine an index with a mix of A, B and C doc types organised as 
follows:
docId  docType
=====  =======
1      A
2      B
3      C
4      A
5      C
6      C

The mapping arrays for docType C would look as follows
{code:title=Bar.java|borderStyle=solid}
// Arrays are zero-based: docId n in the table above lives at index n-1.
int[] globalDocIdToTypeCLookUp = {-1, -1, 0, -1, 1, 2}; // sparse, sized 0..num docs in overall index
int[] typeCToGlobalDocIdLookUp = {2, 4, 5};             // dense, sized 0..num type C docs in overall index
{code}

Your cached filters would be created as follows:
{code:title=Bar.java|borderStyle=solid}
OpenBitSet myTypeCBitSet = new OpenBitSet(numberOfTypeCDocs); // this line is hopefully where you save RAM!
// for all matching type C docs...
myTypeCBitSet.set(globalDocIdToTypeCLookUp[realDocId]);
{code}

Your filters can then be used by dereferencing the child doc IDs as follows:
{code:title=Bar.java|borderStyle=solid}
int nextRealDocId = typeCToGlobalDocIdLookUp[myTypeCBitSet.nextSetBit(0)]; // nextSetBit returns -1 when no bit is set, so check before indexing
{code}
  
Clearly the mapping arrays come at a cost of 4 bytes * num docs, which is 
non-trivial. The sparse globalDocIdToTypeCLookUp array shown here could be 
avoided by reading TermDocs and counting at cached-Filter-create time.
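
For readers who want to run the mapping idea end to end, here is a self-contained sketch that uses `java.util.BitSet` in place of OpenBitSet and zero-based positions for the six docs in the example. The class name, the helper names and the `char` doc-type array are assumptions made for illustration, not part of the attached code.

```java
import java.util.BitSet;

public class TypeScopedFilterSketch {
    /** Builds the sparse global-to-type lookup; -1 marks docs of another type. */
    public static int[] buildGlobalToType(char[] docTypes, char wanted) {
        int[] globalToType = new int[docTypes.length];
        int next = 0;
        for (int i = 0; i < docTypes.length; i++) {
            globalToType[i] = (docTypes[i] == wanted) ? next++ : -1;
        }
        return globalToType;
    }

    /** Builds the dense type-to-global lookup by inverting the sparse one. */
    public static int[] buildTypeToGlobal(int[] globalToType) {
        int count = 0;
        for (int v : globalToType) {
            if (v >= 0) {
                count++;
            }
        }
        int[] typeToGlobal = new int[count];
        for (int i = 0; i < globalToType.length; i++) {
            if (globalToType[i] >= 0) {
                typeToGlobal[globalToType[i]] = i;
            }
        }
        return typeToGlobal;
    }

    public static void main(String[] args) {
        char[] docTypes = {'A', 'B', 'C', 'A', 'C', 'C'};
        int[] globalToC = buildGlobalToType(docTypes, 'C');
        int[] cToGlobal = buildTypeToGlobal(globalToC);

        // The cached filter only needs one bit per type-C doc.
        BitSet filter = new BitSet(cToGlobal.length);
        filter.set(globalToC[4]); // pretend the doc at global position 4 matched

        for (int i = filter.nextSetBit(0); i >= 0; i = filter.nextSetBit(i + 1)) {
            System.out.println("match at global position " + cToGlobal[i]);
        }
    }
}
```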





[jira] Commented: (LUCENE-2454) Nested Document query support

2010-06-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878741#action_12878741
 ] 

David Smiley commented on LUCENE-2454:
--

That's an interesting strategy.  The size of these arrays is no big deal to me 
since there's only a couple of them.  My concern with this strategy is that 
potentially many places in Solr would have to become aware of this scheme, 
which might make it untenable to implement even though it's theoretically 
sound.
  
Another nice thing about the parallel index is that the idf relevancy factor 
stays clean since it will only consider real documents.

I want to investigate these options closer ASAP since this feature you've 
implemented is something I need.  Before I saw this issue, I was going to try 
something with SpanNearQuery and the masking-field variant.




[jira] Commented: (LUCENE-2454) Nested Document query support

2010-06-13 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878434#action_12878434
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. Wow, this is absolutely awesome! 

Thanks. I've found that this certainly solves problems I previously couldn't 
address at all in standard Lucene.

bq. The leading concern I have with this implementation is the size of the 
number of documents in the index as it affects the size of filters

These filters can obviously be cached but you'll need one filter per level you 
roll up to. Assuming a 300m doc index and only rolling up matches to the root 
that should only cost 300m /8 bits per byte = 37.5 meg of RAM. Index reloads 
should avoid the cost of completely rebuilding this filter nowadays because 
filters are cached at segment level and unchanged segments will retain their 
cached filters.
Perhaps a bigger concern is any norms arrays which are allocated one BYTE (as 
opposed to one bit) per document in the index.
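
The arithmetic above can be checked with a few lines of Java. The sketch assumes one bit per document for a cached filter and one norm byte per document for each field that carries norms; the per-field multiplier and the method names are assumptions added for illustration.

```java
public class FilterMemorySketch {
    /** Bytes needed for one bit per document, rounded up to whole bytes. */
    public static long bitSetBytes(long numDocs) {
        return (numDocs + 7) / 8;
    }

    /** Bytes needed for one norm byte per document, per field with norms. */
    public static long normsBytes(long numDocs, int fieldsWithNorms) {
        return numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long docs = 300_000_000L;
        System.out.println("filter bytes: " + bitSetBytes(docs));
        System.out.println("norms bytes:  " + normsBytes(docs, 1));
    }
}
```

For 300m documents this gives 37,500,000 bytes per cached filter against 300,000,000 bytes for a single norms array, which is why the norms are the bigger concern.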

bq. and they don't share any fields with the parent. 

For parents with only 1 child document instance of a given type, these could be 
safely rolled up into the parent and stored in the same document.



 Nested Document query support
 -

 Key: LUCENE-2454
 URL: https://issues.apache.org/jira/browse/LUCENE-2454
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 3.0.2
Reporter: Mark Harwood
Assignee: Mark Harwood
Priority: Minor
 Attachments: LuceneNestedDocumentSupport-1.zip


 A facility for querying nested documents in a Lucene index as outlined in 
 http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene




[jira] Commented: (LUCENE-2454) Nested Document query support

2010-06-12 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878317#action_12878317
 ] 

David Smiley commented on LUCENE-2454:
--

Wow, this is absolutely awesome!  This is one of the best enhancement requests 
to Lucene/Solr that I've seen, as it brings a real enhancement that is difficult 
/ impossible to do without.

The leading concern I have with this implementation is the number of documents 
in the index, as it affects the size of filters and perhaps other areas that 
involve creating BitSets.  I have a scenario in which the sub-documents number 
on average over 100 to each primary document.  These sub-documents are at least 
very small, and they don't share any fields with the parent.  For a large scale 
search situation, an index containing 3M Lucene documents now needs to store 
over 300M, and thus requires 100x the amount of RAM for filter caches that I 
require now.  Thoughts?





[jira] Commented: (LUCENE-2454) Nested Document query support

2010-05-11 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866128#action_12866128
 ] 

Mark Harwood commented on LUCENE-2454:
--

Robust use of this feature is dependent on careful management of segments, i.e. 
that all the documents making up a compound document are held in the same 
segment.

Michael Busch suggested the introduction of a new FlushPolicy on IndexWriter 
to offer the required control. (see 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3c4be5a14c.6040...@gmail.com%3e
 )
Sounds sensible to me given that IndexWriter currently manages to muddle 2 
alternative policies in the one implementation and it looks like we now need a 
third.

Is this the place to start the debate on FlushPolicy?
My guess is this change would involve:
* Deprecating/removing IndexWriter's setMaxBufferedDocs and setRAMBufferSizeMB.
* Providing a new FlushPolicy abstract class that is called with a 
BufferContext class holding the number of buffered docs and the RAM usage. 
FlushPolicy is asked whether flushing of various structures should be triggered 
given the context.
* Providing default implementations of FlushPolicy that are 
number-of-documents-based and RAM-based.
* Providing a special NestedDocumentFlushPolicy that can wrap any other policy 
(RAM/num docs) but only triggers flushes when application code has primed it to 
say a batch of related documents is completed.
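
A rough sketch of how the pieces listed above might fit together is given 
below. It is only an illustration of the proposal: every type, method and 
threshold in it (BufferContext, shouldFlush, startBatch, endBatch and so on) is 
hypothetical and is not taken from Lucene.

```java
// Hypothetical sketch of the proposal; none of these types are taken from Lucene.
final class FlushPolicySketch {

    // Snapshot of what the writer currently holds in memory.
    static final class BufferContext {
        final int numBufferedDocs;
        final double ramUsedMB;

        BufferContext(int numBufferedDocs, double ramUsedMB) {
            this.numBufferedDocs = numBufferedDocs;
            this.ramUsedMB = ramUsedMB;
        }
    }

    // The writer would ask the policy whether to flush, given the current context.
    abstract static class FlushPolicy {
        abstract boolean shouldFlush(BufferContext context);
    }

    // Default policy: flush once a fixed number of documents is buffered.
    static final class DocCountFlushPolicy extends FlushPolicy {
        private final int maxBufferedDocs;

        DocCountFlushPolicy(int maxBufferedDocs) {
            this.maxBufferedDocs = maxBufferedDocs;
        }

        @Override
        boolean shouldFlush(BufferContext context) {
            return context.numBufferedDocs >= maxBufferedDocs;
        }
    }

    // Default policy: flush once buffered RAM reaches a threshold.
    static final class RamFlushPolicy extends FlushPolicy {
        private final double maxRamMB;

        RamFlushPolicy(double maxRamMB) {
            this.maxRamMB = maxRamMB;
        }

        @Override
        boolean shouldFlush(BufferContext context) {
            return context.ramUsedMB >= maxRamMB;
        }
    }

    // Wraps any other policy, but only lets it fire between batches of related docs.
    static final class NestedDocumentFlushPolicy extends FlushPolicy {
        private final FlushPolicy delegate;
        private boolean batchComplete = true;

        NestedDocumentFlushPolicy(FlushPolicy delegate) {
            this.delegate = delegate;
        }

        // Application code primes the policy around each parent-plus-children batch.
        void startBatch() { batchComplete = false; }
        void endBatch() { batchComplete = true; }

        @Override
        boolean shouldFlush(BufferContext context) {
            return batchComplete && delegate.shouldFlush(context);
        }
    }

    public static void main(String[] args) {
        NestedDocumentFlushPolicy policy =
                new NestedDocumentFlushPolicy(new DocCountFlushPolicy(2));

        policy.startBatch();
        // Over the doc-count threshold, but the batch is still open.
        System.out.println(policy.shouldFlush(new BufferContext(3, 0.5)));
        policy.endBatch();
        // Same context, batch now complete.
        System.out.println(policy.shouldFlush(new BufferContext(3, 0.5)));
    }
}
```

The wrapper never decides on its own that a flush is due; it only vetoes the 
wrapped policy while a batch is open, so the existing doc-count and RAM limits 
keep their meaning.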

Let me know where it's best to continue the thinking on these IndexWriter 
changes.





[jira] Commented: (LUCENE-2454) Nested Document query support

2010-05-11 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866134#action_12866134
 ] 

Earwin Burrfoot commented on LUCENE-2454:
-

Both things can be combined for sure. The new stream-like indexing API stuffs 
docs into IW and controls when flushes /can/ happen, while FlushPolicy decides 
whether they actually /do/ happen when they /can/.
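
One way to picture that split is sketched below, under loud assumptions: 
BatchingWriter is a made-up class rather than a Lucene one, plain strings stand 
in for documents, and a list of lists stands in for segments. The batch-level 
API fixes the points at which a flush /can/ happen, and a pluggable predicate 
decides whether it /does/.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch; BatchingWriter is not a Lucene class.
final class BatchingWriter {
    private final IntPredicate flushPolicy; // decides whether a flush DOES happen
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> segments = new ArrayList<>();

    BatchingWriter(IntPredicate flushPolicy) {
        this.flushPolicy = flushPolicy;
    }

    // A whole batch (parent plus children) is handed over at once, so the only
    // point where a flush CAN happen is after the batch is fully buffered.
    void addBatch(List<String> relatedDocs) {
        buffer.addAll(relatedDocs);
        if (flushPolicy.test(buffer.size())) {
            flush();
        }
    }

    // Writes the buffered docs out as one "segment"; a no-op when nothing is buffered.
    void flush() {
        if (!buffer.isEmpty()) {
            segments.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    List<List<String>> segments() {
        return segments;
    }

    public static void main(String[] args) {
        // The policy wants a flush at 2 buffered docs, but can only act between batches.
        BatchingWriter writer = new BatchingWriter(buffered -> buffered >= 2);
        writer.addBatch(Arrays.asList("parent1", "child1a", "child1b"));
        writer.addBatch(Arrays.asList("parent2"));
        writer.flush();
        System.out.println(writer.segments());
    }
}
```

Even though the policy's threshold is crossed partway through the first batch, 
the parent and its children still land in the same segment because the policy 
is only consulted at batch boundaries.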





[jira] Commented: (LUCENE-2454) Nested Document query support

2010-05-11 Thread Mark Harwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12866148#action_12866148
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. - there was a discussion on narrowing indexing API to something stream-like

Any idea where that discussion was taking place? Happy to move 
flush-control discussions elsewhere if that is more appropriate.

