[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

2013-03-22 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610864#comment-13610864
 ] 

Commit Tag Bot commented on LUCENE-4355:


[branch_4x commit] Robert Muir
http://svn.apache.org/viewvc?view=revision&revision=1384290

LUCENE-4355: upgrade MIGRATE.txt (also fix a bug in field+term enumeration)


 improve AtomicReader sugar apis
 ---

 Key: LUCENE-4355
 URL: https://issues.apache.org/jira/browse/LUCENE-4355
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4355.patch, LUCENE-4355.patch


 I thought about this after looking @ LUCENE-4353:
 AtomicReader has some sugar APIs that are over top of the flex apis (Fields, 
 Terms, ...). But these might be a little trappy/confusing compared to 3.x.
 # I don't think we need AtomicReader.termDocsEnum(Bits, ...) and 
 .termPositionsEnum(Bits, ...). I also don't think we need variants that take 
 flags here. We should simplify these to be less trappy. I think we only need 
 (String, BytesRef) here.
 # This means you need to use the flex apis for more expert usage: but we make 
 this a bit too hard since we only let you get a Terms (which you must null 
 check, then call .iterator() on, then seekExact, ...). I think it could help 
 if we balanced this out by adding some sugar like AtomicReader.termsEnum? 3.x 
 had a method that let you get a 'positioned termsenum'.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

2013-03-22 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610865#comment-13610865
 ] 

Commit Tag Bot commented on LUCENE-4355:


[branch_4x commit] Robert Muir
http://svn.apache.org/viewvc?view=revision&revision=1384286

LUCENE-4355: improve AtomicReader sugar apis





[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

2012-09-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453070#comment-13453070
 ] 

Michael McCandless commented on LUCENE-4355:


+1, looks great!




[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

2012-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453137#comment-13453137
 ] 

Robert Muir commented on LUCENE-4355:
-

Thanks Mike: I'll give it some time in case anyone else wants to review, but I'd 
like to commit this in a day or two.




[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

2012-09-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452292#comment-13452292
 ] 

Michael McCandless commented on LUCENE-4355:


+1, patch looks good for docFreq.




[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

2012-09-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448618#comment-13448618
 ] 

Michael McCandless commented on LUCENE-4355:


I'm OK with keeping the sugar too.  I agree the boilerplate code is sizable.

I think only taking Term, not taking Bits, keeps the docs/positions enum simple.

Should we add sugar for all stats (e.g. IR.getSumTotalTermFreq(String field))?
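
To make the field-level idea concrete, here is a minimal sketch of what such a stat sugar could look like. It uses a toy in-memory structure rather than Lucene's reader classes; FieldStatsSketch, add and sumTotalTermFreq are hypothetical names chosen for illustration, and returning 0 for a missing field is an assumption, not something settled in this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldStatsSketch {
    // Toy data only: field -> (term -> total occurrences across all documents).
    // This is NOT Lucene's Fields/Terms; it only mirrors their shape.
    private final Map<String, Map<String, Long>> postings = new HashMap<>();

    // Record occurrences of a term in a field (accumulates on repeat calls).
    void add(String field, String term, long totalTermFreq) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .merge(term, totalTermFreq, Long::sum);
    }

    // Hypothetical field-level sugar: sum of totalTermFreq over every term
    // in the field. Assumption: a missing field yields 0 instead of null.
    long sumTotalTermFreq(String field) {
        Map<String, Long> terms = postings.get(field);
        if (terms == null) {
            return 0L;
        }
        long sum = 0L;
        for (long ttf : terms.values()) {
            sum += ttf;
        }
        return sum;
    }

    public static void main(String[] args) {
        FieldStatsSketch reader = new FieldStatsSketch();
        reader.add("body", "lucene", 3L);
        reader.add("body", "search", 4L);
        System.out.println(reader.sumTotalTermFreq("body"));  // prints 7
        System.out.println(reader.sumTotalTermFreq("title")); // prints 0
    }
}
```

The point of the sketch is only that the null check on the field moves inside the helper, so callers never see it.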




[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

2012-09-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447649#comment-13447649
 ] 

Uwe Schindler commented on LUCENE-4355:
---

I would start by removing all these APIs except fields() from AtomicReader, 
fix all the tests, and then re-add the useful ones (useful for end users).




[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

2012-09-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447666#comment-13447666
 ] 

Robert Muir commented on LUCENE-4355:
-

I agree, this was mainly to start discussion about what sugar apis we should 
have.

Currently it's very inconsistent.

IndexReader:
* docFreq(Term) - forwards to docFreq(String, BytesRef)
* docFreq(String, BytesRef) - (abstract: this can be seen as a sugar API)

AtomicReader:
* totalTermFreq(String, BytesRef) - strange to be treated differently than 
docFreq, sugar to seekExact+totalTermFreq
* terms(String) - note that in 3.x terms() and terms(Term) are different and 
go to TermsEnum (unpositioned and positioned)
* termDocsEnum(Bits, String, BytesRef) - the Bits should be implicit in the 
reader. If you want your own Bits, use the flex APIs?
* termDocsEnum(Bits, String, BytesRef, int) - flags seem too expert
* termPositionsEnum(Bits, String, BytesRef)
* termPositionsEnum(Bits, String, BytesRef, int)

So we should also discuss whether it's useful to use Term at the IndexReader 
level. If we are going to have sugar
for docFreq(Term), then should we do this elsewhere too? Term is somewhat nice 
because it means users don't have
to deal with BytesRef etc.

I wonder if totalTermFreq sugar is necessary here too, if we instead make it 
easy for you to get a positioned TermsEnum for
a specific term (you could just call it yourself then).

We should also think about the names termDocsEnum/termPositionsEnum. In 3.x 
these were termDocs() and termPositions(),
and could take Term. 

The only thing I feel pretty strongly about is that I think passing a custom 
Bits is too expert for these sugar APIs,
as it's something implicit from the Reader itself. 
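
As a rough illustration of the Term-forwarding pattern discussed above (a Term-taking convenience method delegating to a field-plus-bytes form), here is a self-contained sketch. ToyTerm and ToyReader are hypothetical stand-ins written for this example, not Lucene's Term or IndexReader, and a plain byte[] stands in for BytesRef.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class TermForwardingSketch {
    // Stand-in for a (field, text) pair; NOT Lucene's Term class.
    static final class ToyTerm {
        final String field;
        final String text;

        ToyTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        // UTF-8 bytes of the term text (byte[] stands in for BytesRef here).
        byte[] bytes() {
            return text.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Stand-in reader that stores a docFreq per (field, term bytes).
    static final class ToyReader {
        private final Map<String, Integer> docFreqs = new HashMap<>();

        private static String key(String field, byte[] termBytes) {
            return field + "/" + Arrays.toString(termBytes);
        }

        void put(String field, String text, int docFreq) {
            docFreqs.put(key(field, text.getBytes(StandardCharsets.UTF_8)), docFreq);
        }

        // Lower-level form: the caller supplies the field and the raw bytes.
        int docFreq(String field, byte[] termBytes) {
            return docFreqs.getOrDefault(key(field, termBytes), 0);
        }

        // Sugar: callers deal only in strings; this forwards to the form above.
        int docFreq(ToyTerm term) {
            return docFreq(term.field, term.bytes());
        }
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader();
        reader.put("body", "lucene", 3);
        System.out.println(reader.docFreq(new ToyTerm("body", "lucene")));  // prints 3
        System.out.println(reader.docFreq(new ToyTerm("title", "lucene"))); // prints 0
    }
}
```

The sugar adds no behavior of its own; it only spares the caller the byte conversion, which is the trade-off being weighed for the other methods.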





[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

2012-09-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447811#comment-13447811
 ] 

Michael McCandless commented on LUCENE-4355:


Maybe we should remove all the sugar methods...?  It's quite expert to pull a 
D/DPEnum?  But maybe stats are more commonly used?




[jira] [Commented] (LUCENE-4355) improve AtomicReader sugar apis

2012-09-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447984#comment-13447984
 ] 

Robert Muir commented on LUCENE-4355:
-

I don't think I agree, Mike. I think we should degrade into expert territory 
rather than it being a sharp cliff.
I think we should also make migration from previous versions of Lucene easier 
too.

I think these APIs on IR are a good way to do that. I'm tempted to suggest:

termDocs(Term) and termPositions(Term) as the sugar postings APIs, as those pretty 
much match the 3.x functionality.

I'm not sure these sugar APIs should take BytesRef; that's another head 
explosion for someone on top of Term, which
is simpler and takes Strings.

If someone is going to be calling these on lots of things anyway, they can just 
use fields()/terms()/etc.

We also have to realize it's a lot of work to compute something like docFreq 
without any sugar at all;
just look at the code for docFreq:
{code}
final Fields fields = fields();
if (fields == null) {
  return 0;
}
final Terms terms = fields.terms(field);
if (terms == null) {
  return 0;
}
final TermsEnum termsEnum = terms.iterator(null);
if (termsEnum.seekExact(term, true)) {
  return termsEnum.docFreq();
} else {
  return 0;
}
{code}

That's too much boilerplate and too many special cases. The terms(String) sugar helps a 
lot here, reducing it to:
{code}
final Terms terms = ir.terms(field);
if (terms == null) {
  return 0;
}
final TermsEnum termsEnum = terms.iterator(null);
if (termsEnum.seekExact(term, true)) {
  return termsEnum.docFreq();
} else {
  return 0;
}
{code}

But that's still too much. Making a positioned TermsEnum more accessible could 
help with a lot
of expert use cases, like getting enums with different Bits or flags, or getting 
term-level stats:

{code}
final TermsEnum te = ir.termsEnum(new Term(field, value));
if (te == null) {
  return 0;
} else {
  return te.docFreq();
}
{code}

The oddity might be that, compared to 3.x, it's a seekExact vs. a seekCeil. But I 
think that's OK;
after all, we already broke backwards compatibility, since terms() does something totally 
different than in 3.x (and I think
we should keep that, making it easy to access field-level metadata!) 

And I still think we should keep docFreq/totalTermFreq sugar!
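
For anyone unsure about the seekExact vs. seekCeil distinction, here is a small self-contained sketch of the two lookup styles. A java.util.TreeMap stands in for a field's sorted term dictionary; SeekSketch and its methods are hypothetical names for illustration only, and this is not Lucene's TermsEnum.

```java
import java.util.Map;
import java.util.TreeMap;

public class SeekSketch {
    // Toy term dictionary for one field: term text -> docFreq, in sorted order.
    // A stand-in only; it is NOT Lucene's Terms/TermsEnum.
    static final TreeMap<String, Integer> TERMS = new TreeMap<>();
    static {
        TERMS.put("apple", 4);
        TERMS.put("banana", 2);
        TERMS.put("cherry", 9);
    }

    // Exact-match style: null when the term is absent.
    static Integer seekExact(String term) {
        return TERMS.get(term);
    }

    // Ceiling style: lands on the smallest term >= the target,
    // and is null only when the target sorts after every term.
    static Map.Entry<String, Integer> seekCeil(String term) {
        return TERMS.ceilingEntry(term);
    }

    // docFreq sugar on top of the exact-match lookup: 0 for a missing term.
    static int docFreq(String term) {
        Integer df = seekExact(term);
        return df == null ? 0 : df;
    }

    public static void main(String[] args) {
        System.out.println(docFreq("banana"));              // prints 2
        System.out.println(docFreq("blueberry"));           // prints 0
        System.out.println(seekCeil("blueberry").getKey()); // prints cherry
    }
}
```

With the ceiling style, a lookup for a missing term silently lands on a different term, so the caller has to compare the result against what was asked for. The exact-match style makes "not found" explicit, which is what a term-level stat helper wants.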

