Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-04-03 Thread Robert Muir
On Thu, Apr 2, 2015 at 6:23 PM, Adrien Grand jpou...@gmail.com wrote:
 On Fri, Apr 3, 2015 at 12:10 AM, Reitzel, Charles
 charles.reit...@tiaa-cref.org wrote:
 Unfortunately, since boost is used in hashCode() and equals() calculations, 
 changing the boost will still make the queries trappy.   You will do all 
 that work to make everything-but-boost immutable and still not fix the 
 problem.

 This is why the query cache clones queries before putting them into
 the cache and sets the boost to 1. The issue is that this clone method
 is only shallow since its purpose is to change the boost. So it works
 fine for the boost parameter but not for eg. boolean clauses.

 Again, I don't dismiss advantages of going fully immutable

I will. I do not think it is worth it.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-04-02 Thread Adrien Grand
I did not close this door, I agree this is something that should be
considered and tried to list the pros/cons that I could think about.
However I would like it to be dealt with in a different issue as it
will already be a big change to change those 4 queries. Would would be
ok to first make queries immutable up to the boost and then discuss
if/how/when we should go fully immutable with a new API to change
boosts?

On Wed, Apr 1, 2015 at 9:25 PM, david.w.smi...@gmail.com
david.w.smi...@gmail.com wrote:
 I’m +1 to going all the way (fully immutable) but the proposal stops short
 by skipping the boost.  I agree with Terry’s comments — what a shame to make
 Queries “more immutable” but not really quite immutable.  It kinda misses
 the point?  Otherwise why bother?  If this is about progress not perfection,
 then okay, but if we don’t ultimately go all the way then there isn’t the
 benefit we’re after and we’ve both changed the API and made it a bit more
 awkward to use.  I like the idea of a method like cloneWithBoost() or
 some-such.  A no-arg clone() could be final and call that one with the
 current boost.

 While we’re at it, BooleanQuery  other variable aggregates could cache the
 hashCode at construction.

 ~ David Smiley
 Freelance Apache Lucene/Solr Search Consultant/Developer
 http://www.linkedin.com/in/davidwsmiley

 On Tue, Mar 31, 2015 at 11:06 AM, Adrien Grand jpou...@gmail.com wrote:

 On Tue, Mar 31, 2015 at 4:32 PM, Terry Smith sheb...@gmail.com wrote:
  Thanks for the explanation. It seems a pity to make queries just nearly
  immutable. Do you have any interest in adding a boost parameter to
  clone()
  so they really could be immutable?

 We could have a single method, but if we do it I would rather do it in
 a different change since it would affect all queries as opposed to
 only a handful of them.

 Also there is some benefit in having clone() and setBoost() in that
 cloning and setters are things that are familiar to everyone. If we
 replace them with a new method, we would need to specify its
 semantics. (Not a blocker, just wanted to mention what the pros/cons
 are in my opinion.)

 --
 Adrien

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-04-02 Thread Michael McCandless
On Thu, Apr 2, 2015 at 3:40 AM, Adrien Grand jpou...@gmail.com wrote:

 Would would be
 ok to first make queries immutable up to the boost and then discuss
 if/how/when we should go fully immutable with a new API to change
 boosts?

+1 ... progress not perfection.

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-04-02 Thread Reitzel, Charles
Unfortunately, since boost is used in hashCode() and equals() calculations, 
changing the boost will still make the queries trappy.   You will do all that 
work to make everything-but-boost immutable and still not fix the problem.

You can prove it to yourself like so (this test fails!):

  public void testMapOrphan() {
MapQuery, Integer map = new HashMap();
BooleanQuery booleanAB = new BooleanQuery();
booleanAB.add(new TermQuery(new Term(contents, a)), 
BooleanClause.Occur.SHOULD);
booleanAB.add(new TermQuery(new Term(contents, b)), 
BooleanClause.Occur.SHOULD);
map.put( booleanAB, 1 );

booleanAB.setBoost( 33.3f );// Set boost after map.put()
assertTrue( map.containsKey(booleanAB) );
  }

Seems like the quick way to the coast is to write a failing test - before 
making changes.

I realize this is easier said than done.  Based on your testing that led you to 
start this discussion, can you narrow it down to a single Query class and/or 
IndexSearcher use case?   Not there will be only one case.  But, at least, it 
will be a starting point.   Once the first failing test has been written, it 
should be relatively easy to write test variations to cover the remaining 
mutuable Query classes.

With the scale of the changes you are proposing, test first seems like a 
reasonable approach.

Another compromise approach might be to sub-class the mutable Query classes 
like so:

class ImmutableBooleanQuery extends BooleanQuery {
   public void add(BooleanClause clause) {
  throw new UnsupportedOperationException( 
ImmutableBooleanQuery.add(BooleanClause) );
   }
   public void setBoost( int boost ) {
  throw new UnsupportedOperationException( 
ImmutableBooleanQuery.add(BooleanClause) );
   }
   // etc.

  public static ImmutableBooleanQuery cloneFrom(BooleanQuery original) {
   // Use field level access to by-pass mutator methods.
  }

   // Do NOT override rewrite(IndexReader)!
}

In theory, such a proxy class could be generated at runtime to force 
immutability:
https://github.com/verhas/immutator

Which could make a lot of sense in JUnit tests, if not production runtime.

An immutable Query would be cloned from the original and place on the cache 
instead.  Any attempt to modify the cache entry should fail quickly.   

To me, a less invasive approach seems like a faster and easier way to actually 
find and fix this bug.
Once that is done, then it might make sense to perform the exhaustive updates 
to prevent a relapse in the future.

-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Thursday, April 02, 2015 9:46 AM
To: dev@lucene.apache.org
Subject: Re: [DISCUSS] Change Query API to make queries immutable in 6.0

Boosts might not make sense to become immutable, it might make the code too 
complex. Who is to say until the other stuff is fixed first.
The downsides might outweight the upsides.

So yeah, if you want to say if anyone disagrees with what the future might 
look like i'm gonna -1 your progress, then i will bite right now.

Fixing the rest of Query to be immutable, so filter caching isn't trappy, we 
should really do that. And we have been doing it already. I remember Uwe 
suggested this approach when adding automaton and related queries a long time 
ago. It made things simpler and avoided bugs, we ultimately made as much of it 
immutable as we could.

Queries have to be well-behaved, they need a good hashcode/equals, thread 
safety, good error checking, etc. It is easier to do this when things are 
immutable. Someone today can make a patch for FooQuery that nukes setBar and 
moves it to a ctor parameter named 'bar' and chances are a lot of the time, it 
probably fixes bugs in FooQuery somehow.
Thats just what it is.

Boosts are the 'long tail'. they are simple primitive floating point values, so 
susceptible to less problems. The base class incorporates boosts into 
equals/hashcode already, which prevents the most common bugs with them. They 
are trickier because internal things like
rewrite() might shuffle them around in conjunction with clone(), to do 
optimizations. They are also only relevant when scores are needed:
so we can prevent nasty filter caching bugs as a step, by making everything 
else immutable.


On Thu, Apr 2, 2015 at 9:27 AM, david.w.smi...@gmail.com 
david.w.smi...@gmail.com wrote:
 On Thu, Apr 2, 2015 at 3:40 AM, Adrien Grand jpou...@gmail.com wrote:

 first make queries immutable up to the boost and then discuss 
 if/how/when we should go fully immutable with a new API to change 
 boosts?


 The “if” part concerns me; I don’t mind it being a separate issue to 
 make the changes more manageable (progress not perfection, and all 
 that).  I’m all for the whole shebang.  But if others think “no” 
 then…. will it have been worthwhile to do this big change and not go all the 
 way?  I think not.
 Does anyone feel the answer is “no” to make boosts immutable? And if so why?

 If nobody comes up

Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-04-02 Thread Adrien Grand
On Fri, Apr 3, 2015 at 12:10 AM, Reitzel, Charles
charles.reit...@tiaa-cref.org wrote:
 Unfortunately, since boost is used in hashCode() and equals() calculations, 
 changing the boost will still make the queries trappy.   You will do all 
 that work to make everything-but-boost immutable and still not fix the 
 problem.

This is why the query cache clones queries before putting them into
the cache and sets the boost to 1. The issue is that this clone method
is only shallow since its purpose is to change the boost. So it works
fine for the boost parameter but not for eg. boolean clauses.

Again, I don't dismiss advantages of going fully immutable, I'm just
arguing that making all queries immutable up to the boost would
already be a good improvement like Robert described. We can still
discuss about going fully immutable in a different issue, it would
benefit from the proposed change.

-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-04-02 Thread Robert Muir
Boosts might not make sense to become immutable, it might make the
code too complex. Who is to say until the other stuff is fixed first.
The downsides might outweight the upsides.

So yeah, if you want to say if anyone disagrees with what the future
might look like i'm gonna -1 your progress, then i will bite right
now.

Fixing the rest of Query to be immutable, so filter caching isn't
trappy, we should really do that. And we have been doing it already. I
remember Uwe suggested this approach when adding automaton and related
queries a long time ago. It made things simpler and avoided bugs, we
ultimately made as much of it immutable as we could.

Queries have to be well-behaved, they need a good hashcode/equals,
thread safety, good error checking, etc. It is easier to do this when
things are immutable. Someone today can make a patch for FooQuery that
nukes setBar and moves it to a ctor parameter named 'bar' and chances
are a lot of the time, it probably fixes bugs in FooQuery somehow.
Thats just what it is.

Boosts are the 'long tail'. they are simple primitive floating point
values, so susceptible to less problems. The base class incorporates
boosts into equals/hashcode already, which prevents the most common
bugs with them. They are trickier because internal things like
rewrite() might shuffle them around in conjunction with clone(), to
do optimizations. They are also only relevant when scores are needed:
so we can prevent nasty filter caching bugs as a step, by making
everything else immutable.


On Thu, Apr 2, 2015 at 9:27 AM, david.w.smi...@gmail.com
david.w.smi...@gmail.com wrote:
 On Thu, Apr 2, 2015 at 3:40 AM, Adrien Grand jpou...@gmail.com wrote:

 first make queries immutable up to the boost and then discuss
 if/how/when we should go fully immutable with a new API to change
 boosts?


 The “if” part concerns me; I don’t mind it being a separate issue to make
 the changes more manageable (progress not perfection, and all that).  I’m
 all for the whole shebang.  But if others think “no” then…. will it have
 been worthwhile to do this big change and not go all the way?  I think not.
 Does anyone feel the answer is “no” to make boosts immutable? And if so why?

 If nobody comes up with a dissenting opinion to make boosts immutable within
 a couple days then count me as “+1” to your plans, else “-1” pending that
 discussion.

 ~ David Smiley
 Freelance Apache Lucene/Solr Search Consultant/Developer
 http://www.linkedin.com/in/davidwsmiley

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-04-02 Thread david.w.smi...@gmail.com
On Thu, Apr 2, 2015 at 3:40 AM, Adrien Grand jpou...@gmail.com wrote:

 first make queries immutable up to the boost and then discuss
 if/how/when we should go fully immutable with a new API to change
 boosts?


The “if” part concerns me; I don’t mind it being a separate issue to make
the changes more manageable (progress not perfection, and all that).  I’m
all for the whole shebang.  But if others think “no” then…. will it have
been worthwhile to do this big change and not go all the way?  I think
not.  Does anyone feel the answer is “no” to make boosts immutable? And if
so why?

If nobody comes up with a dissenting opinion to make boosts immutable
within a couple days then count me as “+1” to your plans, else “-1” pending
that discussion.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-04-02 Thread Yonik Seeley
If we were designing things from scratch again, would boost really be
on Query, or would it be on BooleanClause?
Just throwing that out there... although it may make it easier to
implement immutable queries (and perhaps make more sense too), it may
also be too big of a change to be worth while.

-Yonik


On Thu, Apr 2, 2015 at 9:27 AM, david.w.smi...@gmail.com
david.w.smi...@gmail.com wrote:
 On Thu, Apr 2, 2015 at 3:40 AM, Adrien Grand jpou...@gmail.com wrote:

 first make queries immutable up to the boost and then discuss
 if/how/when we should go fully immutable with a new API to change
 boosts?


 The “if” part concerns me; I don’t mind it being a separate issue to make
 the changes more manageable (progress not perfection, and all that).  I’m
 all for the whole shebang.  But if others think “no” then…. will it have
 been worthwhile to do this big change and not go all the way?  I think not.
 Does anyone feel the answer is “no” to make boosts immutable? And if so why?

 If nobody comes up with a dissenting opinion to make boosts immutable within
 a couple days then count me as “+1” to your plans, else “-1” pending that
 discussion.

 ~ David Smiley
 Freelance Apache Lucene/Solr Search Consultant/Developer
 http://www.linkedin.com/in/davidwsmiley

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-04-02 Thread david.w.smi...@gmail.com
On Thu, Apr 2, 2015 at 9:45 AM, Robert Muir rcm...@gmail.com wrote:

 They are also only relevant when scores are needed:
 so we can prevent nasty filter caching bugs as a step, by making
 everything else immutable.


That’s a good point.

+1 to your progress Adrien!

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-04-01 Thread david.w.smi...@gmail.com
I’m +1 to going all the way (fully immutable) but the proposal stops short
by skipping the boost.  I agree with Terry’s comments — what a shame to
make Queries “more immutable” but not really quite immutable.  It kinda
misses the point?  Otherwise why bother?  If this is about progress not
perfection, then okay, but if we don’t ultimately go all the way then there
isn’t the benefit we’re after and we’ve both changed the API and made it a
bit more awkward to use.  I like the idea of a method like cloneWithBoost()
or some-such.  A no-arg clone() could be final and call that one with the
current boost.

While we’re at it, BooleanQuery  other variable aggregates could cache the
hashCode at construction.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Tue, Mar 31, 2015 at 11:06 AM, Adrien Grand jpou...@gmail.com wrote:

 On Tue, Mar 31, 2015 at 4:32 PM, Terry Smith sheb...@gmail.com wrote:
  Thanks for the explanation. It seems a pity to make queries just nearly
  immutable. Do you have any interest in adding a boost parameter to
 clone()
  so they really could be immutable?

 We could have a single method, but if we do it I would rather do it in
 a different change since it would affect all queries as opposed to
 only a handful of them.

 Also there is some benefit in having clone() and setBoost() in that
 cloning and setters are things that are familiar to everyone. If we
 replace them with a new method, we would need to specify its
 semantics. (Not a blocker, just wanted to mention what the pros/cons
 are in my opinion.)

 --
 Adrien

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Adrien Grand
Recent changes that added automatic filter caching to IndexSearcher
uncovered some traps with our queries when it comes to using them as
cache keys. The problem comes from the fact that some of our main
queries are mutable, and modifying them while they are used as cache
keys makes the entry that they are caching invisible (because the hash
code changed too) yet still using memory.

While I think most users would be unaffected as it is rather uncommon
to modify queries after having passed them to IndexSearcher, I would
like to remove this trap by making queries immutable: everything
should be set at construction time except the boost parameter that
could still be changed with the same clone()/setBoost() mechanism as
today.

First I would like to make sure that it sounds good to everyone and
then to discuss what the API should look like. Most of our queries
happen to be immutable already (NumericRangeQuery, TermsQuery,
SpanNearQuery, etc.) but some aren't and the main exceptions are:
 - BooleanQuery,
 - DisjunctionMaxQuery,
 - PhraseQuery,
 - MultiPhraseQuery.

We could take all parameters that are set as setters and move them to
constructor arguments. For the above queries, this would mean (using
varargs for ease of use):

  BooleanQuery(boolean disableCoord, int minShouldMatch,
BooleanClause... clauses)
  DisjunctionMaxQuery(float tieBreakMul, Query... clauses)

For PhraseQuery and MultiPhraseQuery, the closest to what we have
today would require adding new classes to wrap terms and positions
together, for instance:

class TermAndPosition {
  public final BytesRef term;
  public final int position;
}

so that eg. PhraseQuery would look like:

  PhraseQuery(int slop, String field, TermAndPosition... terms)

MultiPhraseQuery would be the same with several terms at the same position.

Comments/ideas/concerns are highly welcome.

-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Robert Muir
Same with BooleanQuery. the go-to ctor should just take 'clauses'

On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 +1

 For PhraseQuery we could also have a common-case ctor that just takes
 the terms (and assumes sequential positions)?

 Mike McCandless

 http://blog.mikemccandless.com


 On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com wrote:
 Recent changes that added automatic filter caching to IndexSearcher
 uncovered some traps with our queries when it comes to using them as
 cache keys. The problem comes from the fact that some of our main
 queries are mutable, and modifying them while they are used as cache
 keys makes the entry that they are caching invisible (because the hash
 code changed too) yet still using memory.

 While I think most users would be unaffected as it is rather uncommon
 to modify queries after having passed them to IndexSearcher, I would
 like to remove this trap by making queries immutable: everything
 should be set at construction time except the boost parameter that
 could still be changed with the same clone()/setBoost() mechanism as
 today.

 First I would like to make sure that it sounds good to everyone and
 then to discuss what the API should look like. Most of our queries
 happen to be immutable already (NumericRangeQuery, TermsQuery,
 SpanNearQuery, etc.) but some aren't and the main exceptions are:
  - BooleanQuery,
  - DisjunctionMaxQuery,
  - PhraseQuery,
  - MultiPhraseQuery.

 We could take all parameters that are set as setters and move them to
 constructor arguments. For the above queries, this would mean (using
 varargs for ease of use):

   BooleanQuery(boolean disableCoord, int minShouldMatch,
 BooleanClause... clauses)
   DisjunctionMaxQuery(float tieBreakMul, Query... clauses)

 For PhraseQuery and MultiPhraseQuery, the closest to what we have
 today would require adding new classes to wrap terms and positions
 together, for instance:

 class TermAndPosition {
   public final BytesRef term;
   public final int position;
 }

 so that eg. PhraseQuery would look like:

   PhraseQuery(int slop, String field, TermAndPosition... terms)

 MultiPhraseQuery would be the same with several terms at the same position.

 Comments/ideas/concerns are highly welcome.

 --
 Adrien

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Michael McCandless
+1

For PhraseQuery we could also have a common-case ctor that just takes
the terms (and assumes sequential positions)?

Mike McCandless

http://blog.mikemccandless.com


On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com wrote:
 Recent changes that added automatic filter caching to IndexSearcher
 uncovered some traps with our queries when it comes to using them as
 cache keys. The problem comes from the fact that some of our main
 queries are mutable, and modifying them while they are used as cache
 keys makes the entry that they are caching invisible (because the hash
 code changed too) yet still using memory.

 While I think most users would be unaffected as it is rather uncommon
 to modify queries after having passed them to IndexSearcher, I would
 like to remove this trap by making queries immutable: everything
 should be set at construction time except the boost parameter that
 could still be changed with the same clone()/setBoost() mechanism as
 today.

 First I would like to make sure that it sounds good to everyone and
 then to discuss what the API should look like. Most of our queries
 happen to be immutable already (NumericRangeQuery, TermsQuery,
 SpanNearQuery, etc.) but some aren't and the main exceptions are:
  - BooleanQuery,
  - DisjunctionMaxQuery,
  - PhraseQuery,
  - MultiPhraseQuery.

 We could take all parameters that are set as setters and move them to
 constructor arguments. For the above queries, this would mean (using
 varargs for ease of use):

   BooleanQuery(boolean disableCoord, int minShouldMatch,
 BooleanClause... clauses)
   DisjunctionMaxQuery(float tieBreakMul, Query... clauses)

 For PhraseQuery and MultiPhraseQuery, the closest to what we have
 today would require adding new classes to wrap terms and positions
 together, for instance:

 class TermAndPosition {
   public final BytesRef term;
   public final int position;
 }

 so that eg. PhraseQuery would look like:

   PhraseQuery(int slop, String field, TermAndPosition... terms)

 MultiPhraseQuery would be the same with several terms at the same position.

 Comments/ideas/concerns are highly welcome.

 --
 Adrien

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Adrien Grand
Hi Charles,

On Tue, Mar 31, 2015 at 4:12 PM, Reitzel, Charles
charles.reit...@tiaa-cref.org wrote:
 Am I missing something?   Across the project, I’m seeing over 1,000 
 references to BooleanQuery.add().   Already, this seems like a pretty major 
 refactoring.  And I haven’t checked the other types of queries: 
 DisjunctionMax, Phrase, and MultiPhrase.   At that scale, bugs will be 
 introduced.

 I’m not disagreeing with the concept.  At all.   It’s part of the Collections 
 contract that anything used in hashCode() and equals() be kept immutable.  
 Just wondering if the cost is worth the principle this time?

The majority of call sites are in test folders. This does not make the
change easier but it decreases the chances to introduce an actual bug.
Also, the queries that we need to modify are those that are best
tested so I'm quite confident that this change will not be a bug nest.

However I totally agree that the change it huge, this is why I asked
for opinions on the list before doing it actually doing it.

-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Adrien Grand
On Tue, Mar 31, 2015 at 4:32 PM, Terry Smith sheb...@gmail.com wrote:
 Thanks for the explanation. It seems a pity to make queries just nearly
 immutable. Do you have any interest in adding a boost parameter to clone()
 so they really could be immutable?

We could have a single method, but if we do it I would rather do it in
a different change since it would affect all queries as opposed to
only a handful of them.

Also there is some benefit in having clone() and setBoost() in that
cloning and setters are things that are familiar to everyone. If we
replace them with a new method, we would need to specify its
semantics. (Not a blocker, just wanted to mention what the pros/cons
are in my opinion.)

-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Reitzel, Charles
Am I missing something?   Across the project, I’m seeing over 1,000 references 
to BooleanQuery.add().   Already, this seems like a pretty major refactoring.  
And I haven’t checked the other types of queries: DisjunctionMax, Phrase, and 
MultiPhrase.   At that scale, bugs will be introduced.

I’m not disagreeing with the concept.  At all.   It’s part of the Collections 
contract that anything used in hashCode() and equals() be kept immutable.  Just 
wondering if the cost is worth the principle this time?

In the spirit of discussion, an alternate approach might be to:
  a. Locate the places in the code where a query is taken from the cache and 
modified after the fact.
  b. Remove the query object before modifying and placing back on the cache.

Easier said than done, I realize.  Note, changing the constructors and removing 
modifiers would force all of these changes anyway.  It's just that they would 
be lost in a forest of other minor modifications.So, even if folks are ok 
with the larger scale changes, it might make sense to start with the 
problematic places first and then move on to the bulk of syntax changes.

Please ignore this if I am missing something here.


From: Terry Smith [mailto:sheb...@gmail.com] 
Sent: Tuesday, March 31, 2015 9:38 AM
To: dev@lucene.apache.org
Subject: Re: [DISCUSS] Change Query API to make queries immutable in 6.0

Adrien,

I missed the reason that boost is going to stay mutable. Is this to support 
query rewriting?

--Terry


On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir rcm...@gmail.com wrote:
Same with BooleanQuery. the go-to ctor should just take 'clauses'

On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 +1

 For PhraseQuery we could also have a common-case ctor that just takes
 the terms (and assumes sequential positions)?

 Mike McCandless

 http://blog.mikemccandless.com


 On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com wrote:
 Recent changes that added automatic filter caching to IndexSearcher
 uncovered some traps with our queries when it comes to using them as
 cache keys. The problem comes from the fact that some of our main
 queries are mutable, and modifying them while they are used as cache
 keys makes the entry that they are caching invisible (because the hash
 code changed too) yet still using memory.

 While I think most users would be unaffected as it is rather uncommon
 to modify queries after having passed them to IndexSearcher, I would
 like to remove this trap by making queries immutable: everything
 should be set at construction time except the boost parameter that
 could still be changed with the same clone()/setBoost() mechanism as
 today.

 First I would like to make sure that it sounds good to everyone and
 then to discuss what the API should look like. Most of our queries
 happen to be immutable already (NumericRangeQuery, TermsQuery,
 SpanNearQuery, etc.) but some aren't and the main exceptions are:
  - BooleanQuery,
  - DisjunctionMaxQuery,
  - PhraseQuery,
  - MultiPhraseQuery.

 We could take all parameters that are set as setters and move them to
 constructor arguments. For the above queries, this would mean (using
 varargs for ease of use):

   BooleanQuery(boolean disableCoord, int minShouldMatch,
     BooleanClause... clauses)
   DisjunctionMaxQuery(float tieBreakMul, Query... clauses)

 For PhraseQuery and MultiPhraseQuery, the closest to what we have
 today would require adding new classes to wrap terms and positions
 together, for instance:

 class TermAndPosition {
   public final BytesRef term;
   public final int position;
 }

 so that eg. PhraseQuery would look like:

   PhraseQuery(int slop, String field, TermAndPosition... terms)

 MultiPhraseQuery would be the same with several terms at the same position.

 Comments/ideas/concerns are highly welcome.

 --
 Adrien

*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*


Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Adrien Grand
Hi Terry,

Indeed this is for query rewriting. For instance if you have a boolean
query with a boost of 5 that wraps a single MUST clause with a term
query, then we rewrite to this to the inner term query and update its
boost using clone() and setBoost() in order to not modify in-place a
user-modified query.

On Tue, Mar 31, 2015 at 3:37 PM, Terry Smith sheb...@gmail.com wrote:
 Adrien,

 I missed the reason that boost is going to stay mutable. Is this to support
 query rewriting?

 --Terry


 On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir rcm...@gmail.com wrote:

 Same with BooleanQuery. the go-to ctor should just take 'clauses'

 On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
  +1
 
  For PhraseQuery we could also have a common-case ctor that just takes
  the terms (and assumes sequential positions)?
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 
  On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com wrote:
  Recent changes that added automatic filter caching to IndexSearcher
  uncovered some traps with our queries when it comes to using them as
  cache keys. The problem comes from the fact that some of our main
  queries are mutable, and modifying them while they are used as cache
  keys makes the entry that they are caching invisible (because the hash
  code changed too) yet still using memory.
 
  While I think most users would be unaffected as it is rather uncommon
  to modify queries after having passed them to IndexSearcher, I would
  like to remove this trap by making queries immutable: everything
  should be set at construction time except the boost parameter that
  could still be changed with the same clone()/setBoost() mechanism as
  today.
 
  First I would like to make sure that it sounds good to everyone and
  then to discuss what the API should look like. Most of our queries
  happen to be immutable already (NumericRangeQuery, TermsQuery,
  SpanNearQuery, etc.) but some aren't and the main exceptions are:
   - BooleanQuery,
   - DisjunctionMaxQuery,
   - PhraseQuery,
   - MultiPhraseQuery.
 
  We could take all parameters that are set as setters and move them to
  constructor arguments. For the above queries, this would mean (using
  varargs for ease of use):
 
BooleanQuery(boolean disableCoord, int minShouldMatch,
  BooleanClause... clauses)
DisjunctionMaxQuery(float tieBreakMul, Query... clauses)
 
  For PhraseQuery and MultiPhraseQuery, the closest to what we have
  today would require adding new classes to wrap terms and positions
  together, for instance:
 
  class TermAndPosition {
public final BytesRef term;
public final int position;
  }
 
  so that eg. PhraseQuery would look like:
 
PhraseQuery(int slop, String field, TermAndPosition... terms)
 
  MultiPhraseQuery would be the same with several terms at the same
  position.
 
  Comments/ideas/concerns are highly welcome.
 
  --
  Adrien
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Terry Smith
Adrien,

Thanks for the explanation. It seems a pity to make queries just nearly
immutable. Do you have any interest in adding a boost parameter to clone()
so they really could be immutable?

--Terry


On Tue, Mar 31, 2015 at 9:44 AM, Adrien Grand jpou...@gmail.com wrote:

 Hi Terry,

 Indeed this is for query rewriting. For instance if you have a boolean
 query with a boost of 5 that wraps a single MUST clause with a term
 query, then we rewrite to this to the inner term query and update its
 boost using clone() and setBoost() in order to not modify in-place a
 user-modified query.

 On Tue, Mar 31, 2015 at 3:37 PM, Terry Smith sheb...@gmail.com wrote:
  Adrien,
 
  I missed the reason that boost is going to stay mutable. Is this to
 support
  query rewriting?
 
  --Terry
 
 
  On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir rcm...@gmail.com wrote:
 
  Same with BooleanQuery. the go-to ctor should just take 'clauses'
 
  On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless
  luc...@mikemccandless.com wrote:
   +1
  
   For PhraseQuery we could also have a common-case ctor that just takes
   the terms (and assumes sequential positions)?
  
   Mike McCandless
  
   http://blog.mikemccandless.com
  
  
   On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com
 wrote:
   Recent changes that added automatic filter caching to IndexSearcher
   uncovered some traps with our queries when it comes to using them as
   cache keys. The problem comes from the fact that some of our main
   queries are mutable, and modifying them while they are used as cache
   keys makes the entry that they are caching invisible (because the
 hash
   code changed too) yet still using memory.
  
   While I think most users would be unaffected as it is rather uncommon
   to modify queries after having passed them to IndexSearcher, I would
   like to remove this trap by making queries immutable: everything
   should be set at construction time except the boost parameter that
   could still be changed with the same clone()/setBoost() mechanism as
   today.
  
   First I would like to make sure that it sounds good to everyone and
   then to discuss what the API should look like. Most of our queries
   happen to be immutable already (NumericRangeQuery, TermsQuery,
   SpanNearQuery, etc.) but some aren't and the main exceptions are:
- BooleanQuery,
- DisjunctionMaxQuery,
- PhraseQuery,
- MultiPhraseQuery.
  
   We could take all parameters that are set as setters and move them to
   constructor arguments. For the above queries, this would mean (using
   varargs for ease of use):
  
 BooleanQuery(boolean disableCoord, int minShouldMatch,
   BooleanClause... clauses)
 DisjunctionMaxQuery(float tieBreakMul, Query... clauses)
  
   For PhraseQuery and MultiPhraseQuery, the closest to what we have
   today would require adding new classes to wrap terms and positions
   together, for instance:
  
   class TermAndPosition {
 public final BytesRef term;
 public final int position;
   }
  
   so that eg. PhraseQuery would look like:
  
 PhraseQuery(int slop, String field, TermAndPosition... terms)
  
   MultiPhraseQuery would be the same with several terms at the same
   position.
  
   Comments/ideas/concerns are highly welcome.
  
   --
   Adrien
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
   For additional commands, e-mail: dev-h...@lucene.apache.org
  
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
   For additional commands, e-mail: dev-h...@lucene.apache.org
  
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 



 --
 Adrien

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Mark Miller
+1. Makes a lot of sense in general I think. Keeps them nicely thread safe
as well, and as they are commonly used for keys in caches and such, that's
a nice intrinsic property.

- Mark

On Tue, Mar 31, 2015 at 7:24 AM Robert Muir rcm...@gmail.com wrote:

 Same with BooleanQuery. the go-to ctor should just take 'clauses'

 On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
  +1
 
  For PhraseQuery we could also have a common-case ctor that just takes
  the terms (and assumes sequential positions)?
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 
  On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com wrote:
  Recent changes that added automatic filter caching to IndexSearcher
  uncovered some traps with our queries when it comes to using them as
  cache keys. The problem comes from the fact that some of our main
  queries are mutable, and modifying them while they are used as cache
  keys makes the entry that they are caching invisible (because the hash
  code changed too) yet still using memory.
 
  While I think most users would be unaffected as it is rather uncommon
  to modify queries after having passed them to IndexSearcher, I would
  like to remove this trap by making queries immutable: everything
  should be set at construction time except the boost parameter that
  could still be changed with the same clone()/setBoost() mechanism as
  today.
 
  First I would like to make sure that it sounds good to everyone and
  then to discuss what the API should look like. Most of our queries
  happen to be immutable already (NumericRangeQuery, TermsQuery,
  SpanNearQuery, etc.) but some aren't and the main exceptions are:
   - BooleanQuery,
   - DisjunctionMaxQuery,
   - PhraseQuery,
   - MultiPhraseQuery.
 
  We could take all parameters that are set as setters and move them to
  constructor arguments. For the above queries, this would mean (using
  varargs for ease of use):
 
BooleanQuery(boolean disableCoord, int minShouldMatch,
  BooleanClause... clauses)
DisjunctionMaxQuery(float tieBreakMul, Query... clauses)
 
  For PhraseQuery and MultiPhraseQuery, the closest to what we have
  today would require adding new classes to wrap terms and positions
  together, for instance:
 
  class TermAndPosition {
public final BytesRef term;
public final int position;
  }
 
  so that eg. PhraseQuery would look like:
 
PhraseQuery(int slop, String field, TermAndPosition... terms)
 
  MultiPhraseQuery would be the same with several terms at the same
 position.
 
  Comments/ideas/concerns are highly welcome.
 
  --
  Adrien
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Terry Smith
Adrien,

I missed the reason that boost is going to stay mutable. Is this to support
query rewriting?

--Terry


On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir rcm...@gmail.com wrote:

 Same with BooleanQuery. the go-to ctor should just take 'clauses'

 On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
  +1
 
  For PhraseQuery we could also have a common-case ctor that just takes
  the terms (and assumes sequential positions)?
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 
  On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com wrote:
  Recent changes that added automatic filter caching to IndexSearcher
  uncovered some traps with our queries when it comes to using them as
  cache keys. The problem comes from the fact that some of our main
  queries are mutable, and modifying them while they are used as cache
  keys makes the entry that they are caching invisible (because the hash
  code changed too) yet still using memory.
 
  While I think most users would be unaffected as it is rather uncommon
  to modify queries after having passed them to IndexSearcher, I would
  like to remove this trap by making queries immutable: everything
  should be set at construction time except the boost parameter that
  could still be changed with the same clone()/setBoost() mechanism as
  today.
 
  First I would like to make sure that it sounds good to everyone and
  then to discuss what the API should look like. Most of our queries
  happen to be immutable already (NumericRangeQuery, TermsQuery,
  SpanNearQuery, etc.) but some aren't and the main exceptions are:
   - BooleanQuery,
   - DisjunctionMaxQuery,
   - PhraseQuery,
   - MultiPhraseQuery.
 
  We could take all parameters that are set as setters and move them to
  constructor arguments. For the above queries, this would mean (using
  varargs for ease of use):
 
BooleanQuery(boolean disableCoord, int minShouldMatch,
  BooleanClause... clauses)
DisjunctionMaxQuery(float tieBreakMul, Query... clauses)
 
  For PhraseQuery and MultiPhraseQuery, the closest to what we have
  today would require adding new classes to wrap terms and positions
  together, for instance:
 
  class TermAndPosition {
public final BytesRef term;
public final int position;
  }
 
  so that eg. PhraseQuery would look like:
 
PhraseQuery(int slop, String field, TermAndPosition... terms)
 
  MultiPhraseQuery would be the same with several terms at the same
 position.
 
  Comments/ideas/concerns are highly welcome.
 
  --
  Adrien
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org