[jira] [Updated] (SOLR-5746) solr.xml parsing of str vs int vs bool is brittle; fails silently; expects odd type for shareSchema

2014-07-11 Thread Maciej Zasada (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Zasada updated SOLR-5746:


Attachment: SOLR-5746.patch

Hi [~hossman],

I've attached an updated patch file:
* used the framework's randomisation wherever it made sense to me;
* added assertions on exception messages;
* added reporting of multiple unexpected config options (at {{DEBUG}} level), as 
well as an exception message containing the list of unknown parameters (e.g. 
{code}Unrecognised 3 config parameter(s) in solr.xml file: [foo, bar, baz]{code}).
 
{{ant clean test}} shows that there's no regression.

Cheers,
Maciej

 solr.xml parsing of str vs int vs bool is brittle; fails silently; 
 expects odd type for shareSchema   
 --

 Key: SOLR-5746
 URL: https://issues.apache.org/jira/browse/SOLR-5746
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3, 4.4, 4.5, 4.6
Reporter: Hoss Man
 Attachments: SOLR-5746.patch, SOLR-5746.patch


 A comment in the ref guide got me looking at ConfigSolrXml.java and noticing 
 that the parsing of solr.xml options here is very brittle and confusing.  In 
 particular:
 * if a boolean option foo is expected along the lines of {{<bool 
 name="foo">true</bool>}} it will silently ignore {{<str 
 name="foo">true</str>}}
 * likewise for an int option {{<int name="bar">32</int>}} vs {{<str 
 name="bar">32</str>}}
 ... this is inconsistent with the way solrconfig.xml is parsed.  In 
 solrconfig.xml, the xml nodes are parsed into a NamedList, and the above 
 options will work in either form, but an invalid value such as {{<bool 
 name="foo">NOT A BOOLEAN</bool>}} will generate an error earlier (when 
 parsing the config) than {{<str name="foo">NOT A BOOLEAN</str>}} (which only 
 attempts to parse the string as a bool the first time the config value is 
 needed).
 In addition, i notice this really confusing line...
 {code}
 propMap.put(CfgProp.SOLR_SHARESCHEMA, 
 doSub("solr/str[@name='shareSchema']"));
 {code}
 shareSchema is used internally as a boolean option, but as written the 
 parsing code will ignore it unless the user explicitly configures it as a 
 {{<str/>}}
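 For reference, the two spellings in question would look like this in 
 solr.xml (a hand-written fragment, not from the patch; shareSchema sits 
 directly under <solr> per the XPath above):
 {code}
 <solr>
   <!-- typed form: what a user would naturally write -->
   <bool name="shareSchema">true</bool>

   <!-- string form: the only spelling the current parsing code picks up -->
   <str name="shareSchema">true</str>
 </solr>
 {code}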



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6238) Specialized test case for leader recovery scenario

2014-07-11 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058566#comment-14058566
 ] 

Shalin Shekhar Mangar commented on SOLR-6238:
-

More tests are always welcome.

The leader-initiated recovery doesn't actually cover this particular failure, 
so I'm surprised that it doesn't reproduce after 4.7. Please help me 
understand the sequence of operations:

{quote}
Leader - Lost Connection with ZK
Replica - Became leader
{quote}
If the leader lost its connection with ZK then it should've rejoined the 
election on reconnect. If so, why was an add request on this (old) leader successful?

I'll take a look at your patch.

 Specialized test case for leader recovery scenario
 --

 Key: SOLR-6238
 URL: https://issues.apache.org/jira/browse/SOLR-6238
 Project: Solr
  Issue Type: Improvement
Reporter: Varun Thacker
Priority: Minor
 Fix For: 4.10

 Attachments: SOLR-6238.patch


 A scenario which could happen, at least before the addition of 
 LeaderInitiatedRecoveryThread, I think. Also, this can happen only if one is 
 using a non-cloud-aware client (which might be quite a few users), given 
 that SolrJ is the only cloud-aware client we have.
 Events are in chronological order -
 Leader - Lost Connection with ZK
 Replica - Became leader
 Leader - add document is successful. Forwards it to the replica
 Replica - add document is unsuccessful as it is the leader and the request 
 says it is coming from a leader
 So as of now the Replica (new leader) won't have the doc but the 
 leader (old leader) will have the document.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-6238) Specialized test case for leader recovery scenario

2014-07-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-6238:
---

Assignee: Shalin Shekhar Mangar

 Specialized test case for leader recovery scenario
 --

 Key: SOLR-6238
 URL: https://issues.apache.org/jira/browse/SOLR-6238
 Project: Solr
  Issue Type: Improvement
Reporter: Varun Thacker
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 4.10

 Attachments: SOLR-6238.patch


 A scenario which could happen, at least before the addition of 
 LeaderInitiatedRecoveryThread, I think. Also, this can happen only if one is 
 using a non-cloud-aware client (which might be quite a few users), given 
 that SolrJ is the only cloud-aware client we have.
 Events are in chronological order -
 Leader - Lost Connection with ZK
 Replica - Became leader
 Leader - add document is successful. Forwards it to the replica
 Replica - add document is unsuccessful as it is the leader and the request 
 says it is coming from a leader
 So as of now the Replica (new leader) won't have the doc but the 
 leader (old leader) will have the document.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6234) Scoring modes for query time join

2014-07-11 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-6234:
---

Description: 
It adds a {{scorejoin}} query parser which calls Lucene's JoinUtil underneath. 
It supports:
- a {{score=none|avg|max|total}} local param (passed as ScoreMode to JoinUtil);
- a {{b=100}} param to pass {{Query.setBoost()}}.
So far:
- it always passes {{multipleValuesPerDocument=true}};
- it doesn't cover the cross-core join case; I just can't find a multicore 
test case in the Solr tests, and I'd appreciate it if you could point me to one;
- I've attached a standalone plugin project; let me know if somebody is 
interested and I'll convert it into a proper Solr codebase patch. Also, 
please mention any blockers!

Note: the development of this patch was sponsored by an anonymous contributor 
and approved for release under Apache License.

  was:
It adds a {{scorejoin}} query parser which calls Lucene's JoinUtil underneath. 
It supports:
- a {{score=none|avg|max|total}} local param (passed as ScoreMode to JoinUtil);
- a {{b=100}} param to pass {{Query.setBoost()}}.
So far:
- it always passes {{multipleValuesPerDocument=true}};
- it doesn't cover the cross-core join case; I just can't find a multicore 
test case in the Solr tests, and I'd appreciate it if you could point me to one;
- I've attached a standalone plugin project; let me know if somebody is 
interested and I'll convert it into a proper Solr codebase patch. Also, 
please mention any blockers!
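
For illustration, a request using the parser might look like this (field 
names are invented, and I'm assuming the from/to local params mirror the 
stock {!join} parser):

{code}
q={!scorejoin from=manu_id to=id score=max b=100}name:ipod
{code}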


 Scoring modes for query time join 
 --

 Key: SOLR-6234
 URL: https://issues.apache.org/jira/browse/SOLR-6234
 Project: Solr
  Issue Type: New Feature
  Components: query parsers
Affects Versions: 5.0, 4.10
Reporter: Mikhail Khludnev
 Attachments: lucene-join-solr-query-parser-0.0.2.zip


 It adds a {{scorejoin}} query parser which calls Lucene's JoinUtil underneath. 
 It supports:
 - a {{score=none|avg|max|total}} local param (passed as ScoreMode to JoinUtil);
 - a {{b=100}} param to pass {{Query.setBoost()}}.
 So far:
 - it always passes {{multipleValuesPerDocument=true}};
 - it doesn't cover the cross-core join case; I just can't find a multicore 
 test case in the Solr tests, and I'd appreciate it if you could point me to one;
 - I've attached a standalone plugin project; let me know if somebody is 
 interested and I'll convert it into a proper Solr codebase patch. Also, 
 please mention any blockers!
 Note: the development of this patch was sponsored by an anonymous contributor 
 and approved for release under Apache License.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5815) Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery

2014-07-11 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5815:
--

 Summary: Add TermAutomatonQuery, for proximity matching that 
generalizes MultiPhraseQuery/SpanNearQuery
 Key: LUCENE-5815
 URL: https://issues.apache.org/jira/browse/LUCENE-5815
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.10


I created a new query, called TermAutomatonQuery, that's a proximity
query to generalize MultiPhraseQuery/SpanNearQuery: it lets you
construct an arbitrary automaton whose transitions are whole terms, and
then find all documents that the automaton matches.  This is different
from a normal automaton whose transitions are usually
bytes/characters within a term (or terms).

So, if the automaton has just 1 transition, it's just an expensive
TermQuery.  If you have two transitions in sequence, it's a phrase
query of two terms.  You can express synonyms by using transitions
that overlap one another but the automaton doesn't have to be a
sausage (as MultiPhraseQuery requires) i.e. it respects posLength
(at query time).

It also allows "any" transitions, to match any term, so you can do
sloppy matching and span-like queries, e.g. find "lucene" and "python"
with up to 3 other terms in between.

I also added a class to convert a TokenStream directly to the
automaton for this query, preserving posLength.  (Of course, the index
can't store posLength, so the matching won't be fully correct if any
indexed token has posLength != 1).  But if you do query-time-only
synonyms then the matching should finally be correct.

I haven't tested performance but I suspect it's quite slowish ... its
cost is O(sum-totalTF) of all terms used in the automaton.  There
are some optimizations we could do, e.g. detecting that some terms in
the automaton can be upgraded to MUST (right now they are all
effectively SHOULD).

I'm not sure how it should assign scores (punted on that for now), but
the matching seems to be working.
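
As a concrete example, the find-"lucene"-and-"python"-up-to-3-terms-apart 
query could be built along these lines (a sketch; the method names are my 
guesses at the patch's API and may differ):

{code}
TermAutomatonQuery q = new TermAutomatonQuery("body");
int init = q.createState();
int s1 = q.createState();
int s2 = q.createState();
int s3 = q.createState();
int s4 = q.createState();
int fin = q.createState();
q.setAccept(fin, true);

q.addTransition(init, s1, "lucene");

// up to 3 arbitrary terms may follow "lucene":
q.addAnyTransition(s1, s2);
q.addAnyTransition(s2, s3);
q.addAnyTransition(s3, s4);

// "python" may arrive after any of those gaps:
q.addTransition(s1, fin, "python");
q.addTransition(s2, fin, "python");
q.addTransition(s3, fin, "python");
q.addTransition(s4, fin, "python");

q.finish();  // compiles the automaton; required before searching
{code}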




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5815) Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery

2014-07-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5815:
---

Attachment: LUCENE-5815.patch

Patch, work in progress, lots of nocommits...

 Add TermAutomatonQuery, for proximity matching that generalizes 
 MultiPhraseQuery/SpanNearQuery
 --

 Key: LUCENE-5815
 URL: https://issues.apache.org/jira/browse/LUCENE-5815
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.10

 Attachments: LUCENE-5815.patch


 I created a new query, called TermAutomatonQuery, that's a proximity
 query to generalize MultiPhraseQuery/SpanNearQuery: it lets you
 construct an arbitrary automaton whose transitions are whole terms, and
 then find all documents that the automaton matches.  This is different
 from a normal automaton whose transitions are usually
 bytes/characters within a term (or terms).
 So, if the automaton has just 1 transition, it's just an expensive
 TermQuery.  If you have two transitions in sequence, it's a phrase
 query of two terms.  You can express synonyms by using transitions
 that overlap one another but the automaton doesn't have to be a
 sausage (as MultiPhraseQuery requires) i.e. it respects posLength
 (at query time).
 It also allows "any" transitions, to match any term, so you can do
 sloppy matching and span-like queries, e.g. find "lucene" and "python"
 with up to 3 other terms in between.
 I also added a class to convert a TokenStream directly to the
 automaton for this query, preserving posLength.  (Of course, the index
 can't store posLength, so the matching won't be fully correct if any
 indexed token has posLength != 1).  But if you do query-time-only
 synonyms then the matching should finally be correct.
 I haven't tested performance but I suspect it's quite slowish ... its
 cost is O(sum-totalTF) of all terms used in the automaton.  There
 are some optimizations we could do, e.g. detecting that some terms in
 the automaton can be upgraded to MUST (right now they are all
 effectively SHOULD).
 I'm not sure how it should assign scores (punted on that for now), but
 the matching seems to be working.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6239) HttpSolrServer: connection still allocated

2014-07-11 Thread Sergio Fernández (JIRA)
Sergio Fernández created SOLR-6239:
--

 Summary: HttpSolrServer: connection still allocated
 Key: SOLR-6239
 URL: https://issues.apache.org/jira/browse/SOLR-6239
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Reporter: Sergio Fernández
Priority: Minor


In scenarios where concurrency is aggressive, this exception could easily 
appear:

{quote}
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Invalid 
use of BasicClientConnManager: connection still allocated.
Make sure to release the connection before allocating another one.
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
 ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
 ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
 ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
 ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) 
~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) 
~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
{quote}

I wonder if there is any solution for it?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)

2014-07-11 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058716#comment-14058716
 ] 

Jack Krupansky commented on SOLR-247:
-

The earlier commentary clearly lays out that the primary concern is that it 
would be a performance nightmare, but... that does depend on your particular 
use case.

Personally, I would say to go forward with adding this feature, but with a 
clear documentation caveat that it should be used with great care: it is 
likely to be extremely memory- and performance-intensive, and is more of a 
development/testing tool than a production feature, although it could have 
value when wildcard patterns are crafted with care for a very limited number 
of fields.
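
For example (hypothetical requests; the exact behavior depends on the patch), 
the difference between the risky and the careful usage is just the pattern:

{code}
# facets on every field -- use with great care:
/select?q=*:*&facet=true&facet.field=*

# a narrow wildcard covering only a few dynamic fields:
/select?q=*:*&facet=true&facet.field=attr_*
{code}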


 Allow facet.field=* to facet on all fields (without knowing what they are)
 --

 Key: SOLR-247
 URL: https://issues.apache.org/jira/browse/SOLR-247
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Minor
  Labels: beginners, newdev
 Attachments: SOLR-247-FacetAllFields.patch, SOLR-247.patch, 
 SOLR-247.patch, SOLR-247.patch


 I don't know if this is a good idea to include -- it is potentially a bad 
 idea to use it, but that can be ok.
 This came out of trying to use faceting for the LukeRequestHandler top term 
 collecting.
 http://www.nabble.com/Luke-request-handler-issue-tf3762155.html



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6239) HttpSolrServer: connection still allocated

2014-07-11 Thread Sergio Fernández (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058724#comment-14058724
 ] 

Sergio Fernández commented on SOLR-6239:


The way recommended by HttpComponents 4.1 to close the connection and release 
any underlying resources is:

{code}EntityUtils.consume(HttpEntity){code}

But I'm not sure how that fits with the current code...
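
FWIW, the usual trigger for this message is an HttpClient backed by the 
single-connection (basic) connection manager being shared across threads. One 
workaround (a hand-written sketch; the URL and pool sizes are placeholders) 
is to hand HttpSolrServer a pooled client, or simply share one HttpSolrServer 
instance, which is thread-safe:

{code}
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.PoolingClientConnectionManager;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

PoolingClientConnectionManager cm = new PoolingClientConnectionManager();
cm.setMaxTotal(128);            // total connections across all routes
cm.setDefaultMaxPerRoute(32);   // connections per Solr host
SolrServer server = new HttpSolrServer(
    "http://localhost:8983/solr/collection1", new DefaultHttpClient(cm));
{code}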

 HttpSolrServer: connection still allocated
 --

 Key: SOLR-6239
 URL: https://issues.apache.org/jira/browse/SOLR-6239
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Reporter: Sergio Fernández
Priority: Minor

 In scenarios where concurrency is aggressive, this exception could easily 
 appear:
 {quote}
 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Invalid 
 use of BasicClientConnManager: connection still allocated.
 Make sure to release the connection before allocating another one.
   at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
  ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
   at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
  ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
   at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
  ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
   at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
  ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) 
 ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) 
 ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
 {quote}
 I wonder if there is any solution for it?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5815) Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery

2014-07-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058730#comment-14058730
 ] 

Robert Muir commented on LUCENE-5815:
-

{quote}
 (Of course, the index
can't store posLength, so the matching won't be fully correct if any
indexed tokens has posLength != 1).
{quote}

Why not? Can't we just have a tokenfilter that encodes PositionLengthAttribute 
as a vInt payload (it will always be one byte, unless you are crazy)? The 
custom scorer here could optionally support it.

Personally: not sure it's worth it. I think it's better to fix the QP to parse 
correctly in common cases like word-delimiter etc. (first: those tokenfilters 
must be fixed!).

And I'm a little confused about whether this approach is faster than a 
rewrite() to booleans of phrase queries?

 Add TermAutomatonQuery, for proximity matching that generalizes 
 MultiPhraseQuery/SpanNearQuery
 --

 Key: LUCENE-5815
 URL: https://issues.apache.org/jira/browse/LUCENE-5815
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.10

 Attachments: LUCENE-5815.patch


 I created a new query, called TermAutomatonQuery, that's a proximity
 query to generalize MultiPhraseQuery/SpanNearQuery: it lets you
 construct an arbitrary automaton whose transitions are whole terms, and
 then find all documents that the automaton matches.  This is different
 from a normal automaton whose transitions are usually
 bytes/characters within a term (or terms).
 So, if the automaton has just 1 transition, it's just an expensive
 TermQuery.  If you have two transitions in sequence, it's a phrase
 query of two terms.  You can express synonyms by using transitions
 that overlap one another but the automaton doesn't have to be a
 sausage (as MultiPhraseQuery requires) i.e. it respects posLength
 (at query time).
 It also allows "any" transitions, to match any term, so you can do
 sloppy matching and span-like queries, e.g. find "lucene" and "python"
 with up to 3 other terms in between.
 I also added a class to convert a TokenStream directly to the
 automaton for this query, preserving posLength.  (Of course, the index
 can't store posLength, so the matching won't be fully correct if any
 indexed token has posLength != 1).  But if you do query-time-only
 synonyms then the matching should finally be correct.
 I haven't tested performance but I suspect it's quite slowish ... its
 cost is O(sum-totalTF) of all terms used in the automaton.  There
 are some optimizations we could do, e.g. detecting that some terms in
 the automaton can be upgraded to MUST (right now they are all
 effectively SHOULD).
 I'm not sure how it should assign scores (punted on that for now), but
 the matching seems to be working.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5815) Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery

2014-07-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058749#comment-14058749
 ] 

Michael McCandless commented on LUCENE-5815:


bq. Can't we just have a tokenfilter that encodes PositionLengthAttribute as a 
vInt payload (it will always be one byte, unless you are crazy)? The custom 
scorer here could optionally support it.

Yes, I think that would work!  And would be pretty simple to build... and the 
changes to this scorer would be simple: right now it just hardwires that a 
given token goes from pos to pos+1, but with this vInt in the payload it would 
decode that and use it instead of +1.
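
For concreteness, a minimal sketch of such a filter (a hypothetical class, 
not part of the patch; it just copies PositionLengthAttribute into the 
payload as a vInt):

{code}
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute;
import org.apache.lucene.store.ByteArrayDataOutput;
import org.apache.lucene.util.BytesRef;

public final class PosLengthPayloadFilter extends TokenFilter {
  private final PositionLengthAttribute posLenAtt = addAttribute(PositionLengthAttribute.class);
  private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
  private final byte[] buffer = new byte[5];  // max vInt size

  public PosLengthPayloadFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // encode posLength as a vInt payload (one byte for lengths < 128):
    ByteArrayDataOutput out = new ByteArrayDataOutput(buffer);
    out.writeVInt(posLenAtt.getPositionLength());
    payloadAtt.setPayload(new BytesRef(buffer, 0, out.getPosition()));
    return true;
  }
}
{code}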

bq. I think it's better to fix the QP to parse correctly in common cases like 
word-delimiter etc. (first: those tokenfilters must be fixed!).

Right, QP needs to use posLength to build the correct queries... this new query 
just makes it easy since any arbitrary graph TokenStream can be directly 
translated into the equivalent query.

bq. And I'm a little confused if this approach is faster than rewrite() to 
booleans of phrase queries?

We can only rewrite to BQ of PQ if the automaton doesn't use the ANY token 
transition, and if it's finite, right?  Or maybe we could somehow take ANY and 
map it to slop on the phrase queries?

But in those restricted cases, it's probably faster, I guess depending on what 
the automaton looks like.  Ie, you could make a biggish automaton that rewrites 
to many many phrase queries.  I'll add a TODO to maybe do this rewriting for 
this query ...

 Add TermAutomatonQuery, for proximity matching that generalizes 
 MultiPhraseQuery/SpanNearQuery
 --

 Key: LUCENE-5815
 URL: https://issues.apache.org/jira/browse/LUCENE-5815
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.10

 Attachments: LUCENE-5815.patch


 I created a new query, called TermAutomatonQuery, that's a proximity
 query to generalize MultiPhraseQuery/SpanNearQuery: it lets you
 construct an arbitrary automaton whose transitions are whole terms, and
 then find all documents that the automaton matches.  This is different
 from a normal automaton whose transitions are usually
 bytes/characters within a term (or terms).
 So, if the automaton has just 1 transition, it's just an expensive
 TermQuery.  If you have two transitions in sequence, it's a phrase
 query of two terms.  You can express synonyms by using transitions
 that overlap one another but the automaton doesn't have to be a
 sausage (as MultiPhraseQuery requires) i.e. it respects posLength
 (at query time).
 It also allows "any" transitions, to match any term, so you can do
 sloppy matching and span-like queries, e.g. find "lucene" and "python"
 with up to 3 other terms in between.
 I also added a class to convert a TokenStream directly to the
 automaton for this query, preserving posLength.  (Of course, the index
 can't store posLength, so the matching won't be fully correct if any
 indexed token has posLength != 1).  But if you do query-time-only
 synonyms then the matching should finally be correct.
 I haven't tested performance but I suspect it's quite slowish ... its
 cost is O(sum-totalTF) of all terms used in the automaton.  There
 are some optimizations we could do, e.g. detecting that some terms in
 the automaton can be upgraded to MUST (right now they are all
 effectively SHOULD).
 I'm not sure how it should assign scores (punted on that for now), but
 the matching seems to be working.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5815) Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery

2014-07-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058754#comment-14058754
 ] 

Robert Muir commented on LUCENE-5815:
-

{quote}
We can only rewrite to BQ of PQ if the automaton doesn't use the ANY token 
transition, and if it's finite, right? Or maybe we could somehow take ANY and 
map it to slop on the phrase queries?
{quote}

Hmm, OK, I see what you are getting at.

I guess I was immediately only considering the case where the automaton is 
actually coming from the query analysis chain: it would always be finite and 
so on in this case... right?

 Add TermAutomatonQuery, for proximity matching that generalizes 
 MultiPhraseQuery/SpanNearQuery
 --

 Key: LUCENE-5815
 URL: https://issues.apache.org/jira/browse/LUCENE-5815
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.10

 Attachments: LUCENE-5815.patch


 I created a new query, called TermAutomatonQuery, that's a proximity
 query to generalize MultiPhraseQuery/SpanNearQuery: it lets you
 construct an arbitrary automaton whose transitions are whole terms, and
 then find all documents that the automaton matches.  This is different
 from a normal automaton whose transitions are usually
 bytes/characters within a term (or terms).
 So, if the automaton has just 1 transition, it's just an expensive
 TermQuery.  If you have two transitions in sequence, it's a phrase
 query of two terms.  You can express synonyms by using transitions
 that overlap one another but the automaton doesn't have to be a
 sausage (as MultiPhraseQuery requires) i.e. it respects posLength
 (at query time).
 It also allows "any" transitions, to match any term, so you can do
 sloppy matching and span-like queries, e.g. find "lucene" and "python"
 with up to 3 other terms in between.
 I also added a class to convert a TokenStream directly to the
 automaton for this query, preserving posLength.  (Of course, the index
 can't store posLength, so the matching won't be fully correct if any
 indexed token has posLength != 1).  But if you do query-time-only
 synonyms then the matching should finally be correct.
 I haven't tested performance but I suspect it's quite slowish ... its
 cost is O(sum-totalTF) of all terms used in the automaton.  There
 are some optimizations we could do, e.g. detecting that some terms in
 the automaton can be upgraded to MUST (right now they are all
 effectively SHOULD).
 I'm not sure how it should assign scores (punted on that for now), but
 the matching seems to be working.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5815) Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery

2014-07-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058755#comment-14058755
 ] 

Michael McCandless commented on LUCENE-5815:


bq. I guess I was immediately only considering the case where the automaton is 
actually coming from the query analysis chain: it would always be finite and 
so on in this case... right?

Ahh, yes it would!  We could always use BQ(PQ) in that case, and typically 
the Automaton would be smallish, unless the app makes crazy synonyms, etc.?

 Add TermAutomatonQuery, for proximity matching that generalizes 
 MultiPhraseQuery/SpanNearQuery
 --

 Key: LUCENE-5815
 URL: https://issues.apache.org/jira/browse/LUCENE-5815
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.10

 Attachments: LUCENE-5815.patch


 I created a new query, called TermAutomatonQuery, that's a proximity
 query to generalize MultiPhraseQuery/SpanNearQuery: it lets you
 construct an arbitrary automaton whose transitions are whole terms, and
 then find all documents that the automaton matches.  This is different
 from a normal automaton whose transitions are usually
 bytes/characters within a term/s.
 So, if the automaton has just 1 transition, it's just an expensive
 TermQuery.  If you have two transitions in sequence, it's a phrase
 query of two terms.  You can express synonyms by using transitions
 that overlap one another but the automaton doesn't have to be a
 sausage (as MultiPhraseQuery requires) i.e. it respects posLength
 (at query time).
 It also allows any transitions, to match any term, so you can do
 sloppy matching and span-like queries, e.g. find lucene and python
 with up to 3 other terms in between.
 I also added a class to convert a TokenStream directly to the
 automaton for this query, preserving posLength.  (Of course, the index
 can't store posLength, so the matching won't be fully correct if any
 indexed tokens has posLength != 1).  But if you do query-time-only
 synonyms then the matching should finally be correct.
 I haven't tested performance but I suspect it's quite slowish ... its
 cost is O(sum-totalTF) of all terms used in the automaton.  There
 are some optimizations we could do, e.g. detecting that some terms in
 the automaton can be upgraded to MUST (right now they are all
 effectively SHOULD).
 I'm not sure how it should assign scores (punted on that for now), but
 the matching seems to be working.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5816) ToParentBlockJoin deorthogonalization

2014-07-11 Thread Nikolay Khitrin (JIRA)
Nikolay Khitrin created LUCENE-5816:
---

 Summary: ToParentBlockJoin deorthogonalization
 Key: LUCENE-5816
 URL: https://issues.apache.org/jira/browse/LUCENE-5816
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Affects Versions: 4.9
Reporter: Nikolay Khitrin


For now ToParentBlockJoinQuery accepts only child
documents. Before LUCENE-4968, passing a parent document to TPBJQ led to
undefined behavior and garbage in results; unfortunatelly it also affected
TPBJQ.advance().  Since that patch, an IllegalStateException is
thrown when this occurs.

So we must always take parent-child relations into account while writing
queries. Most of the time that is necessary when writing a query, but
sometimes filters can be independent of the data model (for example, ACL
filters: +TPBJQ +allowed:user).

TPBJQ should return the parent doc if a parent doc is passed to
TPBJQ.advance() or returned from childScorer.advance(). This change doesn't
break anything: results will be absolutely the same for parent-child
orthogonal result.

In a few words: a document matching the parent filter should be the parent
of itself.
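
To make the ACL case concrete, a hand-written sketch (field names invented) 
of the kind of query the description is talking about:

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.join.FixedBitSetCachingWrapperFilter;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.lucene.search.join.ToParentBlockJoinQuery;

// parent docs are marked type:parent (invented schema):
Filter parents = new FixedBitSetCachingWrapperFilter(
    new QueryWrapperFilter(new TermQuery(new Term("type", "parent"))));

Query children = new TermQuery(new Term("body", "lucene"));
Query join = new ToParentBlockJoinQuery(children, parents, ScoreMode.Avg);

// the ACL clause matches parent docs directly -- this is the
// "+TPBJQ +allowed:user" case from above:
BooleanQuery q = new BooleanQuery();
q.add(join, BooleanClause.Occur.MUST);
q.add(new TermQuery(new Term("allowed", "user")), BooleanClause.Occur.MUST);
{code}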



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5816) ToParentBlockJoinQuery deorthogonalization

2014-07-11 Thread Nikolay Khitrin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Khitrin updated LUCENE-5816:


Summary: ToParentBlockJoinQuery deorthogonalization  (was: ToParentBlockJoin 
deorthogonalization)

 ToParentBlockJoinQuery deorthogonalization
 -

 Key: LUCENE-5816
 URL: https://issues.apache.org/jira/browse/LUCENE-5816
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Affects Versions: 4.9
Reporter: Nikolay Khitrin

 For now ToParentBlockJoinQuery accepts only child
 documents. Before LUCENE-4968, passing a parent document to TPBJQ led to
 undefined behavior and garbage in results; unfortunatelly it also affected
 TPBJQ.advance().  Since that patch, an IllegalStateException is
 thrown when this occurs.
 So we must always take parent-child relations into account while writing
 queries. Most of the time that is necessary when writing a query, but
 sometimes filters can be independent of the data model (for example, ACL
 filters: +TPBJQ +allowed:user).
 TPBJQ should return the parent doc if a parent doc is passed to
 TPBJQ.advance() or returned from childScorer.advance(). This change doesn't
 break anything: results will be absolutely the same for parent-child
 orthogonal result.
 In a few words: a document matching the parent filter should be the parent
 of itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5816) ToParentBlockJoinQuery deorthogonalization

2014-07-11 Thread Nikolay Khitrin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Khitrin updated LUCENE-5816:


Attachment: LUCENE-5816.patch

 ToParentBlockJoinQuery deorthogonalization
 -

 Key: LUCENE-5816
 URL: https://issues.apache.org/jira/browse/LUCENE-5816
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Affects Versions: 4.9
Reporter: Nikolay Khitrin
 Attachments: LUCENE-5816.patch


 For now ToParentBlockJoinQuery accepts only child
 documents. Before LUCENE-4968, passing a parent document to TPBJQ led to
 undefined behavior and garbage in results; unfortunatelly it also affected
 TPBJQ.advance().  Since that patch, an IllegalStateException is
 thrown when this occurs.
 So we must always take parent-child relations into account while writing
 queries. Most of the time that is necessary when writing a query, but
 sometimes filters can be independent of the data model (for example, ACL
 filters: +TPBJQ +allowed:user).
 TPBJQ should return the parent doc if a parent doc is passed to
 TPBJQ.advance() or returned from childScorer.advance(). This change doesn't
 break anything: results will be absolutely the same for parent-child
 orthogonal result.
 In a few words: a document matching the parent filter should be the parent
 of itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5816) ToParentBlockJoinQuery deorthogonalization

2014-07-11 Thread Nikolay Khitrin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Khitrin updated LUCENE-5816:


Description: 
For now ToParentBlockJoinQuery accepts only child
documents. Before LUCENE-4968, passing a parent document to TPBJQ led to
undefined behavior and garbage in results; unfortunatelly it also affected
TPBJQ.advance().  Since that patch, an IllegalStateException is
thrown when this occurs.

So we must always take parent-child relations into account while writing
queries. Most of the time that is necessary when writing a query, but
sometimes filters can be independent of the data model (for example, ACL
filters: +TPBJQ +allowed:user).

TPBJQ should return the parent doc if a parent doc is passed to
TPBJQ.advance() or returned from childScorer.advance(). This change doesn't
break anything: results will be absolutely the same for parent-child
orthogonal queries.

In a few words: a document matching the parent filter should be the parent
of itself.

  was:
For now ToParentBlockJoinQuery accepts only child
documents. Before LUCENE-4968, passing a parent document to TPBJQ led to
undefined behavior and garbage in results; unfortunatelly it also affected
TPBJQ.advance().  Since that patch, an IllegalStateException is
thrown when this occurs.

So we must always take parent-child relations into account while writing
queries. Most of the time that is necessary when writing a query, but
sometimes filters can be independent of the data model (for example, ACL
filters: +TPBJQ +allowed:user).

TPBJQ should return the parent doc if a parent doc is passed to
TPBJQ.advance() or returned from childScorer.advance(). This change doesn't
break anything: results will be absolutely the same for parent-child
orthogonal result.

In a few words: a document matching the parent filter should be the parent
of itself.


 ToParentBlockJoinQuery deorthogonalization
 -

 Key: LUCENE-5816
 URL: https://issues.apache.org/jira/browse/LUCENE-5816
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Affects Versions: 4.9
Reporter: Nikolay Khitrin
 Attachments: LUCENE-5816.patch


 For now ToParentBlockJoinQuery accepts only child
 documents. Before LUCENE-4968, passing a parent document to TPBJQ led to
 undefined behavior and garbage in results; unfortunatelly it also affected
 TPBJQ.advance().  Since that patch, an IllegalStateException is
 thrown when this occurs.
 So we must always take parent-child relations into account while writing
 queries. Most of the time that is necessary when writing a query, but
 sometimes filters can be independent of the data model (for example, ACL
 filters: +TPBJQ +allowed:user).
 TPBJQ should return the parent doc if a parent doc is passed to
 TPBJQ.advance() or returned from childScorer.advance(). This change doesn't
 break anything: results will be absolutely the same for parent-child
 orthogonal queries.
 In a few words: a document matching the parent filter should be the parent
 of itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5816) ToParentBlockJoinQuery deorthogonalization

2014-07-11 Thread Nikolay Khitrin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Khitrin updated LUCENE-5816:


Description: 
For now ToParentBlockJoinQuery accepts only child
documents. Before LUCENE-4968, passing a parent document to TPBJQ led to
undefined behavior and garbage in results; unfortunately it also affected
TPBJQ.advance().  Since that patch, an IllegalStateException is
thrown when this occurs.

So we must always take parent-child relations into account while writing
queries. Most of the time that is necessary when writing a query, but
sometimes filters can be independent of the data model (for example, ACL
filters: +TPBJQ +allowed:user).

TPBJQ should return the parent doc if a parent doc is passed to
TPBJQ.advance() or returned from childScorer.advance(). This change doesn't
break anything: results will be absolutely the same for parent-child
orthogonal queries.

In a few words: a document matching the parent filter should be the parent
of itself.

  was:
For now ToParentBlockJoinQuery accepts only child
documents. Before LUCENE-4968, passing a parent document to TPBJQ led to
undefined behavior and garbage in results; unfortunatelly it also affected
TPBJQ.advance().  Since that patch, an IllegalStateException is
thrown when this occurs.

So we must always take parent-child relations into account while writing
queries. Most of the time that is necessary when writing a query, but
sometimes filters can be independent of the data model (for example, ACL
filters: +TPBJQ +allowed:user).

TPBJQ should return the parent doc if a parent doc is passed to
TPBJQ.advance() or returned from childScorer.advance(). This change doesn't
break anything: results will be absolutely the same for parent-child
orthogonal queries.

In a few words: a document matching the parent filter should be the parent
of itself.


 ToParentBlockJoinQuery deorthogonalization
 -

 Key: LUCENE-5816
 URL: https://issues.apache.org/jira/browse/LUCENE-5816
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Affects Versions: 4.9
Reporter: Nikolay Khitrin
 Attachments: LUCENE-5816.patch


 For now ToParentBlockJoinQuery accepts only child
 documents. Before LUCENE-4968, passing a parent document to TPBJQ led to
 undefined behavior and garbage in results; unfortunately it also affected
 TPBJQ.advance().  Since that patch, an IllegalStateException is
 thrown when this occurs.
 So we must always take parent-child relations into account while writing
 queries. Most of the time that is necessary when writing a query, but
 sometimes filters can be independent of the data model (for example, ACL
 filters: +TPBJQ +allowed:user).
 TPBJQ should return the parent doc if a parent doc is passed to
 TPBJQ.advance() or returned from childScorer.advance(). This change doesn't
 break anything: results will be absolutely the same for parent-child
 orthogonal queries.
 In a few words: a document matching the parent filter should be the parent
 of itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6238) Specialized test case for leader recovery scenario

2014-07-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058786#comment-14058786
 ] 

Mark Miller commented on SOLR-6238:
---

bq. If the leader lost its connection with ZK then it should've rejoined the 
election on reconnect. If so, why was an add request on this (old) leader 
successful?

The only thing I can reason out so far is:

Leader - doc gets past zk check
Leader - Lost Connection with ZK
Replica - Became leader
Leader (old) - add document is successful. Forwards it to the replica
Replica - add document is unsuccessful as it is the leader and the request 
says it is coming from a leader
Leader (old) - reconnects to ZK, peer syncs with Replica and succeeds because 
it's not behind.

 Specialized test case for leader recovery scenario
 --

 Key: SOLR-6238
 URL: https://issues.apache.org/jira/browse/SOLR-6238
 Project: Solr
  Issue Type: Improvement
Reporter: Varun Thacker
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 4.10

 Attachments: SOLR-6238.patch


 A scenario which could happen, at least before the addition of 
 LeaderInitiatedRecoveryThread, I think. Also, this can happen only if one is 
 using a non-cloud-aware client (which might be quite a few users), given 
 that SolrJ is the only cloud-aware client we have.
 Events are in chronological order -
 Leader - Lost Connection with ZK
 Replica - Became leader
 Leader - add document is successful. Forwards it to the replica
 Replica - add document is unsuccessful as it is the leader and the request 
 says it is coming from a leader
 So as of now the Replica (new leader) won't have the doc but the 
 leader (old leader) will have the document.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Split clusterstate.json per collection and watch states selectively

2014-07-11 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058817#comment-14058817
 ] 

Noble Paul commented on SOLR-5473:
--

bq. There should at least be an option to listen to collections of choice so 
that you don't have to fetch them for each request

Are you talking about Solr nodes? Watching nodes is error-prone unless we have 
clear rules on how to watch/unwatch.

The strategy should be similar to SolrJ, where it watches nothing but caches 
everything.

SolrDispatchFilter will be enhanced to use caching similar to CloudSolrServer.
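
A toy sketch of that idea (all names hypothetical, not from any patch): state 
is fetched lazily per collection and re-read only after a short TTL, instead 
of holding a ZK watch per collection.

{code}
import java.util.concurrent.ConcurrentHashMap;

abstract class CollectionStateCache {
  private static final long TTL_MS = 2000;  // staleness window; tune as needed

  private static final class Entry {
    final Object state;
    final long fetchedAt;
    Entry(Object state, long fetchedAt) { this.state = state; this.fetchedAt = fetchedAt; }
  }

  private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<String, Entry>();

  /** Reads /collections/[name]/state.json from ZK; supplied by the caller. */
  protected abstract Object fetchState(String collection);

  Object get(String collection) {
    long now = System.currentTimeMillis();
    Entry e = cache.get(collection);
    if (e == null || now - e.fetchedAt > TTL_MS) {
      e = new Entry(fetchState(collection), now);
      cache.put(collection, e);  // benign race: worst case a duplicate fetch
    }
    return e.state;
  }
}
{code}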

 Split clusterstate.json per collection and watch states selectively 
 

 Key: SOLR-5473
 URL: https://issues.apache.org/jira/browse/SOLR-5473
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 5.0

 Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74_POC.patch, 
 SOLR-5473-configname-fix.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log


 As defined in the parent issue, store the states of each collection under 
 /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5817) hunspell buggy zero-affix handling

2014-07-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5817:


Attachment: LUCENE-5817.patch

Patch with a simple test.

 hunspell buggy zero-affix handling
 --

 Key: LUCENE-5817
 URL: https://issues.apache.org/jira/browse/LUCENE-5817
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5817.patch


 This only partially works today. But zero-affixes are used heavily by many 
 dictionaries (e.g. I found a good number of bugs in Czech and Latvian just 
 experimenting).
 The fix is easy: we just have to look for 0 in the affix portion as well as 
 the strip portion, as indicated by the manual page:
 "Zero stripping or affix are indicated by zero."



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5817) hunspell buggy zero-affix handling

2014-07-11 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5817:
---

 Summary: hunspell buggy zero-affix handling
 Key: LUCENE-5817
 URL: https://issues.apache.org/jira/browse/LUCENE-5817
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5817.patch

This only partially works today. But zero-affixes are used heavily by many 
dictionaries (e.g. I found a good number of bugs in Czech and Latvian just 
experimenting).

The fix is easy: we just have to look for 0 in the affix portion as well as 
the strip portion, as indicated by the manual page:

"Zero stripping or affix are indicated by zero."
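
For reference, a tiny hand-written .aff fragment (not from any real 
dictionary) showing a 0 in each position:

{code}
SFX A Y 2
SFX A 0 s .    # zero strip: strip nothing, append "s"
SFX A 0 0 .    # zero affix: strip nothing, append nothing
{code}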



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5816) ToParentBlockJoinQuery deorthogonalization

2014-07-11 Thread Nikolay Khitrin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Khitrin updated LUCENE-5816:


Attachment: LUCENE-5816.patch

Removed the validation test, since the exception was removed from TPBJQ.

 ToParentBlockJoinQuery deorthogonalization
 -

 Key: LUCENE-5816
 URL: https://issues.apache.org/jira/browse/LUCENE-5816
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Affects Versions: 4.9
Reporter: Nikolay Khitrin
 Attachments: LUCENE-5816.patch, LUCENE-5816.patch


 For now ToParentBlockJoinQuery accepts only child
 documents. Before LUCENE-4968, passing a parent document to TPBJQ led to
 undefined behavior and garbage in results; unfortunately it also affected
 TPBJQ.advance().  Since that patch, an IllegalStateException is
 thrown when this occurs.
 So we must always take parent-child relations into account while writing
 queries. Most of the time that is necessary when writing a query, but
 sometimes filters can be independent of the data model (for example, ACL
 filters: +TPBJQ +allowed:user).
 TPBJQ should return the parent doc if a parent doc is passed to
 TPBJQ.advance() or returned from childScorer.advance(). This change doesn't
 break anything: results will be absolutely the same for parent-child
 orthogonal queries.
 In a few words: a document matching the parent filter should be the parent
 of itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock

2014-07-11 Thread Timothy Potter (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter updated SOLR-6136:
-

Attachment: SOLR-6136.patch

Here's a patch based largely on Brandon's original patch, using wait/notifyAll 
instead of the spin lock in blockUntilFinished. As mentioned above, VisualVM 
shows good evidence of this improvement: the amount of CPU spent in the block 
method is negligible with this patch (and very noticeable without it).
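
The shape of the change, roughly (a generic sketch of the pattern, not the 
actual CUSS code; names are illustrative):

{code}
// instead of spinning:
//   while (!done) { /* burn CPU */ }
// park the caller on a monitor until the last runner signals completion:
private final Object lock = new Object();
private int activeRunners;

public void blockUntilFinished() throws InterruptedException {
  synchronized (lock) {
    while (activeRunners > 0) {
      lock.wait();       // releases the monitor and parks; no CPU burned
    }
  }
}

void onRunnerFinished() {
  synchronized (lock) {
    activeRunners--;
    lock.notifyAll();    // wake every thread blocked above
  }
}
{code}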

I've also included a first cut at a unit test for CUSS. There are probably 
more things we can do to exercise the logic in CUSS, so let me know if you 
have any other ideas for the unit test.

Brandon - please try this patch out in your environment if possible. I'll plan 
to commit this to trunk and backport to the 4x branch in a few days, after 
keeping an eye on things in Jenkins.

 ConcurrentUpdateSolrServer includes a Spin Lock
 ---

 Key: SOLR-6136
 URL: https://issues.apache.org/jira/browse/SOLR-6136
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1
Reporter: Brandon Chapman
Assignee: Timothy Potter
Priority: Critical
 Attachments: SOLR-6136.patch, wait___notify_all.patch


 ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This 
 causes an extremely high amount of CPU to be used on the Cloud Leader during 
 indexing.
 Here is a summary of our system testing. 
 Importing data on Solr4.5.0: 
 Throughput gets as high as 240 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java
 Importing data on Solr4.7.0 with no replicas: 
 Throughput peaks at 350 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java
 Importing data on Solr4.7.0 with replicas: 
 Throughput peaks at 30 documents per second because the Solr machine is out 
 of CPU.
 [tomcat@solr-stg01 logs]$ uptime 
 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock

2014-07-11 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058858#comment-14058858
 ] 

Timothy Potter commented on SOLR-6136:
--

BTW - I decided not to mess with the threadCount stuff Mark and I were 
discussing here, as that should be handled under another improvement ticket, 
after some benchmarking to show whether it even helps.

 ConcurrentUpdateSolrServer includes a Spin Lock
 ---

 Key: SOLR-6136
 URL: https://issues.apache.org/jira/browse/SOLR-6136
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1
Reporter: Brandon Chapman
Assignee: Timothy Potter
Priority: Critical
 Attachments: SOLR-6136.patch, wait___notify_all.patch


 ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This 
 causes an extremely high amount of CPU to be used on the Cloud Leader during 
 indexing.
 Here is a summary of our system testing. 
 Importing data on Solr4.5.0: 
 Throughput gets as high as 240 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java
 Importing data on Solr4.7.0 with no replicas: 
 Throughput peaks at 350 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java
 Importing data on Solr4.7.0 with replicas: 
 Throughput peaks at 30 documents per second because the Solr machine is out 
 of CPU.
 [tomcat@solr-stg01 logs]$ uptime 
 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5817) hunspell buggy zero-affix handling

2014-07-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058867#comment-14058867
 ] 

ASF subversion and git services commented on LUCENE-5817:
-

Commit 1609723 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1609723 ]

LUCENE-5817: Fix hunspell zero-affix handling

 hunspell buggy zero-affix handling
 --

 Key: LUCENE-5817
 URL: https://issues.apache.org/jira/browse/LUCENE-5817
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5817.patch


 This only partially works today. But zero-affixes are used heavily by many 
 dictionaries (e.g. I found a good number of bugs in Czech and Latvian just 
 experimenting).
 The fix is easy: we just have to look for 0 in the affix portion as well as 
 the strip portion, as indicated by the manual page:
 Zero stripping or affix are indicated by zero.
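
To make that rule concrete, here is a minimal sketch of the "0"-means-empty 
decoding described above, assuming a whitespace-split .aff rule line with the 
strip and affix fields in their usual positions (class and method names are 
hypothetical, not the patch itself):

{code:java}
class AffixLineSketch {
  // In a rule line such as "SFX A 0 ky ." the strip field (index 2) and
  // the affix field (index 3) may each be the literal "0", meaning the
  // empty string, per the hunspell manual quoted above.
  static String readStrip(String[] parts) {
    return "0".equals(parts[2]) ? "" : parts[2];
  }

  static String readAffix(String[] parts) {
    // The bug described above: "0" was decoded for the strip field only;
    // the affix field needs the same treatment.
    return "0".equals(parts[3]) ? "" : parts[3];
  }
}
{code}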



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5817) hunspell buggy zero-affix handling

2014-07-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058874#comment-14058874
 ] 

ASF subversion and git services commented on LUCENE-5817:
-

Commit 1609725 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1609725 ]

LUCENE-5817: Fix hunspell zero-affix handling

 hunspell buggy zero-affix handling
 --

 Key: LUCENE-5817
 URL: https://issues.apache.org/jira/browse/LUCENE-5817
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5817.patch


 This only partially works today. But zero-affixes are used heavily by many 
 dictionaries (e.g. I found a good number of bugs in Czech and Latvian just 
 experimenting).
 The fix is easy: we just have to look for 0 in the affix portion as well as 
 the strip portion, as indicated by the manual page:
 Zero stripping or affix are indicated by zero.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5817) hunspell buggy zero-affix handling

2014-07-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5817.
-

   Resolution: Fixed
Fix Version/s: 4.10
   5.0

 hunspell buggy zero-affix handling
 --

 Key: LUCENE-5817
 URL: https://issues.apache.org/jira/browse/LUCENE-5817
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 5.0, 4.10

 Attachments: LUCENE-5817.patch


 This only partially works today. But zero-affixes are used heavily by many 
 dictionaries (e.g. I found a good number of bugs in Czech and Latvian just 
 experimenting).
 The fix is easy: we just have to look for 0 in the affix portion as well as 
 the strip portion, as indicated by the manual page:
 Zero stripping or affix are indicated by zero.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5495) Recovery strategy for leader partitioned from replica case.

2014-07-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058879#comment-14058879
 ] 

Mark Miller commented on SOLR-5495:
---

I did a quick review of the code and read your comments above more thoroughly. 
I did not do a low-level review. From that mid-level view though, this looks like a 
great change, and even if there are any issues, the changes look like good 
improvements; we should just work through anything that comes up as a result 
of them.

As I work on anything in that area, I'll look at some parts more closely.

 Recovery strategy for leader partitioned from replica case.
 ---

 Key: SOLR-5495
 URL: https://issues.apache.org/jira/browse/SOLR-5495
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Timothy Potter
 Fix For: 4.9

 Attachments: SOLR-5495.patch, SOLR-5495.patch, SOLR-5495.patch


 We need to work out a strategy for the case of:
 Leader and replicas can still talk to ZooKeeper, Leader cannot talk to 
 replica.
 We punted on this in initial design, but I'd like to get something in.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Split clusterstate.json per collection and watch states selectively

2014-07-11 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058880#comment-14058880
 ] 

Ramkumar Aiyengar commented on SOLR-5473:
-

Got it.

I wouldn't say I am still convinced that caching till you fail is the same as 
watching. There are still cases (unfortunately not reproducible 
enough to be fixed) where the cluster state tells you for sure that some node is off 
limits, but the actual request doesn't fail fast enough. More generally, 
ideally I would like to be able to indicate that a node shouldn't 
be used, for whatever environmental reason, even though it physically is up (maybe 
it is doing something weird and I would like for it to be up and available 
for debugging while not affecting queries). Currently that's not possible, but it is 
something we might work to get added. It would be good to have the true ZK 
state instead of lazily updating it on error.

bq. watching nodes is error prone unless we have clear rules on how to 
watch/unwatch

That's why I am saying that at least in the simplistic case this should be left 
to configuration -- watch none, all, or selected. That would at least open the 
doors for more sophisticated logic on making the selection smarter, but we 
shouldn't shut it out and require only caching to be used. Use the watch when 
you have been told to, else cache.
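
As a rough illustration of that watch none/all/selected idea, here is a sketch 
against the plain ZooKeeper client -- class and field names are hypothetical, 
not an actual Solr API:

{code:java}
import java.util.Set;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

class CollectionStateReader {
  private final ZooKeeper zk;
  private final Set<String> watched; // collections configured for watching

  CollectionStateReader(ZooKeeper zk, Set<String> watched) {
    this.zk = zk;
    this.watched = watched;
  }

  byte[] readState(String collection)
      throws KeeperException, InterruptedException {
    String path = "/collections/" + collection + "/state.json";
    if (watched.contains(collection)) {
      // true ZK state: register a watch so updates arrive proactively
      return zk.getData(path, event -> { /* invalidate and re-read */ }, null);
    }
    // cache-until-failure mode: plain read, refreshed lazily on error
    return zk.getData(path, false, null);
  }
}
{code}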


 Split clusterstate.json per collection and watch states selectively 
 

 Key: SOLR-5473
 URL: https://issues.apache.org/jira/browse/SOLR-5473
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 5.0

 Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74_POC.patch, 
 SOLR-5473-configname-fix.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log


 As defined in the parent issue, store the states of each collection under 
 /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3617) Consider adding start scripts.

2014-07-11 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058889#comment-14058889
 ] 

Varun Thacker commented on SOLR-3617:
-

bq. Also, I've built-in Shawn H's famous GC tuning options for Solr, which I've 
found to be good for many Solr workflows. In general, I think the start script 
should take as much of the thinking out of running Solr as possible, which 
includes baking in best practices.

+1 for baking in the best practices.

Should we have warnings for, say, too few file handles, using a buggy Java 
version, etc.? 

In one of Mark's earlier comments he had mentioned that we could have a 
start-dev and a start-prod. These warnings would make sense in the start-prod 
script. Not sure if it's a good idea to have them if we have only one start 
script.

 Consider adding start scripts.
 --

 Key: SOLR-3617
 URL: https://issues.apache.org/jira/browse/SOLR-3617
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller
 Attachments: SOLR-3617.patch


 I've always found that starting Solr with java -jar start.jar is a little odd 
 if you are not a java guy, but I think there are bigger pros than looking 
 less odd in shipping some start scripts.
 Not only do you get a cleaner start command:
 sh solr.sh or solr.bat or something
 But you also can do a couple other little nice things:
 * it becomes fairly obvious for a new casual user to see how to start the 
 system without reading doc.
 * you can make the working dir the location of the script - this lets you 
 call the start script from another dir and still have all the relative dir 
 setup work.
 * have an out of the box place to save startup params like -Xmx.
 * we could have multiple start scripts - say solr-dev.sh that logged to the 
 console and default to sys default for RAM - and also solr-prod which was 
 fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) 
 etc.
 You would still of course be able to make the java cmd directly - and that is 
 probably what you would do when it's time to run as a service - but these 
 could be good starter scripts to get people on the right track and improve 
 the initial user experience.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5495) Recovery strategy for leader partitioned from replica case.

2014-07-11 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1405#comment-1405
 ] 

Timothy Potter commented on SOLR-5495:
--

Hi Mark,

Awesome, thanks for the review ... there's one area in the CoreAdminHandler 
waitForState that could use your review.

  // TODO: This is funky but I've seen this in testing where the replica asks the
  // leader to be in recovery? Need to track down how that happens ... in the meantime,
  // this is a safeguard
  boolean leaderDoesNotNeedRecovery = (onlyIfLeader != null &&
      onlyIfLeader &&
      core.getName().equals(nodeProps.getStr("core")) &&
      ZkStateReader.RECOVERING.equals(waitForState) &&
      ZkStateReader.ACTIVE.equals(localState) &&
      ZkStateReader.ACTIVE.equals(state));


Basically, at some point, I was seeing replicas ask active leaders to recover, 
which I didn't think was a valid thing to do. I actually haven't seen this 
occur in any of my testing, so maybe I was just confused. We can definitely 
remove that code if it's not valid, but I wanted to make you aware that I had it 
in there ;-)

 Recovery strategy for leader partitioned from replica case.
 ---

 Key: SOLR-5495
 URL: https://issues.apache.org/jira/browse/SOLR-5495
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Timothy Potter
 Fix For: 4.9

 Attachments: SOLR-5495.patch, SOLR-5495.patch, SOLR-5495.patch


 We need to work out a strategy for the case of:
 Leader and replicas can still talk to ZooKeeper, Leader cannot talk to 
 replica.
 We punted on this in initial design, but I'd like to get something in.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock

2014-07-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058910#comment-14058910
 ] 

Mark Miller commented on SOLR-6136:
---

{noformat}
-final UpdateRequest updateRequest = queue.poll(250,
-TimeUnit.MILLISECONDS);
+final UpdateRequest updateRequest = 
+queue.poll(pollQueueTime, TimeUnit.MILLISECONDS);
 if (updateRequest == null)
{noformat}

Know when that bug was introduced? If it went out in 4.9, that is a pretty 
severe performance bug if you are not streaming or batching big.
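
For illustration, here is a toy runner loop (a sketch, not the actual 
ConcurrentUpdateSolrServer code) showing why the bounded poll matters -- a 
hard-coded 250ms poll at one call site versus a configurable pollQueueTime at 
the other is exactly the inconsistency the diff above fixes:

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class RunnerSketch implements Runnable {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final long pollQueueTime; // ms; was a hard-coded 250 in one spot

  RunnerSketch(long pollQueueTime) {
    this.pollQueueTime = pollQueueTime;
  }

  @Override public void run() {
    try {
      while (true) {
        // A bounded poll lets the runner linger briefly for more work
        // before exiting, instead of waking on a fixed short interval.
        String req = queue.poll(pollQueueTime, TimeUnit.MILLISECONDS);
        if (req == null) return; // queue drained: runner exits
        send(req);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  private void send(String req) { /* write the update to the server */ }
}
{code}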

 ConcurrentUpdateSolrServer includes a Spin Lock
 ---

 Key: SOLR-6136
 URL: https://issues.apache.org/jira/browse/SOLR-6136
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1
Reporter: Brandon Chapman
Assignee: Timothy Potter
Priority: Critical
 Attachments: SOLR-6136.patch, wait___notify_all.patch


 ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This 
 causes an extremely high amount of CPU to be used on the Cloud Leader during 
 indexing.
 Here is a summary of our system testing. 
 Importing data on Solr4.5.0: 
 Throughput gets as high as 240 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java
 Importing data on Solr4.7.0 with no replicas: 
 Throughput peaks at 350 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java
 Importing data on Solr4.7.0 with replicas: 
 Throughput peaks at 30 documents per second because the Solr machine is out 
 of CPU.
 [tomcat@solr-stg01 logs]$ uptime 
 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5818) Fix hunspell zero-string overgeneration

2014-07-11 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5818:
---

 Summary: Fix hunspell zero-string overgeneration
 Key: LUCENE-5818
 URL: https://issues.apache.org/jira/browse/LUCENE-5818
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


Currently, it's allowed to strip suffixes/prefixes all the way down to the empty 
string. But this is not really allowed, and creates overgeneration in some 
cases (especially where endings can be standalone ... typically these are 
stopwords so it causes a lot of damage).

An example is Czech 'už', which should just stem to itself, but today also stems to 
'úžit' because it has a flag compatible with that.
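
A hypothetical sketch of the guard described above (not the patch itself): 
reject a suffix application whose remaining stem would be empty:

{code:java}
class SuffixSketch {
  // True if the suffix can be removed (and the strip string restored)
  // without leaving a zero-length stem.
  static boolean strippable(String word, String strip, String suffix) {
    if (!word.endsWith(suffix)) return false;
    int stemLength = word.length() - suffix.length() + strip.length();
    // The fix described above: a zero-length stem is not a valid stem.
    return stemLength > 0;
  }
}
{code}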



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5818) Fix hunspell zero-string overgeneration

2014-07-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5818:


Attachment: LUCENE-5818.patch

Simple patch with some tests. This might be a bug I introduced when cutting 
over to FST, because we had no test for it before.

 Fix hunspell zero-string overgeneration
 ---

 Key: LUCENE-5818
 URL: https://issues.apache.org/jira/browse/LUCENE-5818
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5818.patch


 Currently, it's allowed to strip suffixes/prefixes all the way down to the 
 empty string. But this is not really allowed, and creates overgeneration in 
 some cases (especially where endings can be standalone ... typically these 
 are stopwords so it causes a lot of damage).
 An example is Czech 'už', which should just stem to itself, but today also stems 
 to 'úžit' because it has a flag compatible with that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock

2014-07-11 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058916#comment-14058916
 ] 

Timothy Potter commented on SOLR-6136:
--

not sure about that one ... just something I caught while working on this issue

 ConcurrentUpdateSolrServer includes a Spin Lock
 ---

 Key: SOLR-6136
 URL: https://issues.apache.org/jira/browse/SOLR-6136
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1
Reporter: Brandon Chapman
Assignee: Timothy Potter
Priority: Critical
 Attachments: SOLR-6136.patch, wait___notify_all.patch


 ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This 
 causes an extremely high amount of CPU to be used on the Cloud Leader during 
 indexing.
 Here is a summary of our system testing. 
 Importing data on Solr4.5.0: 
 Throughput gets as high as 240 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java
 Importing data on Solr4.7.0 with no replicas: 
 Throughput peaks at 350 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java
 Importing data on Solr4.7.0 with replicas: 
 Throughput peaks at 30 documents per second because the Solr machine is out 
 of CPU.
 [tomcat@solr-stg01 logs]$ uptime 
 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3365) Data import using local time to mark last_index_time

2014-07-11 Thread Shinichiro Abe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinichiro Abe updated SOLR-3365:
-

Attachment: SOLR-3365.patch

Simple patch for trunk. It would be nice if we could configure the time zone when 
the database server's timezone differs from the Solr server's, because 
currently we have to add '-Duser.timezone=foobar' when starting Solr as a 
workaround.

e.g.
{code:xml}
<propertyWriter type="SimplePropertiesWriter" timezone="Asia/Tokyo" />
Or
<propertyWriter type="SimplePropertiesWriter" timezone="Etc/GMT-9" />
Or
<propertyWriter type="SimplePropertiesWriter" timezone="Etc/GMT" />
{code}
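
For illustration, a minimal sketch of what honoring that attribute could look 
like, assuming the timezone attribute from the {code:xml} example above 
(names and date pattern are placeholders, not the patch itself):

{code:java}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class LastIndexTimeSketch {
  // Format last_index_time in the configured timezone rather than the
  // JVM default set via -Duser.timezone.
  static String format(Date lastIndexTime, String timezone) {
    SimpleDateFormat df =
        new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
    if (timezone != null) {
      df.setTimeZone(TimeZone.getTimeZone(timezone)); // e.g. "Asia/Tokyo"
    }
    return df.format(lastIndexTime);
  }
}
{code}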

 Data import using local time to mark last_index_time
 

 Key: SOLR-3365
 URL: https://issues.apache.org/jira/browse/SOLR-3365
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
 Environment: 1 mysql data source server
 2 solr servers 
Reporter: Bartosz Cembor
 Attachments: SOLR-3365.patch


 Class org.apache.solr.handler.dataimport.DataImporter
 setIndexStartTime(new Date());
 When there is a difference in time between the servers (MySQL and Solr), some 
 documents are not indexed. 
 I think DataImporter should take the time from the MySQL database (SELECT NOW()) and 
 use it to mark start_index_time



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #656: POMs out of sync

2014-07-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/656/

2 tests failed.
FAILED:  
org.apache.solr.handler.TestReplicationHandlerBackup.org.apache.solr.handler.TestReplicationHandlerBackup

Error Message:
1 thread leaked from SUITE scope at 
org.apache.solr.handler.TestReplicationHandlerBackup: 
   1) Thread[id=8779, name=Thread-4102, state=RUNNABLE, 
group=TGRP-TestReplicationHandlerBackup]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
at java.net.URL.openStream(URL.java:1037)
at 
org.apache.solr.handler.TestReplicationHandlerBackup$BackupThread.run(TestReplicationHandlerBackup.java:314)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.solr.handler.TestReplicationHandlerBackup: 
   1) Thread[id=8779, name=Thread-4102, state=RUNNABLE, 
group=TGRP-TestReplicationHandlerBackup]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
at java.net.URL.openStream(URL.java:1037)
at 
org.apache.solr.handler.TestReplicationHandlerBackup$BackupThread.run(TestReplicationHandlerBackup.java:314)
at __randomizedtesting.SeedInfo.seed([BDF25AD236E158BA]:0)


FAILED:  
org.apache.solr.handler.TestReplicationHandlerBackup.org.apache.solr.handler.TestReplicationHandlerBackup

Error Message:
There are still zombie threads that couldn't be terminated:
   1) Thread[id=8779, name=Thread-4102, state=RUNNABLE, 
group=TGRP-TestReplicationHandlerBackup]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
at java.net.URL.openStream(URL.java:1037)
at 
org.apache.solr.handler.TestReplicationHandlerBackup$BackupThread.run(TestReplicationHandlerBackup.java:314)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie 
threads that couldn't be terminated:
   1) Thread[id=8779, name=Thread-4102, state=RUNNABLE, 
group=TGRP-TestReplicationHandlerBackup]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at 

[jira] [Resolved] (LUCENE-5818) Fix hunspell zero-string overgeneration

2014-07-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5818.
-

   Resolution: Fixed
Fix Version/s: 4.10
   5.0

 Fix hunspell zero-string overgeneration
 ---

 Key: LUCENE-5818
 URL: https://issues.apache.org/jira/browse/LUCENE-5818
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 5.0, 4.10

 Attachments: LUCENE-5818.patch


 Currently, it's allowed to strip suffixes/prefixes all the way down to the 
 empty string. But this is not really allowed, and creates overgeneration in 
 some cases (especially where endings can be standalone ... typically these 
 are stopwords so it causes a lot of damage).
 An example is Czech 'už', which should just stem to itself, but today also stems 
 to 'úžit' because it has a flag compatible with that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5818) Fix hunspell zero-string overgeneration

2014-07-11 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059042#comment-14059042
 ] 

Hoss Man commented on LUCENE-5818:
--

For those keeping score at home...

http://svn.apache.org/r1609738
http://svn.apache.org/r1609739



 Fix hunspell zero-string overgeneration
 ---

 Key: LUCENE-5818
 URL: https://issues.apache.org/jira/browse/LUCENE-5818
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 5.0, 4.10

 Attachments: LUCENE-5818.patch


 Currently, it's allowed to strip suffixes/prefixes all the way down to the 
 empty string. But this is not really allowed, and creates overgeneration in 
 some cases (especially where endings can be standalone ... typically these 
 are stopwords so it causes a lot of damage).
 An example is Czech 'už', which should just stem to itself, but today also stems 
 to 'úžit' because it has a flag compatible with that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Split clusterstate.json per collection and watch states selectively

2014-07-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059045#comment-14059045
 ] 

Mark Miller commented on SOLR-5473:
---

bq. I wouldn't say I am still convinced that caching till you fail is the same 
as watching. 

I don't believe it is either. I like it less where you don't need thousands of 
collections. Perhaps we consider making it an optional optimization on 
CloudSolrServer?

Given the scaling gains for collections though, this issue overall does seem 
worth any tradeoffs and it seems like improvements and options can be made 
where appropriate.

 Split clusterstate.json per collection and watch states selectively 
 

 Key: SOLR-5473
 URL: https://issues.apache.org/jira/browse/SOLR-5473
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 5.0

 Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
 SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74_POC.patch, 
 SOLR-5473-configname-fix.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
 SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log


 As defined in the parent issue, store the states of each collection under 
 /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock

2014-07-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059082#comment-14059082
 ] 

Mark Miller commented on SOLR-6136:
---

Ah, I see, it's a different poll call. I only had this take effect on one of 
the poll calls because that was enough to relieve the performance issue in my 
benchmarks. I think it makes sense to expand it to this other poll call as 
well.

+1 on the patch, looks okay to me, tests pass locally.

I don't want to think about testing CUSS at the moment, but nice work on a new 
test for it. Will be great to have it grow - this has been a troublesome class 
to stabilize over the years.

 ConcurrentUpdateSolrServer includes a Spin Lock
 ---

 Key: SOLR-6136
 URL: https://issues.apache.org/jira/browse/SOLR-6136
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1
Reporter: Brandon Chapman
Assignee: Timothy Potter
Priority: Critical
 Attachments: SOLR-6136.patch, wait___notify_all.patch


 ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This 
 causes an extremely high amount of CPU to be used on the Cloud Leader during 
 indexing.
 Here is a summary of our system testing. 
 Importing data on Solr4.5.0: 
 Throughput gets as high as 240 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java
 Importing data on Solr4.7.0 with no replicas: 
 Throughput peaks at 350 documents per second.
 [tomcat@solr-stg01 logs]$ uptime 
 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java
 Importing data on Solr4.7.0 with replicas: 
 Throughput peaks at 30 documents per second because the Solr machine is out 
 of CPU.
 [tomcat@solr-stg01 logs]$ uptime 
 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6203) cast exception while searching with sort function and result grouping

2014-07-11 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059132#comment-14059132
 ] 

Hoss Man commented on SOLR-6203:


I haven't looked into this, but I remember similar issues were problematic with 
normal distributed sorting in older versions of Solr; this should have 
largely been resolved by SOLR-5354 -- see in particular this comment & 
subsequent commit...
https://issues.apache.org/jira/browse/SOLR-5354?focusedCommentId=13835891&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13835891
https://svn.apache.org/r1547430

...apparently we're still missing a place in the grouping code that should be 
looking at SortSpec.getSchemaFields() and isn't.
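
For reference, here is the approximate shape of that guard as I understand it 
from SOLR-5354 (names approximate, not the committed code) -- the grouping 
serializer presumably needs the same null check:

{code:java}
import java.util.List;
import org.apache.solr.schema.SchemaField;

class SortValueMarshalSketch {
  // A sort on a function query has no SchemaField, so its sort values are
  // already plain comparables (e.g. Double) and must not be funneled
  // through a field type's marshalSortValue, which for TextField expects
  // a BytesRef -- hence the ClassCastException in the trace above.
  static Object marshal(List<SchemaField> sortFields, int i, Object sortValue) {
    SchemaField field = sortFields.get(i); // null for function/score sorts
    if (field == null) return sortValue;   // pass through untouched
    return field.getType().marshalSortValue(sortValue);
  }
}
{code}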

 cast exception while searching with sort function and result grouping
 -

 Key: SOLR-6203
 URL: https://issues.apache.org/jira/browse/SOLR-6203
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.7, 4.8
Reporter: Nathan Dire
 Attachments: SOLR-6203-unittest.patch


 After upgrading from 4.5.1 to 4.7+, a schema including a {{*}} dynamic 
 field as text gets a cast exception when using a sort function and result 
 grouping.  
 Repro (with example config):
 # Add {{*}} dynamic field as a {{TextField}}, eg:
 {noformat}
 <dynamicField name="*" type="text_general" multiValued="true" />
 {noformat}
 # Create a sharded collection
 {noformat}
 curl 
 'http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=2&maxShardsPerNode=2'
 {noformat}
 # Add example docs (query must have some results)
 # Submit query which sorts on a function result and uses result grouping:
 {noformat}
 {
   responseHeader: {
 status: 500,
 QTime: 50,
 params: {
   sort: sqrt(popularity) desc,
   indent: true,
   q: *:*,
   _: 1403709010008,
   group.field: manu,
   group: true,
   wt: json
 }
   },
   error: {
 msg: java.lang.Double cannot be cast to 
 org.apache.lucene.util.BytesRef,
 code: 500
   }
 }
 {noformat}
 Source exception from log:
 {noformat}
 ERROR - 2014-06-25 08:10:10.055; org.apache.solr.common.SolrException; 
 java.lang.ClassCastException: java.lang.Double cannot be cast to 
 org.apache.lucene.util.BytesRef
 at 
 org.apache.solr.schema.FieldType.marshalStringSortValue(FieldType.java:981)
 at org.apache.solr.schema.TextField.marshalSortValue(TextField.java:176)
 at 
 org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.serializeSearchGroup(SearchGroupsResultTransformer.java:125)
 at 
 org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.transform(SearchGroupsResultTransformer.java:65)
 at 
 org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.transform(SearchGroupsResultTransformer.java:43)
 at 
 org.apache.solr.search.grouping.CommandHandler.processResult(CommandHandler.java:193)
 at 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:340)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   ...
 {noformat}
 It looks like {{serializeSearchGroup}} is matching the sort expression as the 
 {{*}} dynamic field, which is a TextField in the repro.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0_05) - Build # 10669 - Still Failing!

2014-07-11 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10669/
Java: 64bit/jdk1.8.0_05 -XX:-UseCompressedOops -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 58917 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:467: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:406: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:87: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:179: The 
following files are missing svn:eol-style (or binary svn:mime-type):
* 
./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.aff
* 
./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.dic

Total time: 79 minutes 35 seconds
Build step 'Invoke Ant' marked build as failure
[description-setter] Description set: Java: 64bit/jdk1.8.0_05 
-XX:-UseCompressedOops -XX:+UseSerialGC
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-6235) SyncSliceTest fails on jenkins with no live servers available error

2014-07-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059139#comment-14059139
 ] 

Mark Miller commented on SOLR-6235:
---

Right. I think at a minimum, doing nothing is probably preferable in most cases. 
Perhaps a retry or two?

Or perhaps we should look at sending to leaders to originate. We would still 
want to commit everywhere in parallel though, and I'm not sure we can do 
anything that is that much better.

The current situation doesn't seem good though.
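
A minimal sketch of what "a retry or two" might look like per replica, 
assuming we still swallow the failure on the last attempt (a hypothetical 
helper, not a proposed patch):

{code:java}
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;

class CommitRetrySketch {
  // Bounded retries around a per-replica commit, then give up quietly
  // instead of failing the whole distributed commit.
  static void commitWithRetry(SolrServer server, int attempts)
      throws InterruptedException {
    for (int i = 0; i < attempts; i++) {
      try {
        server.commit();
        return;
      } catch (SolrServerException | IOException e) {
        // Last attempt: "doing nothing is probably preferable" -- swallow.
        if (i == attempts - 1) return;
        Thread.sleep(250); // brief pause before retrying
      }
    }
  }
}
{code}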

 SyncSliceTest fails on jenkins with no live servers available error
 ---

 Key: SOLR-6235
 URL: https://issues.apache.org/jira/browse/SOLR-6235
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud, Tests
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Fix For: 4.10


 {code}
 1 tests failed.
 FAILED:  org.apache.solr.cloud.SyncSliceTest.testDistribSearch
 Error Message:
 No live SolrServers available to handle this request
 Stack Trace:
 org.apache.solr.client.solrj.SolrServerException: No live SolrServers 
 available to handle this request
 at 
 __randomizedtesting.SeedInfo.seed([685C57B3F25C854B:E9BAD9AB8503E577]:0)
 at 
 org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:317)
 at 
 org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:659)
 at 
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
 at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
 at 
 org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1149)
 at 
 org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1118)
 at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:236)
 at 
 org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:865)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0_05) - Build # 10669 - Still Failing!

2014-07-11 Thread Michael McCandless
Rob fixed.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Jul 11, 2014 at 2:20 PM, Policeman Jenkins Server
jenk...@thetaphi.de wrote:
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10669/
 Java: 64bit/jdk1.8.0_05 -XX:-UseCompressedOops -XX:+UseSerialGC

 All tests passed

 Build Log:
 [...truncated 58917 lines...]
 BUILD FAILED
 /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:467: The following 
 error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:406: The following 
 error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:87: The 
 following error occurred while executing this line:
 /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:179: The 
 following files are missing svn:eol-style (or binary svn:mime-type):
 * 
 ./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.aff
 * 
 ./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.dic

 Total time: 79 minutes 35 seconds
 Build step 'Invoke Ant' marked build as failure
 [description-setter] Description set: Java: 64bit/jdk1.8.0_05 
 -XX:-UseCompressedOops -XX:+UseSerialGC
 Archiving artifacts
 Recording test results
 Email was triggered for: Failure - Any
 Sending email for trigger: Failure - Any




 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_60) - Build # 10788 - Still Failing!

2014-07-11 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10788/
Java: 64bit/jdk1.7.0_60 -XX:-UseCompressedOops -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 59435 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:467: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:406: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:87: The 
following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:179: The 
following files are missing svn:eol-style (or binary svn:mime-type):
* 
./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.aff
* 
./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.dic

Total time: 82 minutes 37 seconds
Build step 'Invoke Ant' marked build as failure
[description-setter] Description set: Java: 64bit/jdk1.7.0_60 
-XX:-UseCompressedOops -XX:+UseSerialGC
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 1705 - Failure!

2014-07-11 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1705/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 58831 lines...]
BUILD FAILED
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/build.xml:467: The following 
error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/build.xml:406: The following 
error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/extra-targets.xml:87: The 
following error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/extra-targets.xml:179: The 
following files are missing svn:eol-style (or binary svn:mime-type):
* 
./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.aff
* 
./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.dic

Total time: 138 minutes 28 seconds
Build step 'Invoke Ant' marked build as failure
[description-setter] Description set: Java: 64bit/jdk1.8.0 
-XX:+UseCompressedOops -XX:+UseSerialGC
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Core admin merge indexes, should it trigger merge policy?

2014-07-11 Thread Erick Erickson
I think I've become aware of an edge case that I'm wondering is worth
a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
indexes and add them one by one to the running Solr node via merge
indexes. The mergeFactor appears to be ignored in this scenario.
Indeed, I suspect (without proof) that the entire merge policy is
never referenced at all.

Historically this hasn't mattered, since merging indexes was
1) a rare operation
2) the merge policy _does_ kick in when the index has more documents
added to it via the normal (not merge indexes) path, so things would
be cleaned up.

All that said, the MapReduceIndexerTool is a scenario where we may be
merging multiple times without ever indexing documents any other way.
Seems like the core admin API should trigger the merge policy logic
somehow. The problem here is that the number of segments can grow
without bound.

Worth a JIRA?

Erick

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3617) Consider adding start scripts.

2014-07-11 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059346#comment-14059346
 ] 

Timothy Potter commented on SOLR-3617:
--

Hi Varun,

Thanks for the feedback. Good idea about checking the Java version, but it's tough 
to know how many file handles are sufficient.

Also, I'm favoring one script to rule them all ;-) Although I'm willing to be 
convinced of the start-dev / start-prod split, I like bin/solr because it's simple 
and intuitive. In addition:

1) One script to maintain / document; well, actually two scripts (one for *nix 
and the other for Windows)
2) I'm turning off prod options when you enable an example using the -e flag
3) There's going to be a lot of overlap in the logic between the two scripts 
anyway

I'm cooking up the Windows version today and have some updates to the *nix one, 
the main one being a restart option as described above, since I convinced myself 
while writing the initial comment that restart is a more standard 
approach. Another patch coming soon ...

 Consider adding start scripts.
 --

 Key: SOLR-3617
 URL: https://issues.apache.org/jira/browse/SOLR-3617
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller
 Attachments: SOLR-3617.patch


 I've always found that starting Solr with java -jar start.jar is a little odd 
 if you are not a java guy, but I think there are bigger pros than looking 
 less odd in shipping some start scripts.
 Not only do you get a cleaner start command:
 sh solr.sh or solr.bat or something
 But you also can do a couple other little nice things:
 * it becomes fairly obvious for a new casual user to see how to start the 
 system without reading doc.
 * you can make the working dir the location of the script - this lets you 
 call the start script from another dir and still have all the relative dir 
 setup work.
 * have an out of the box place to save startup params like -Xmx.
 * we could have multiple start scripts - say solr-dev.sh that logged to the 
 console and default to sys default for RAM - and also solr-prod which was 
 fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) 
 etc.
 You would still of course be able to make the java cmd directly - and that is 
 probably what you would do when it's time to run as a service - but these 
 could be good starter scripts to get people on the right track and improve 
 the initial user experience.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6193) using facet.* parameters as local params inside of facet.field causes problems in distributed search

2014-07-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-6193:
---

Description: 
The distributed request logic for faceting (which has to clone & modify requests 
to individual shards for dealing with things like facet.mincount, facet.sort, 
facet.limit, & facet.offset so that the distributed aggregation is correct) 
doesn't properly take into account localparams contained in each of the facet 
params and how they should affect the initial shard requests and the subsequent 
refinement requests.

{panel:title=Initial problem example reported by user}
When a distributed search contains multiselect faceting the per-field faceting 
options are not honored for alternate selections of the field. For example with 
a query like:
{noformat}
facet.field=blah&facet.field={!key=myblah 
facet.offset=10}blah&f.blah.facet.offset=20
{noformat}
The returned facet results for both blah and myblah will use an offset of 20 as 
opposed to a standard search returning myblah with an offset of 10.
{panel}

  was:
When a distributed search contains multiselect faceting the per-field faceting 
options are not honored for alternate selections of the field. For example with 
a query like:
{noformat}
facet.field=blah&facet.field={!key=myblah 
facet.offset=10}blah&f.blah.facet.offset=20
{noformat}
The returned facet results for both blah and myblah will use an offset of 20 as 
opposed to a standard search returning myblah with an offset of 10.

Summary: using facet.* parameters as local params inside of facet.field 
causes problems in distributed search  (was: Distributed search with 
multiselect faceting ignores the facet.offset local parameter)

 using facet.* parameters as local params inside of facet.field causes 
 problems in distributed search
 

 Key: SOLR-6193
 URL: https://issues.apache.org/jira/browse/SOLR-6193
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.8.1
 Environment: OS X 10.9.3 Apache Tomcat 7.0.41
 Debian Apache Tomcat 7
Reporter: John Gibson
 Attachments: bad_facet_offset_test_4_8_x.patch


 The distributed request logic for faceting (which has to clone & modify 
 requests to individual shards for dealing with things like facet.mincount, 
 facet.sort, facet.limit, & facet.offset so that the distributed aggregation 
 is correct) doesn't properly take into account localparams contained in each 
 of the facet params and how they should affect the initial shard requests and 
 the subsequent refinement requests.
 {panel:title=Initial problem example reported by user}
 When a distributed search contains multiselect faceting the per-field 
 faceting options are not honored for alternate selections of the field. For 
 example with a query like:
 {noformat}
 facet.field=blah&facet.field={!key=myblah 
 facet.offset=10}blah&f.blah.facet.offset=20
 {noformat}
 The returned facet results for both blah and myblah will use an offset of 20 
 as opposed to a standard search returning myblah with an offset of 10.
 {panel}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3617) Consider adding start scripts.

2014-07-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059389#comment-14059389
 ] 

Mark Miller commented on SOLR-3617:
---

Yeah, a flag for a more friendly dev mode is just as good as another script.

 Consider adding start scripts.
 --

 Key: SOLR-3617
 URL: https://issues.apache.org/jira/browse/SOLR-3617
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller
 Attachments: SOLR-3617.patch


 I've always found that starting Solr with java -jar start.jar is a little odd 
 if you are not a java guy, but I think there are bigger pros than looking 
 less odd in shipping some start scripts.
 Not only do you get a cleaner start command:
 sh solr.sh or solr.bat or something
 But you also can do a couple other little nice things:
 * it becomes fairly obvious for a new casual user to see how to start the 
 system without reading doc.
 * you can make the working dir the location of the script - this lets you 
 call the start script from another dir and still have all the relative dir 
 setup work.
 * have an out of the box place to save startup params like -Xmx.
 * we could have multiple start scripts - say solr-dev.sh that logged to the 
 console and default to sys default for RAM - and also solr-prod which was 
 fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) 
 etc.
 You would still of course be able to make the java cmd directly - and that is 
 probably what you would do when it's time to run as a service - but these 
 could be good starter scripts to get people on the right track and improve 
 the initial user experience.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6193) using facet.* parameters as local params inside of facet.field causes problems in distributed search

2014-07-11 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059391#comment-14059391
 ] 

Hoss Man commented on SOLR-6193:


The crux of the problem is that the distributed facet logic, and the way the shard 
sub-requests are generated, pre-date the support for using local params in 
{{facet.field}} and are built upon the previous work of using per-field 
overrides (ie: {{f.myFieldName.facet.mincount=5}}).  

As a result, from my quick review of the code, there seem to be 2 different 
types of mistakes that can pop up in the logic for building the shard requests:

* not recognizing the local params when computing shard request values (ie: 
ignoring a localparam facet.limit when deciding what the overrequest values should be 
for a given field)
* propagating the localparam values to the shards in addition to the 
synthetically generated f.foo.param equivalent for the shard (ie: sending the 
original localparams, which might include facet.mincount, even when the 
distributed logic is trying to force a mincount of 0 for the initial top-N 
computation).

Adding to the complications is that, off the top of my head, I can't remember 
what sort of decisions were made when the localparam support was added 
regarding the precedence between a general local param vs a top-level per-field 
param -- ie: what should the effective limit be here: 
{{f.foo.facet.limit=99&facet.field=\{!facet.limit=44\}foo}}

---

I think in general we should overhaul the way the distributed requests modify 
the per-field facet params to instead put all of those per-field modifications 
directly in the local params of the shard requests -- among other things, this 
will help eliminate collisions in some of the computed facet params when 
faceting on the same field multiple ways.

Before we tackle this though, we need a lot more comprehensive tests for some 
of these more complex situations -- beyond just the minimal distrib test that 
compares with the control collection.  We need to assert that we get the 
specific expected responses; otherwise we could break the existing 
single-node behavior in a lot of cases and never notice, as long as the distrib 
behavior breaks in the same way.
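
To make the precedence question concrete, here is a hypothetical helper -- the 
ordering shown is one candidate answer, not the documented behavior:

{code:java}
class FacetParamPrecedenceSketch {
  // Given a localparam value, a top-level per-field override, and a global
  // default, which facet.limit wins?
  static int effectiveLimit(Integer localParam, Integer perField, int global) {
    if (localParam != null) return localParam; // {!facet.limit=44}foo
    if (perField != null) return perField;     // f.foo.facet.limit=99
    return global;                             // facet.limit=...
  }
}
{code}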

 using facet.* parameters as local params inside of facet.field causes 
 problems in distributed search
 

 Key: SOLR-6193
 URL: https://issues.apache.org/jira/browse/SOLR-6193
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.8.1
 Environment: OS X 10.9.3 Apache Tomcat 7.0.41
 Debian Apache Tomcat 7
Reporter: John Gibson
 Attachments: bad_facet_offset_test_4_8_x.patch


 The distributed request logic for faceting (which has to clone & modify 
 requests to individual shards for dealing with things like facet.mincount, 
 facet.sort, facet.limit, & facet.offset so that the distributed aggregation 
 is correct) doesn't properly take into account localparams contained in each 
 of the facet params and how they should affect the initial shard requests and 
 the subsequent refinement requests.
 {panel:title=Initial problem example reported by user}
 When a distributed search contains multiselect faceting the per-field 
 faceting options are not honored for alternate selections of the field. For 
 example with a query like:
 {noformat}
 facet.field=blah&facet.field={!key=myblah 
 facet.offset=10}blah&f.blah.facet.offset=20
 {noformat}
 The returned facet results for both blah and myblah will use an offset of 20 
 as opposed to a standard search returning myblah with an offset of 10.
 {panel}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Core admin merge indexes, should it trigger merge policy?

2014-07-11 Thread Mark Miller
I think you would probably want to control the number of segments with the 
MapReduceIndexerTool before doing the merge initially, and if you find you have 
too many segments over time as you add more and more data, use a force merge 
call to reduce the number of segments, either manually or scheduled.
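
For example, a sketch of that kind of scheduled force merge using SolrJ (the
URL and target segment count here are made up):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    // force merge down to at most 10 segments -- not a full optimize to 1
    server.optimize(true, true, 10);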

-- 
Mark Miller
about.me/markrmiller

On July 11, 2014 at 4:36:22 PM, Erick Erickson (erickerick...@gmail.com) wrote:
 I think I've become aware of an edge case that I'm wondering is worth
 a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
 indexes and add them one by one to the running Solr node via merge
 indexes. The mergeFactor appears to be ignored in this scenario.
 Indeed, I suspect (without proof) that the entire merge policy is
 never referenced at all.
 
 Historically this hasn't mattered, since merging indexes was
 1> a rare operation
 2> the merge policy _does_ kick in when the index has more documents
 added to it via the normal (not merge indexes) policy so things would
 be cleaned up.
 
 All that said, the MapReduceIndexerTool is a scenario where we may be
 merging multiple times without ever indexing documents any other way.
 Seems like the core admin API should trigger the merge policy logic
 somehow. The problem here is that the number of segments can grow
 without bound.
 
 Worth a JIRA?
 
 Erick
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Core admin merge indexes, should it trigger merge policy?

2014-07-11 Thread Erick Erickson
bq: I think you would probably want to control the number of segments
with the MapReduceIndexerTool before doing the merge initially

This isn't the nub of the issue. Assuming that the number of segments
in the index merged in via MRIT is 1 each time, once that index gets
merged into the live Solr node, the segments don't get merged no
matter how many times another index is merged. I'm aware of an "In the
wild" situation where over 6 months, there are over 600 segments. All
updates were via MRIT.

run MRIT once, 1 segment
run MRIT a second time, 2 segments
.
.
.
run MRIT the Nth time, N segments (N > 600 in this case)

So running MRIT N times results in N segments on the Solr node since
merge _indexes_ doesn't trigger _segment_ merging AFAIK.

This has been masked in the past I'd guess because subsequent
regular indexing via SolrJ, post.jar, whatever _does_ then trigger
segment merging. But we haven't seen the situation reported before
where the _only_ way the index gets updated is via index merging.
Index merging is done via MRIT in this case although this has nothing
to do with MRIT and everything to do with the core admin mergeindexes
command. MRIT is only relevant here since it's pretty much the first
tool that conveniently allowed the only updates to be via
mergeindexes.

I reproduced this locally without MRIT by just taking a stock Solr,
copying the index somewhere else, setting mergeFactor=2 then merging
(and committing) again and again. Stopped at 15 segments or so. Then
sent a couple of updates up via cURL and the segment count dropped
back to 2.
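
In Lucene API terms, the loop is roughly the following (a sketch of the repro
with made-up paths, not the exact code I ran):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_4_9,
        new StandardAnalyzer(Version.LUCENE_4_9));
    IndexWriter writer = new IndexWriter(FSDirectory.open(new File("live")), cfg);
    for (int i = 0; i < 15; i++) {
      // each addIndexes call appends the incoming segments; no merge runs
      writer.addIndexes(FSDirectory.open(new File("batch-" + i)));
      writer.commit();
    }
    // segment count is now ~15; one ordinary add + commit shrinks it again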

Whether the right place to fix this is the Solr core Admin API
MERGEINDEXES or the lower-level Lucene call, I don't have a strong
opinion.

Of course one work-around is to periodically issue an optimize even
though Uwe cringes every time that gets mentioned ;)

On Fri, Jul 11, 2014 at 2:42 PM, Mark Miller markrmil...@gmail.com wrote:
 I think you would probably want to control the number of segments with the 
 MapReduceIndexerTool before doing the merge initially, and if you find you 
 have too many segments over time as you add more and more data, use a force 
 merge call to reduce the number of segments, either manually or scheduled.

 --
 Mark Miller
 about.me/markrmiller

 On July 11, 2014 at 4:36:22 PM, Erick Erickson (erickerick...@gmail.com) 
 wrote:
 I think I've become aware of an edge case that I'm wondering is worth
 a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
 indexes and add them one by one to the running Solr node via merge
 indexes. The mergeFactor appears to be ignored in this scenario.
 Indeed, I suspect (without proof) that the entire merge policy is
 never referenced at all.

 Historically this hasn't mattered, since merging indexes was
 1> a rare operation
 2> the merge policy _does_ kick in when the index has more documents
 added to it via the normal (not merge indexes) policy so things would
 be cleaned up.

 All that said, the MapReduceIndexerTool is a scenario where we may be
 merging multiple times without ever indexing documents any other way.
 Seems like the core admin API should trigger the merge policy logic
 somehow. The problem here is that the number of segments can grow
 without bound.

 Worth a JIRA?

 Erick

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-5672) Addindexes does not call maybeMerge

2014-07-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-5672:
---

Assignee: Robert Muir

 Addindexes does not call maybeMerge
 ---

 Key: LUCENE-5672
 URL: https://issues.apache.org/jira/browse/LUCENE-5672
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir

 I don't know why this was removed, but this is buggy and just asking for 
 trouble.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5672) Addindexes does not call maybeMerge

2014-07-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059435#comment-14059435
 ] 

Robert Muir commented on LUCENE-5672:
-

I don't agree with this argument, on his part, that "This lets the caller 
decide how expensive addIndexes should be".

The user can freely configure this with MergePolicy. It's no different from any 
other index operation. This is a bug.

There is a lot of confusion, including a current discussion on the ML.
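
For example, a minimal sketch of that kind of per-policy control against the 4.x API (the analyzer variable and the specific knob values are placeholders):

{code}
IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_4_9, analyzer);
TieredMergePolicy tmp = new TieredMergePolicy();
tmp.setSegmentsPerTier(10.0); // segments tolerated per tier before merging
tmp.setMaxMergeAtOnce(10);    // segments a single merge may combine
cfg.setMergePolicy(tmp);
{code}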

 Addindexes does not call maybeMerge
 ---

 Key: LUCENE-5672
 URL: https://issues.apache.org/jira/browse/LUCENE-5672
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

 I don't know why this was removed, but this is buggy and just asking for 
 trouble.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Core admin merge indexes, should it trigger merge policy?

2014-07-11 Thread Robert Muir
You encouraged me to fix it :)

On Fri, Jul 11, 2014 at 6:09 PM, Erick Erickson erickerick...@gmail.com wrote:
 bq: I think you would probably want to control the number of segments
 with the MapReduceIndexerTool before doing the merge initially

 This isn't the nub of the issue. Assuming that the number of segments
 in the index merged in via MRIT is 1 each time, once that index gets
 merged into the live Solr node, the segments don't get merged no
 matter how many times another index is merged. I'm aware of an "In the
 wild" situation where over 6 months, there are over 600 segments. All
 updates were via MRIT.

 run MRIT once, 1 segment
 run MRIT a second time, 2 segments
 .
 .
 .
 run MRIT the Nth time, N segments (N > 600 in this case)

 So running MRIT N times results in N segments on the Solr node since
 merge _indexes_ doesn't trigger _segment_ merging AFAIK.

 This has been masked in the past I'd guess because subsequent
 regular indexing via SolrJ, post.jar, whatever _does_ then trigger
 segment merging. But we haven't seen the situation reported before
 where the _only_ way the index gets updated is via index merging.
 Index merging is done via MRIT in this case although this has nothing
 to do with MRIT and everything to do with the core admin mergeindexes
 command. MRIT is only relevant here since it's pretty much the first
 tool that conveniently allowed the only updates to be via
 mergeindexes.

 I reproduced this locally without MRIT by just taking a stock Solr,
 copying the index somewhere else, setting mergeFactor=2 then merging
 (and committing) again and again. Stopped at 15 segments or so. Then
 sent a couple of updates up via cURL and the segment count dropped
 back to 2.

 Whether the right place to fix this is the Solr core Admin API
 MERGEINDEXES or the lower-level Lucene call, I don't have a strong
 opinion.

 Of course one work-around is to periodically issue an optimize even
 though Uwe cringes every time that gets mentioned ;)

 On Fri, Jul 11, 2014 at 2:42 PM, Mark Miller markrmil...@gmail.com wrote:
 I think you would probably want to control the number of segments with the 
 MapReduceIndexerTool before doing the merge initially, and if you find you 
 have too many segments over time as you add more and more data, use a force 
 merge call to reduce the number of segments, either manually or scheduled.

 --
 Mark Miller
 about.me/markrmiller

 On July 11, 2014 at 4:36:22 PM, Erick Erickson (erickerick...@gmail.com) 
 wrote:
 I think I've become aware of an edge case that I'm wondering is worth
 a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
 indexes and add them one by one to the running Solr node via merge
 indexes. The mergeFactor appears to be ignored in this scenario.
 Indeed, I suspect (without proof) that the entire merge policy is
 never referenced at all.

 Historically this hasn't mattered, since merging indexes was
 1> a rare operation
 2> the merge policy _does_ kick in when the index has more documents
 added to it via the normal (not merge indexes) policy so things would
 be cleaned up.

 All that said, the MapReduceIndexerTool is a scenario where we may be
 merging multiple times without ever indexing documents any other way.
 Seems like the core admin API should trigger the merge policy logic
 somehow. The problem here is that the number of segments can grow
 without bound.

 Worth a JIRA?

 Erick

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Core admin merge indexes, should it trigger merge policy?

2014-07-11 Thread Mark Miller



On July 11, 2014 at 6:10:07 PM, Erick Erickson (erickerick...@gmail.com) wrote:
  bq: I think you would probably want to control the number of segments  
 with the MapReduceIndexerTool before doing the merge initially  
  
 This isn't the nub of the issue.

The rest of the sentence is required:

, and if you find you have too many segments over time as you add more and more 
data, use a force merge call to reduce the number of segments, either manually or 
scheduled. 

Of course one work-around is to periodically issue an optimize even 
though Uwe cringes every time that gets mentioned ;) 

You don’t need a full optimize, you just need to occasionally force merge down 
to N segments. You could trigger it after running the mapreduce tool, cron 
could trigger it, or whatever.

That’s how you have to handle it currently.

-- 
Mark Miller
about.me/markrmiller

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Core admin merge indexes, should it trigger merge policy?

2014-07-11 Thread Erick Erickson
Ah, OK. It's late on Friday and I missed that.

Erick

On Fri, Jul 11, 2014 at 3:20 PM, Mark Miller markrmil...@gmail.com wrote:



 On July 11, 2014 at 6:10:07 PM, Erick Erickson (erickerick...@gmail.com) 
 wrote:
  bq: I think you would probably want to control the number of segments
 with the MapReduceIndexerTool before doing the merge initially

 This isn't the nub of the issue.

 The rest of the sentence is required:

 , and if you find you have too many segments over time as you add more and 
 more data, use a force merge call to reduce the number of segments, either 
 manually or scheduled.

Of course one work-around is to periodically issue an optimize even
though Uwe cringes every time that gets mentioned ;)

 You don’t need a full optimize, you just need to occasionally force merge 
 down to N segments. You could trigger it after running the mapreduce tool, 
 cron could trigger it, or whatever.

 That’s how you have to handle it currently.

 --
 Mark Miller
 about.me/markrmiller

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Core admin merge indexes, should it trigger merge policy?

2014-07-11 Thread Erick Erickson
It's been a whole hour, you're slowing down.

I promised the original reporter that there would be a JIRA he could
track, got one?

Erick

On Fri, Jul 11, 2014 at 3:17 PM, Robert Muir rcm...@gmail.com wrote:
 You encouraged me to fix it :)

 On Fri, Jul 11, 2014 at 6:09 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 bq: I think you would probably want to control the number of segments
 with the MapReduceIndexerTool before doing the merge initially

 This isn't the nub of the issue. Assuming that the number of segments
 in the index merged in via MRIT is 1 each time, once that index gets
 merged into the live Solr node, the segments don't get merged no
 matter how many times another index is merged. I'm aware of an "In the
 wild" situation where over 6 months, there are over 600 segments. All
 updates were via MRIT.

 run MRIT once, 1 segment
 run MRIT a second time, 2 segments
 .
 .
 .
 run MRIT the Nth time, N segments (N > 600 in this case)

 So running MRIT N times results in N segments on the Solr node since
 merge _indexes_ doesn't trigger _segment_ merging AFAIK.

 This has been masked in the past I'd guess because subsequent
 regular indexing via SolrJ, post.jar, whatever _does_ then trigger
 segment merging. But we haven't seen the situation reported before
 where the _only_ way the index gets updated is via index merging.
 Index merging is done via MRIT in this case although this has nothing
 to do with MRIT and everything to do with the core admin mergeindexes
 command. MRIT is only relevant here since it's pretty much the first
 tool that conveniently allowed the only updates to be via
 mergeindexes.

 I reproduced this locally without MRIT by just taking a stock Solr,
 copying the index somewhere else, setting mergeFactor=2 then merging
 (and committing) again and again. Stopped at 15 segments or so. Then
 sent a couple of updates up via cURL and the segment count dropped
 back to 2.

 Whether the right place to fix this is the Solr core Admin API
 MERGEINDEXES or the lower-level Lucene call, I don't have a strong
 opinion.

 Of course one work-around is to periodically issue an optimize even
 though Uwe cringes every time that gets mentioned ;)

 On Fri, Jul 11, 2014 at 2:42 PM, Mark Miller markrmil...@gmail.com wrote:
 I think you would probably want to control the number of segments with the 
 MapReduceIndexerTool before doing the merge initially, and if you find you 
 have too many segments over time as you add more and more data, use a force 
 merge call to reduce the number of segments, either manually or scheduled.

 --
 Mark Miller
 about.me/markrmiller

 On July 11, 2014 at 4:36:22 PM, Erick Erickson (erickerick...@gmail.com) 
 wrote:
 I think I've become aware of an edge case that I'm wondering is worth
 a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
 indexes and add them one by one to the running Solr node via merge
 indexes. The mergeFactor appears to be ignored in this scenario.
 Indeed, I suspect (without proof) that the entire merge policy is
 never referenced at all.

 Historically this hasn't mattered, since merging indexes was
 1> a rare operation
 2> the merge policy _does_ kick in when the index has more documents
 added to it via the normal (not merge indexes) policy so things would
 be cleaned up.

 All that said, the MapReduceIndexerTool is a scenario where we may be
 merging multiple times without ever indexing documents any other way.
 Seems like the core admin API should trigger the merge policy logic
 somehow. The problem here is that the number of segments can grow
 without bound.

 Worth a JIRA?

 Erick

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5672) Addindexes does not call maybeMerge

2014-07-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5672:


Attachment: LUCENE-5672.patch

 Addindexes does not call maybeMerge
 ---

 Key: LUCENE-5672
 URL: https://issues.apache.org/jira/browse/LUCENE-5672
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-5672.patch


 I don't know why this was removed, but this is buggy and just asking for 
 trouble.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Core admin merge indexes, should it trigger merge policy?

2014-07-11 Thread Chris Hostetter

https://issues.apache.org/jira/browse/LUCENE-5672

: Date: Fri, 11 Jul 2014 15:22:40 -0700
: From: Erick Erickson erickerick...@gmail.com
: Reply-To: dev@lucene.apache.org
: To: dev@lucene.apache.org
: Subject: Re: Core admin merge indexes, should it trigger merge policy?
: 
: It's been a whole hour, you're slowing down.
: 
: I promised the original reporter that there would be a JIRA he could
: track, got one?
: 
: Erick
: 
: On Fri, Jul 11, 2014 at 3:17 PM, Robert Muir rcm...@gmail.com wrote:
:  You encouraged me to fix it :)
: 
:  On Fri, Jul 11, 2014 at 6:09 PM, Erick Erickson erickerick...@gmail.com 
wrote:
:  bq: I think you would probably want to control the number of segments
:  with the MapReduceIndexerTool before doing the merge initially
: 
:  This isn't the nub of the issue. Assuming that the number of segments
:  in the index merged in via MRIT is 1 each time, once that index gets
:  merged into the live Solr node, the segments don't get merged no
:  matter how many times another index is merged. I'm aware of an "In the
:  wild" situation where over 6 months, there are over 600 segments. All
:  updates were via MRIT.
: 
:  run MRIT once, 1 segment
:  run MRIT a second time, 2 segments
:  .
:  .
:  .
:  run MRIT the Nth time, N segments (N > 600 in this case)
: 
:  So running MRIT N times results in N segments on the Solr node since
:  merge _indexes_ doesn't trigger _segment_ merging AFAIK.
: 
:  This has been masked in the past I'd guess because subsequent
:  regular indexing via SolrJ, post.jar, whatever _does_ then trigger
:  segment merging. But we haven't seen the situation reported before
:  where the _only_ way the index gets updated is via index merging.
:  Index merging is done via MRIT in this case although this has nothing
:  to do with MRIT and everything to do with the core admin mergeindexes
:  command. MRIT is only relevant here since it's pretty much the first
:  tool that conveniently allowed the only updates to be via
:  mergeindexes.
: 
:  I reproduced this locally without MRIT by just taking a stock Solr,
:  copying the index somewhere else, setting mergeFactor=2 then merging
:  (and committing) again and again. Stopped at 15 segments or so. Then
:  sent a couple of updates up via cURL and the segment count dropped
:  back to 2.
: 
:  Whether the right place to fix this is the Solr core Admin API
:  MERGEINDEXES or the lower-level Lucene call, I don't have a strong
:  opinion.
: 
:  Of course one work-around is to periodically issue an optimize even
:  though Uwe cringes every time that gets mentioned ;)
: 
:  On Fri, Jul 11, 2014 at 2:42 PM, Mark Miller markrmil...@gmail.com wrote:
:  I think you would probably want to control the number of segments with 
the MapReduceIndexerTool before doing the merge initially, and if you find you 
have too many segments over time as you add more and more data, use a force 
merge call to reduce the number of segments, either manually or scheduled.
: 
:  --
:  Mark Miller
:  about.me/markrmiller
: 
:  On July 11, 2014 at 4:36:22 PM, Erick Erickson (erickerick...@gmail.com) 
wrote:
:  I think I've become aware of an edge case that I'm wondering is worth
:  a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
:  indexes and add them one by one to the running Solr node via merge
:  indexes. The mergeFactor appears to be ignored in this scenario.
:  Indeed, I suspect (without proof) that the entire merge policy is
:  never referenced at all.
: 
:  Historically this hasn't mattered, since merging indexes was
:  1> a rare operation
:  2> the merge policy _does_ kick in when the index has more documents
:  added to it via the normal (not merge indexes) policy so things would
:  be cleaned up.
: 
:  All that said, the MapReduceIndexerTool is a scenario where we may be
:  merging multiple times without ever indexing documents any other way.
:  Seems like the core admin API should trigger the merge policy logic
:  somehow. The problem here is that the number of segments can grow
:  without bound.
: 
:  Worth a JIRA?
: 
:  Erick
: 
:  -
:  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
:  For additional commands, e-mail: dev-h...@lucene.apache.org
: 
: 
: 
: 
:  -
:  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
:  For additional commands, e-mail: dev-h...@lucene.apache.org
: 
: 
:  -
:  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
:  For additional commands, e-mail: dev-h...@lucene.apache.org
: 
: 
:  -
:  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
:  For additional commands, e-mail: dev-h...@lucene.apache.org
: 
: 
: -
: To unsubscribe, 

[jira] [Commented] (LUCENE-5672) Addindexes does not call maybeMerge

2014-07-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059449#comment-14059449
 ] 

Mark Miller commented on LUCENE-5672:
-

bq. The user can freely configure this with MergePolicy. Its no different from 
any other index operation. This is a bug.

I was leaning towards Shai's argument at first, but after a bit of deeper 
thought, I agree with Robert.

I don't know that having an option to not use the merge policy would add any 
confusion if the default is right, but it does seem the merge policy itself is 
sufficient for the cases I can think of. I don't know that you need this extra 
way to control merges.

 Addindexes does not call maybeMerge
 ---

 Key: LUCENE-5672
 URL: https://issues.apache.org/jira/browse/LUCENE-5672
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-5672.patch


 I don't know why this was removed, but this is buggy and just asking for 
 trouble.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Core admin merge indexes, should it trigger merge policy?

2014-07-11 Thread Robert Muir
Yes, I opened that issue a while ago because I have seen people
accidentally create hundreds/thousands of segments (just with the plain
Lucene API) due to this same trap.

This should not happen to you by default.

If you want to create hundreds or thousands of segments for your
index, it should be because you (mis)configured your merge policy
intentionally to create such a situation.

On Fri, Jul 11, 2014 at 6:24 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 https://issues.apache.org/jira/browse/LUCENE-5672

 : Date: Fri, 11 Jul 2014 15:22:40 -0700
 : From: Erick Erickson erickerick...@gmail.com
 : Reply-To: dev@lucene.apache.org
 : To: dev@lucene.apache.org
 : Subject: Re: Core admin merge indexes, should it trigger merge policy?
 :
 : It's been a whole hour, you're slowing down.
 :
 : I promised the original reporter that there would be a JIRA he could
 : track, got one?
 :
 : Erick
 :
 : On Fri, Jul 11, 2014 at 3:17 PM, Robert Muir rcm...@gmail.com wrote:
 :  You encouraged me to fix it :)
 : 
 :  On Fri, Jul 11, 2014 at 6:09 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 :  bq: I think you would probably want to control the number of segments
 :  with the MapReduceIndexerTool before doing the merge initially
 : 
 :  This isn't the nub of the issue. Assuming that the number of segments
 :  in the index merged in via MRIT is 1 each time, once that index gets
 :  merged into the live Solr node, the segments don't get merged no
 :  matter how many times another index is merged. I'm aware of an "In the
 :  wild" situation where over 6 months, there are over 600 segments. All
 :  updates were via MRIT.
 : 
 :  run MRIT once, 1 segment
 :  run MRIT a second time, 2 segments
 :  .
 :  .
 :  .
 :  run MRIT the Nth time, N segments (N > 600 in this case)
 : 
 :  So running MRIT N times results in N segments on the Solr node since
 :  merge _indexes_ doesn't trigger _segment_ merging AFAIK.
 : 
 :  This has been masked in the past I'd guess because subsequent
 :  regular indexing via SolrJ, post.jar, whatever _does_ then trigger
 :  segment merging. But we haven't seen the situation reported before
 :  where the _only_ way the index gets updated is via index merging.
 :  Index merging is done via MRIT in this case although this has nothing
 :  to do with MRIT and everything to do with the core admin mergeindexes
 :  command. MRIT is only relevant here since it's pretty much the first
 :  tool that conveniently allowed the only updates to be via
 :  mergeindexes.
 : 
 :  I reproduced this locally without MRIT by just taking a stock Solr,
 :  copying the index somewhere else, setting mergeFactor=2 then merging
 :  (and committing) again and again. Stopped at 15 segments or so. Then
 :  sent a couple of updates up via cURL and the segment count dropped
 :  back to 2.
 : 
 :  Whether the right place to fix this is the Solr core Admin API
 :  MERGEINDEXES or the lower-level Lucene call, I don't have a strong
 :  opinion.
 : 
 :  Of course one work-around is to periodically issue an optimize even
 :  though Uwe cringes every time that gets mentioned ;)
 : 
 :  On Fri, Jul 11, 2014 at 2:42 PM, Mark Miller markrmil...@gmail.com 
 wrote:
 :  I think you would probably want to control the number of segments with 
 the MapReduceIndexerTool before doing the merge initially, and if you find 
 you have too many segments over time as you add more and more data, use a 
 :  force merge call to reduce the number of segments, either manually or scheduled.
 : 
 :  --
 :  Mark Miller
 :  about.me/markrmiller
 : 
 :  On July 11, 2014 at 4:36:22 PM, Erick Erickson 
 (erickerick...@gmail.com) wrote:
 :  I think I've become aware of an edge case that I'm wondering is worth
 :  a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of
 :  indexes and add them one by one to the running Solr node via merge
 :  indexes. The mergeFactor appears to be ignored in this scenario.
 :  Indeed, I suspect (without proof) that the entire merge policy is
 :  never referenced at all.
 : 
 :  Historically this hasn't mattered, since merging indexes was
 :  1> a rare operation
 :  2> the merge policy _does_ kick in when the index has more documents
 :  added to it via the normal (not merge indexes) policy so things would
 :  be cleaned up.
 : 
 :  All that said, the MapReduceIndexerTool is a scenario where we may be
 :  merging multiple times without ever indexing documents any other way.
 :  Seems like the core admin API should trigger the merge policy logic
 :  somehow. The problem here is that the number of segments can grow
 :  without bound.
 : 
 :  Worth a JIRA?
 : 
 :  Erick
 : 
 :  -
 :  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 :  For additional commands, e-mail: dev-h...@lucene.apache.org
 : 
 : 
 : 
 : 
 :  -
 :  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 :  

[jira] [Commented] (LUCENE-5672) Addindexes does not call maybeMerge

2014-07-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059455#comment-14059455
 ] 

Robert Muir commented on LUCENE-5672:
-

FYI: this is the third time I've heard of this trap hitting people and creating 
hundreds or thousands of index segments: once was from coworkers at a past job, 
twice was the lucene user list discussion "Merger performance degradation on 
3.6.1", thrice was Erick's recent ML post.

For people that don't want merging they have NoMergePolicy; maybeMerge() is 
even documented as expert, and "Explicit calls to maybeMerge() are usually not 
necessary. The most common case is when merge policy parameters have changed." 
So requiring the user to manually invoke this after index operations to prevent 
segment explosion is wrong, IMO.
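
For completeness, the opt-out path is a one-liner against the 4.x API (a sketch; the analyzer variable is a placeholder):

{code}
IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_4_9, analyzer);
// explicitly opt out of merging; segments then only shrink via forceMerge
cfg.setMergePolicy(NoMergePolicy.COMPOUND_FILES);
{code}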

 Addindexes does not call maybeMerge
 ---

 Key: LUCENE-5672
 URL: https://issues.apache.org/jira/browse/LUCENE-5672
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-5672.patch


 I don't know why this was removed, but this is buggy and just asking for 
 trouble.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2894) Implement distributed pivot faceting

2014-07-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2894:
---

Attachment: SOLR-2894.patch

bq. Quick note on PivotFacetHelper's retrieve method ...

I haven't really been aware of those other issues until now (although SOLR-3583 
may explain some of the unused code I pruned from PivotListEntry a few patches 
ago), but I agree with your assessment: if/when enhancements to distributed 
pivots start dealing with adding optional data to each level of the pivot, the 
approach currently used will have to change.

(Personally: I'm not emotionally ready to put any serious thought into that 
level of implementation detail in future pivot improvements - I want to focus 
on getting the basics of distrib pivots solid & released first.)



Updated patch with most of the tests I had in mind that I mentioned before 
(although I'd still like to add some more facet.missing tests)...

* TestCloudPivotFacet
** randomize overrequest amounts
** randomize facet.mincount usage & assert it is never exceeded
** randomize facet.missing usage & assert that null values are only ever last 
in the list of values
*** make the odds of docs missing a field more randomized (across test runs)
** add in the possibility of trying to pivot on a field that is in 0 docs
** Dial back some constants to reduce OOM risk when running -Dtests.nightly=true
** example refine count failure from the facet.missing problem (unless there's 
another bug that looks really similar) with these changes: 
*** {{ant test -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch 
-Dtests.seed=98C12D5256897A09 -Dtests.nightly=true -Dtests.slow=true 
-Dtests.locale=sr -Dtests.timezone=America/Louisville 
-Dtests.file.encoding=UTF-8}}

* DistributedFacetPivotLongTailTest
** some data tweaks & an additional assertion to ensure refinement is happening

* DistributedFacetPivotSmallTest
** s/honda/null/g - helps test that the 4-character string "null" isn't 
triggering any special behavior, or getting confused with a missing value in 
docs.

* DistributedFacetPivotLargeTest
** comment & assert noting that a shard is left empty (helps with edge-case 
testing of result merging & refinement)
** added assertPivot helper method & did a bit of refactoring
** added test of 2 diff pivots in the same request (swap field order)
** added test of same bi-level pivot with & w/o a tagged fq exclusion in the 
same request
** added test variants of facet.limit & facet.index used as localparams 
*** currently commented out because it doesn't work -- see SOLR-6193



The problem noted above with using {{facet.*}} params as local params in 
{{facet.pivot}} is something I discovered earlier this week while writing up 
these tests.  I initially set the problem aside to keep working on tests, with 
the intention of looking into a fix once I had better coverage of the problem 
-- but then when I came back to revisit it yesterday and looked to the existing 
{{facet.field}} shard request logic for guidance, I discovered that it didn't 
seem to work the way I expected either, and realized John Gibson recently filed 
SOLR-6193 because {{facet.field}} _does_ have the exact same problem.

I don't think we should let this block adding distributed facet.pivot; let's 
tackle it holistically for all faceting in SOLR-6193.



Andrew/Brett: have you guys had a chance to look into the refinement bug when 
{{facet.missing}} is used?

(BTW: my updated patch only affected test files, so hopefully there's no 
collision with anything you guys have been working on -- but if there is, feel 
free to just post whatever patch you guys come up with and I'll handle the 
merge.)




 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
Assignee: Hoss Man
 Fix For: 4.9, 5.0

 Attachments: SOLR-2894-mincount-minification.patch, 
 SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894_cloud_test.patch, dateToObject.patch, 
 pivot_mincount_problem.sh


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by 

[jira] [Commented] (LUCENE-5672) Addindexes does not call maybeMerge

2014-07-11 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059530#comment-14059530
 ] 

Erick Erickson commented on LUCENE-5672:


Agreed, we _rely_ on segment merging in the normal state; to have it fail in 
this case is trappy.

Commit it I say.

 Addindexes does not call maybeMerge
 ---

 Key: LUCENE-5672
 URL: https://issues.apache.org/jira/browse/LUCENE-5672
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-5672.patch


 I don't know why this was removed, but this is buggy and just asking for 
 trouble.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5746) solr.xml parsing of str vs int vs bool is brittle; fails silently; expects odd type for shareSchema

2014-07-11 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059550#comment-14059550
 ] 

Hoss Man commented on SOLR-5746:


Hi Maciej,

I glanced over the patch a bit more today:

1) Can you explain the need for the new 
{{DOMUtil.readNamedChildrenAsNamedList}} method that you added, instead of just 
using the existing {{DOMUtil.childNodesToNamedList}} (which delegates to 
{{addToNamedList}} and immediately validates that element text conforms to the 
stated type)?

I realize that using {{DOMUtil.childNodesToNamedList}} won't magically help 
parse/validate the config options in the backcompat cases like {{<str 
name="shareSchema">true</str>}} -- but that's where things like your 
{{storeConfigPropertyAsInt}} and {{storeConfigPropertyAsBoolean}} can be in 
charge of doing the cast if the raw value is still a string.

(I want to make sure we aren't introducing a redundant method in {{DOMUtil}}.)
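
For the backcompat cast, I mean something along these lines (a hypothetical sketch only -- the names are made up, not taken from your patch):

{code}
private void storeConfigPropertyAsBoolean(NamedList<Object> cfg, CfgProp prop, String name) {
  Object raw = cfg.remove(name);
  if (raw == null) return;
  // accept <bool> (already a Boolean) as well as the legacy <str>true</str>
  // form, failing loudly on junk instead of silently ignoring it
  boolean val = (raw instanceof Boolean) ? (Boolean) raw
                                         : StrUtils.parseBool(raw.toString());
  propMap.put(prop, Boolean.toString(val));
}
{code}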

2) Speaking of which: what's the purpose, exactly, of {{configAsSolrParams}} if 
the original NamedList is still being passed to the {{storeConfigPropertyAs*}} 
methods -- why not just get the values directly from there?

3) One piece of validation that I believe is still missing here is to throw an 
error if/when a config value is specified multiple times -- if I remember the 
behavior of NamedList correctly, I think the way you have things now it will 
silently just use the first one, and then remove both.  We should definitely 
have an error check that each of these single-valued config options is in fact 
only specified once in the NamedList -- so people don't try to add a setting 
they've read about in the docs w/o realizing it's already defined higher up in 
the file (we've seen that happen with settings in solrconfig.xml many times 
before we locked that down and made it an error case).
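
ie, roughly this (hypothetical sketch):

{code}
// fail fast if a single-valued option appears more than once in <solr>
if (cfg.getAll(name).size() > 1) {
  throw new SolrException(ErrorCode.SERVER_ERROR,
      "'" + name + "' is specified more than once in solr.xml");
}
{code}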



 solr.xml parsing of str vs int vs bool is brittle; fails silently; 
 expects odd type for shareSchema   
 --

 Key: SOLR-5746
 URL: https://issues.apache.org/jira/browse/SOLR-5746
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3, 4.4, 4.5, 4.6
Reporter: Hoss Man
 Attachments: SOLR-5746.patch, SOLR-5746.patch


 A comment in the ref guide got me looking at ConfigSolrXml.java and noticing 
 that the parsing of solr.xml options here is very brittle and confusing.  In 
 particular:
 * if a boolean option foo is expected along the lines of {{<bool 
 name="foo">true</bool>}} it will silently ignore {{<str 
 name="foo">true</str>}}
 * likewise for an int option {{<int name="bar">32</int>}} vs {{<str 
 name="bar">32</str>}}
 ... this is inconsistent with the way solrconfig.xml is parsed.  In 
 solrconfig.xml, the xml nodes are parsed into a NamedList, and the above 
 options will work in either form, but an invalid value such as {{<bool 
 name="foo">NOT A BOOLEAN</bool>}} will generate an error earlier (when 
 parsing config) than {{<str name="foo">NOT A BOOLEAN</str>}} (attempt to 
 parse the string as a bool the first time the config value is needed)
 In addition, I notice this really confusing line...
 {code}
 propMap.put(CfgProp.SOLR_SHARESCHEMA, 
 doSub("solr/str[@name='shareSchema']"));
 {code}
 shareSchema is used internally as a boolean option, but as written the 
 parsing code will ignore it unless the user explicitly configures it as a 
 {{<str/>}}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 1706 - Still Failing!

2014-07-11 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1706/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseParallelGC

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexSplitter.test

Error Message:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/test/J0/temp/lucene.index.TestIndexSplitter-C26121833CA2DB9E-001/TestIndexSplitter-001/_2.si

Stack Trace:
java.nio.file.NoSuchFileException: 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/test/J0/temp/lucene.index.TestIndexSplitter-C26121833CA2DB9E-001/TestIndexSplitter-001/_2.si
at 
__randomizedtesting.SeedInfo.seed([C26121833CA2DB9E:4A351E59925EB666]:0)
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:334)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:196)
at 
org.apache.lucene.store.Directory.openChecksumInput(Directory.java:106)
at 
org.apache.lucene.codecs.lucene46.Lucene46SegmentInfoReader.read(Lucene46SegmentInfoReader.java:49)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:447)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:787)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:633)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:443)
at org.apache.lucene.index.IndexSplitter.<init>(IndexSplitter.java:95)
at 
org.apache.lucene.index.TestIndexSplitter.test(TestIndexSplitter.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Commented] (LUCENE-5672) Addindexes does not call maybeMerge

2014-07-11 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059621#comment-14059621
 ] 

David Smiley commented on LUCENE-5672:
--

bq. Commit it I say.

+1; this is a bug.

 Addindexes does not call maybeMerge
 ---

 Key: LUCENE-5672
 URL: https://issues.apache.org/jira/browse/LUCENE-5672
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-5672.patch


 I don't know why this was removed, but this is buggy and just asking for 
 trouble.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_60) - Build # 10673 - Failure!

2014-07-11 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10673/
Java: 32bit/jdk1.7.0_60 -server -XX:+UseParallelGC

1 tests failed.
REGRESSION:  
org.apache.lucene.index.TestBinaryDocValuesUpdates.testManyReopensAndFields

Error Message:
MockDirectoryWrapper: cannot close: there are still open files: {_r.fdt=1, 
_r_MockVariableIntBlock_0.tib=1, _r_1_Asserting_0.dvd=1, _r_2_Memory_0.mdvd=1, 
_r_Asserting_0.dvd=1, _r_Lucene49_0.dvd=1, _r_MockVariableIntBlock_0.doc=1, 
_r_MockVariableIntBlock_0.skp=1, _r_SimpleText_0.dat=1, _r_Memory_0.mdvd=1}

Stack Trace:
java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still 
open files: {_r.fdt=1, _r_MockVariableIntBlock_0.tib=1, _r_1_Asserting_0.dvd=1, 
_r_2_Memory_0.mdvd=1, _r_Asserting_0.dvd=1, _r_Lucene49_0.dvd=1, 
_r_MockVariableIntBlock_0.doc=1, _r_MockVariableIntBlock_0.skp=1, 
_r_SimpleText_0.dat=1, _r_Memory_0.mdvd=1}
at 
org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:669)
at org.apache.lucene.util.IOUtils.close(IOUtils.java:77)
at 
org.apache.lucene.index.TestBinaryDocValuesUpdates.testManyReopensAndFields(TestBinaryDocValuesUpdates.java:741)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at