[jira] [Updated] (SOLR-5746) solr.xml parsing of str vs int vs bool is brittle; fails silently; expects odd type for shareSchema
[ https://issues.apache.org/jira/browse/SOLR-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Zasada updated SOLR-5746: Attachment: SOLR-5746.patch

Hi [~hossman], I've attached an updated patch file: * used the framework's randomisation wherever it made sense to me; * added assertions on exception messages; * added DEBUG-level reporting of multiple unexpected config options, as well as an exception message containing the list of unknown parameters (e.g. {code}Unrecognised 3 config parameter(s) in solr.xml file: [foo, bar, baz]{code}). {{ant clean test}} shows that there's no regression. Cheers, Maciej

solr.xml parsing of str vs int vs bool is brittle; fails silently; expects odd type for shareSchema -- Key: SOLR-5746 URL: https://issues.apache.org/jira/browse/SOLR-5746 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5, 4.6 Reporter: Hoss Man Attachments: SOLR-5746.patch, SOLR-5746.patch

A comment in the ref guide got me looking at ConfigSolrXml.java and noticing that the parsing of solr.xml options there is very brittle and confusing. In particular: * if a boolean option foo is expected along the lines of {{<bool name="foo">true</bool>}}, it will silently ignore {{<str name="foo">true</str>}} * likewise for an int option, {{<int name="bar">32</int>}} vs {{<str name="bar">32</str>}} ... this is inconsistent with the way solrconfig.xml is parsed. In solrconfig.xml, the XML nodes are parsed into a NamedList, and the above options work in either form, but an invalid value such as {{<bool name="foo">NOT A BOOLEAN</bool>}} will generate an error earlier (when parsing the config) than {{<str name="foo">NOT A BOOLEAN</str>}} (which only fails when the string is parsed as a bool the first time the config value is needed). In addition, I noticed this really confusing line...
{code}propMap.put(CfgProp.SOLR_SHARESCHEMA, doSub("solr/str[@name='shareSchema']"));{code} shareSchema is used internally as a boolean option, but as written the parsing code will ignore it unless the user explicitly configures it as a {{<str/>}} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
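The lenient, fail-fast coercion the issue asks for (accept either the typed node or a string, but reject garbage at parse time rather than on first use) can be sketched as follows. This is an illustrative model only; {{parseBool}}/{{parseInt}} and the option names are hypothetical, not Solr's actual API.

```java
// Hypothetical sketch of lenient solr.xml value coercion: accept a value
// whether it arrived as a typed node (Boolean/Integer) or as a <str>, and
// fail immediately on values that cannot be coerced, instead of silently
// ignoring the option or deferring the failure to first use.
public class LenientConfig {

    // Coerce a raw node value (Boolean or String) to boolean, throwing on garbage.
    static boolean parseBool(String name, Object raw) {
        if (raw instanceof Boolean) return (Boolean) raw;
        String s = String.valueOf(raw).trim();
        if (s.equalsIgnoreCase("true")) return true;
        if (s.equalsIgnoreCase("false")) return false;
        throw new IllegalArgumentException("Invalid boolean for '" + name + "': " + s);
    }

    // Same idea for int options: typed node or numeric string both work.
    static int parseInt(String name, Object raw) {
        if (raw instanceof Integer) return (Integer) raw;
        try {
            return Integer.parseInt(String.valueOf(raw).trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid int for '" + name + "': " + raw);
        }
    }
}
```

With this approach {{<str name="shareSchema">true</str>}} and {{<bool name="shareSchema">true</bool>}} behave identically, while a malformed value fails while the config is being parsed.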
[jira] [Commented] (SOLR-6238) Specialized test case for leader recovery scenario
[ https://issues.apache.org/jira/browse/SOLR-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058566#comment-14058566 ] Shalin Shekhar Mangar commented on SOLR-6238: - More tests are always welcome. The leader-initiated recovery doesn't actually cover this particular failure, so I'm surprised that it doesn't reproduce after 4.7. Please help me understand the sequence of operations: {quote} Leader - Lost Connection with ZK Replica - Became leader {quote} If the leader lost its connection with ZK then it should've rejoined the election on reconnect. If so, why was an add request on this (old) leader successful? I'll take a look at your patch.

Specialized test case for leader recovery scenario -- Key: SOLR-6238 URL: https://issues.apache.org/jira/browse/SOLR-6238 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Priority: Minor Fix For: 4.10 Attachments: SOLR-6238.patch

A scenario which could happen, at least before the addition of LeaderInitiatedRecoveryThread, I think. Also, this can happen only if one is using a non-cloud-aware client (which might be quite a few users), given that only SolrJ is cloud-aware. Events are in chronological order: Leader - lost connection with ZK. Replica - became leader. Leader - add document is successful; forwards it to the replica. Replica - add document is unsuccessful, as it is now the leader and the request says it is coming from a leader. So as of now the Replica (new leader) won't have the doc but the Leader (old leader) will have the document.
[jira] [Assigned] (SOLR-6238) Specialized test case for leader recovery scenario
[ https://issues.apache.org/jira/browse/SOLR-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-6238: --- Assignee: Shalin Shekhar Mangar
[jira] [Updated] (SOLR-6234) Scoring modes for query time join
[ https://issues.apache.org/jira/browse/SOLR-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-6234: --- Description: It adds a {{scorejoin}} query parser which calls Lucene's JoinUtil underneath. It supports: - a {{score=none|avg|max|total}} local param (passed as a ScoreMode to JoinUtil); - a {{b=100}} param to pass to {{Query.setBoost()}}. So far: - it always passes {{multipleValuesPerDocument=true}}; - it doesn't cover the cross-core join case; I just can't find a multicore test case in the Solr tests, and I'd appreciate it if you could point me to one. - I've attached a standalone plugin project; let me know if somebody is interested and I'll convert it into a proper Solr codebase patch. Also, please mention any blockers! Note: the development of this patch was sponsored by an anonymous contributor and approved for release under the Apache License. (was: the same text without the final sponsorship note)

Scoring modes for query time join -- Key: SOLR-6234 URL: https://issues.apache.org/jira/browse/SOLR-6234 Project: Solr Issue Type: New Feature Components: query parsers Affects Versions: 5.0, 4.10 Reporter: Mikhail Khludnev Attachments: lucene-join-solr-query-parser-0.0.2.zip
[jira] [Created] (LUCENE-5815) Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery
Michael McCandless created LUCENE-5815: -- Summary: Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery Key: LUCENE-5815 URL: https://issues.apache.org/jira/browse/LUCENE-5815 Project: Lucene - Core Issue Type: New Feature Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.10

I created a new query, called TermAutomatonQuery, that's a proximity query generalizing MultiPhraseQuery/SpanNearQuery: it lets you construct an arbitrary automaton whose transitions are whole terms, and then find all documents that the automaton matches. This is different from a normal automaton, whose transitions are usually bytes/characters within a term. So, if the automaton has just one transition, it's just an expensive TermQuery. If you have two transitions in sequence, it's a phrase query of two terms. You can express synonyms by using transitions that overlap one another, but the automaton doesn't have to be a sausage (as MultiPhraseQuery requires), i.e. it respects posLength (at query time). It also allows ANY transitions, to match any term, so you can do sloppy matching and span-like queries, e.g. find lucene and python with up to 3 other terms in between. I also added a class to convert a TokenStream directly to the automaton for this query, preserving posLength. (Of course, the index can't store posLength, so the matching won't be fully correct if any indexed tokens have posLength != 1.) But if you do query-time-only synonyms then the matching should finally be correct. I haven't tested performance but I suspect it's quite slowish ... its cost is O(sum-totalTF) of all terms used in the automaton. There are some optimizations we could do, e.g. detecting that some terms in the automaton can be upgraded to MUST (right now they are all effectively SHOULD). I'm not sure how it should assign scores (punted on that for now), but the matching seems to be working.
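The matching idea behind TermAutomatonQuery (whole terms as transitions, plus an ANY wildcard) can be modeled with a toy NFA run over a token sequence. This is a from-scratch illustration of the concept, not Lucene's implementation:

```java
import java.util.*;

// Toy model of a term automaton: states are ints (0 is the start state),
// transitions are labeled with whole terms, and ANY matches any token.
// accepts() runs an NFA simulation over a token sequence.
public class TermAutomaton {
    static final String ANY = "*";
    final Map<Integer, Map<String, Integer>> trans = new HashMap<>();
    final Set<Integer> accept = new HashSet<>();

    void addTransition(int from, String term, int to) {
        trans.computeIfAbsent(from, k -> new HashMap<>()).put(term, to);
    }

    boolean accepts(List<String> tokens) {
        Set<Integer> states = new HashSet<>(Collections.singleton(0));
        for (String tok : tokens) {
            Set<Integer> next = new HashSet<>();
            for (int s : states) {
                Map<String, Integer> out = trans.getOrDefault(s, Collections.emptyMap());
                if (out.containsKey(tok)) next.add(out.get(tok));
                if (out.containsKey(ANY)) next.add(out.get(ANY));
            }
            states = next;
        }
        for (int s : states) if (accept.contains(s)) return true;
        return false;
    }
}
```

A phrase is a straight chain of states; a synonym adds a parallel transition between the same two states; an ANY self-loop permits gap terms, which is the sloppy-matching case mentioned in the description.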
[jira] [Updated] (LUCENE-5815) Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5815: --- Attachment: LUCENE-5815.patch Patch, work in progress, lots of nocommits...
[jira] [Created] (SOLR-6239) HttpSolrServer: connection still allocated
Sergio Fernández created SOLR-6239: -- Summary: HttpSolrServer: connection still allocated Key: SOLR-6239 URL: https://issues.apache.org/jira/browse/SOLR-6239 Project: Solr Issue Type: Bug Components: clients - java Reporter: Sergio Fernández Priority: Minor

In scenarios where concurrency is aggressive, this exception can easily appear:

{quote}
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Invalid use of BasicClientConnManager: connection still allocated. Make sure to release the connection before allocating another one.
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554) ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210) ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206) ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124) ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) ~[solr-solrj-4.9.0.jar:4.9.0 1604085 - rmuir - 2014-06-20 06:34:04]
{quote}

I wonder if there is any solution for it?
[jira] [Commented] (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)
[ https://issues.apache.org/jira/browse/SOLR-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058716#comment-14058716 ] Jack Krupansky commented on SOLR-247: - The earlier commentary clearly lays out that the primary concern is that it would be a performance nightmare, but... that does depend on your particular use case. Personally, I would say to go forward with adding this feature, but with a clear documentation caveat that it should be used with great care, since it is likely to be extremely memory- and performance-intensive and is more of a development/testing tool than a production feature, although it could have value when wildcard patterns are crafted with care for a very limited number of fields.

Allow facet.field=* to facet on all fields (without knowing what they are) -- Key: SOLR-247 URL: https://issues.apache.org/jira/browse/SOLR-247 Project: Solr Issue Type: Improvement Reporter: Ryan McKinley Priority: Minor Labels: beginners, newdev Attachments: SOLR-247-FacetAllFields.patch, SOLR-247.patch, SOLR-247.patch, SOLR-247.patch

I don't know if this is a good idea to include -- it is potentially a bad idea to use it, but that can be ok. This came out of trying to use faceting for the LukeRequestHandler top term collecting. http://www.nabble.com/Luke-request-handler-issue-tf3762155.html
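A sketch of what wildcard expansion for facet.field might look like: expand the glob against the schema's field names up front, so the performance caveat above scales with the number of matched fields. Class and method names here are hypothetical, not part of Solr.

```java
import java.util.*;
import java.util.regex.Pattern;

// Hypothetical expansion of a facet.field glob (e.g. "price_*" or "*")
// against the schema's field names before faceting each match.
public class FacetFieldExpander {

    // Translate the glob to an anchored regex, escaping everything but '*'.
    static boolean matches(String glob, String field) {
        StringBuilder re = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') re.append(".*");
            else re.append(Pattern.quote(String.valueOf(c)));
        }
        return field.matches(re.toString());
    }

    static List<String> expand(String glob, Collection<String> fields) {
        List<String> out = new ArrayList<>();
        for (String f : fields) if (matches(glob, f)) out.add(f);
        return out;
    }
}
```

Each field in the result would then be faceted individually, which is exactly why a broad pattern (or a bare "*") carries the memory cost discussed in the comment.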
[jira] [Commented] (SOLR-6239) HttpSolrServer: connection still allocated
[ https://issues.apache.org/jira/browse/SOLR-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058724#comment-14058724 ] Sergio Fernández commented on SOLR-6239: The recommended way, as of HttpComponents 4.1, is to close the connection and release any underlying resources: {code}EntityUtils.consume(HttpEntity){code} But I'm not sure how it fits with the current code...
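The invariant BasicClientConnManager enforces (a single connection that must be released before the next lease) can be modeled in a few lines; the try/finally below plays the role EntityUtils.consume(...) would play in real SolrJ code. Everything here is a toy stand-in, not HttpClient's API.

```java
// Toy model of a single-connection manager: leasing while a connection is
// still allocated throws, mirroring the "connection still allocated"
// exception in the report. request() shows the safe pattern: release in
// finally, even when the round trip fails.
public class SingleConnManager {
    private boolean leased = false;

    synchronized String lease() {
        if (leased) throw new IllegalStateException("connection still allocated");
        leased = true;
        return "conn";
    }

    synchronized void release() { leased = false; }

    String request(String payload) {
        lease();
        try {
            return "ok:" + payload;  // stand-in for the HTTP round trip
        } finally {
            release();               // always give the connection back
        }
    }
}
```

Under concurrency, any code path that returns without releasing (an exception before the entity is consumed, for example) leaves the manager in the leased state and breaks the next caller, which matches the stack trace above.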
[jira] [Commented] (LUCENE-5815) Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058730#comment-14058730 ] Robert Muir commented on LUCENE-5815: - {quote} (Of course, the index can't store posLength, so the matching won't be fully correct if any indexed tokens have posLength != 1.) {quote} Why not? Can't we just have a token filter that encodes PositionLengthAttribute as a vInt payload (it will always be one byte, unless you are crazy)? The custom scorer here could optionally support it. Personally: not sure it's worth it. I think it's better to fix the QP to parse correctly in common cases like word-delimiter etc. (first: those token filters must be fixed!). And I'm a little confused: is this approach faster than rewrite() to booleans of phrase queries?
[jira] [Commented] (LUCENE-5815) Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058749#comment-14058749 ] Michael McCandless commented on LUCENE-5815: bq. Can't we just have a token filter that encodes PositionLengthAttribute as a vInt payload (it will always be one byte, unless you are crazy)? The custom scorer here could optionally support it. Yes, I think that would work! And it would be pretty simple to build... and the changes to this scorer would be simple: right now it just hardwires that a given token goes from pos to pos+1, but with this vInt in the payload it would decode that and use it instead of +1. bq. I think it's better to fix the QP to parse correctly in common cases like word-delimiter etc. (first: those token filters must be fixed!). Right, the QP needs to use posLength to build the correct queries... this new query just makes it easy, since any arbitrary graph TokenStream can be directly translated into the equivalent query. bq. And I'm a little confused: is this approach faster than rewrite() to booleans of phrase queries? We can only rewrite to a BQ of PQs if the automaton doesn't use the ANY token transition, and if it's finite, right? Or maybe we could somehow take ANY and map it to slop on the phrase queries? But in those restricted cases, it's probably faster, I guess depending on what the automaton looks like. Ie, you could make a biggish automaton that rewrites to many, many phrase queries. I'll add a TODO to maybe do this rewriting for this query ...
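The finite-case rewrite discussed above (enumerate every accepted term sequence, then OR the resulting phrase queries) can be sketched with a DFS over the automaton's transition map. Plain collections stand in for Lucene's query classes, and the sketch assumes the automaton is acyclic with no ANY transitions, which is the precondition the comments identify.

```java
import java.util.*;

// Enumerate every term sequence accepted by an acyclic term automaton.
// Each resulting list of terms corresponds to one phrase query in the
// BQ-of-PQ rewrite; transitions map state -> (term -> next state).
public class FinitePaths {

    static List<List<String>> enumerate(Map<Integer, Map<String, Integer>> trans,
                                        Set<Integer> accept) {
        List<List<String>> phrases = new ArrayList<>();
        dfs(trans, accept, 0, new ArrayDeque<>(), phrases);
        return phrases;
    }

    private static void dfs(Map<Integer, Map<String, Integer>> trans, Set<Integer> accept,
                            int state, Deque<String> path, List<List<String>> out) {
        if (accept.contains(state)) out.add(new ArrayList<>(path));  // one phrase
        for (Map.Entry<String, Integer> e :
                trans.getOrDefault(state, Collections.emptyMap()).entrySet()) {
            path.addLast(e.getKey());
            dfs(trans, accept, e.getValue(), path, out);
            path.removeLast();
        }
    }
}
```

This also makes the caveat in the thread concrete: a biggish automaton with many parallel synonym transitions multiplies the number of enumerated phrases, so the rewrite can explode.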
[jira] [Commented] (LUCENE-5815) Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058754#comment-14058754 ] Robert Muir commented on LUCENE-5815: - {quote} We can only rewrite to a BQ of PQs if the automaton doesn't use the ANY token transition, and if it's finite, right? Or maybe we could somehow take ANY and map it to slop on the phrase queries? {quote} Hmm, ok, I see what you are getting at. I guess I was immediately only considering the case where the automaton is actually coming from the query analysis chain: it would always be finite and so on in this case... right?
[jira] [Commented] (LUCENE-5815) Add TermAutomatonQuery, for proximity matching that generalizes MultiPhraseQuery/SpanNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058755#comment-14058755 ] Michael McCandless commented on LUCENE-5815: bq. I guess I was immediately only considering the case where the automaton is actually coming from the query analysis chain: it would always be finite and so on in this case... right? Ahh, yes it would! We could always use BQ(PQ) in that case, and typically the automaton would be smallish, unless the app makes crazy synonyms, etc.
[jira] [Created] (LUCENE-5816) ToParentBlockJoin deorthogonalization
Nikolay Khitrin created LUCENE-5816: --- Summary: ToParentBlockJoin deorthogonalization Key: LUCENE-5816 URL: https://issues.apache.org/jira/browse/LUCENE-5816 Project: Lucene - Core Issue Type: Improvement Components: modules/join Affects Versions: 4.9 Reporter: Nikolay Khitrin

For now, ToParentBlockJoinQuery accepts only child documents. Before LUCENE-4968, passing a parent document to TPBJQ led to undefined behavior and garbage in the results; unfortunately it also affected TPBJQ.advance(). After that patch, an IllegalStateException is thrown when this occurs. So we must always take parent-child relations into account while writing queries. Most of the time this is necessary when writing a query, but sometimes filters can be independent of the data model (for example, ACL filters: +TPBJQ +allowed:user). TPBJQ should return the parent doc if a parent doc is passed to TPBJQ.advance() or returned from childScorer.advance(). This change doesn't break anything: results will be exactly the same for the parent-child orthogonal case. In a few words: a document matching the parent filter should be the parent of itself.
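The "parent of itself" proposal reduces to how a docID is mapped to its parent via the parent filter's bitset: with block indexing, the parent is the first set bit at or after the doc, so a parent doc maps to itself. A minimal sketch, with a boolean[] standing in for Lucene's FixedBitSet:

```java
// Sketch of the proposed mapping: the parent filter marks the last doc of
// each block, and parentOf(doc) is the first marked doc at or after doc.
// A parent document is therefore its own parent, which is exactly the
// behavior the issue asks for instead of an IllegalStateException.
public class ParentMap {
    static int parentOf(boolean[] parentBits, int doc) {
        for (int i = doc; i < parentBits.length; i++) {
            if (parentBits[i]) return i;  // child -> enclosing parent; parent -> itself
        }
        return -1;  // stand-in for NO_MORE_DOCS
    }
}
```

With this mapping, a filter like the ACL example (+TPBJQ +allowed:user) can intersect with parent docs directly, and results for queries that only ever see child docs are unchanged.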
[jira] [Updated] (LUCENE-5816) ToParentBlockJoinQuery deothogonalization
[ https://issues.apache.org/jira/browse/LUCENE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolay Khitrin updated LUCENE-5816: Summary: ToParentBlockJoinQuery deothogonalization (was: ToParentBlockJoin deothogonalization) ToParentBlockJoinQuery deothogonalization - Key: LUCENE-5816 URL: https://issues.apache.org/jira/browse/LUCENE-5816 Project: Lucene - Core Issue Type: Improvement Components: modules/join Affects Versions: 4.9 Reporter: Nikolay Khitrin For now ToParentBlockJoinQuery accepts only child documents. Before (LUCENE-4968) passing parent document to TPBJQ lead to undefined behavior and garbage in results, unfortunatelly it also affects TPBJQ.advance(). After pointed patch IllegalStateException is thrown when this occurs. So we must always take parent-child relations into account while writing queries. At most of time it is necessary when writing a query, but sometimes, filters can be independent of data model (for example, ACL filters: +TPBJQ +allowed:user). TPBJQ shall returns parent doc if parent doc is passed to TPBJQ.advance() or returned from childScorer.advance(). This change doesn't break anything: results will be absolutely the same for parent-child orthogonal result. In few words: Document matching parent filter should be parent of itself. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5816) ToParentBlockJoinQuery deothogonalization
[ https://issues.apache.org/jira/browse/LUCENE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolay Khitrin updated LUCENE-5816: Attachment: LUCENE-5816.patch ToParentBlockJoinQuery deothogonalization - Key: LUCENE-5816 URL: https://issues.apache.org/jira/browse/LUCENE-5816 Project: Lucene - Core Issue Type: Improvement Components: modules/join Affects Versions: 4.9 Reporter: Nikolay Khitrin Attachments: LUCENE-5816.patch For now ToParentBlockJoinQuery accepts only child documents. Before (LUCENE-4968) passing parent document to TPBJQ lead to undefined behavior and garbage in results, unfortunatelly it also affects TPBJQ.advance(). After pointed patch IllegalStateException is thrown when this occurs. So we must always take parent-child relations into account while writing queries. At most of time it is necessary when writing a query, but sometimes, filters can be independent of data model (for example, ACL filters: +TPBJQ +allowed:user). TPBJQ shall returns parent doc if parent doc is passed to TPBJQ.advance() or returned from childScorer.advance(). This change doesn't break anything: results will be absolutely the same for parent-child orthogonal result. In few words: Document matching parent filter should be parent of itself. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5816) ToParentBlockJoinQuery deothogonalization
[ https://issues.apache.org/jira/browse/LUCENE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolay Khitrin updated LUCENE-5816: Description: For now ToParentBlockJoinQuery accepts only child documents. Before (LUCENE-4968) passing parent document to TPBJQ lead to undefined behavior and garbage in results, unfortunatelly it also affects TPBJQ.advance(). After pointed patch IllegalStateException is thrown when this occurs. So we must always take parent-child relations into account while writing queries. At most of time it is necessary when writing a query, but sometimes, filters can be independent of data model (for example, ACL filters: +TPBJQ +allowed:user). TPBJQ shall returns parent doc if parent doc is passed to TPBJQ.advance() or returned from childScorer.advance(). This change doesn't break anything: results will be absolutely the same for parent-child orthogonal queries. In few words: Document matching parent filter should be parent of itself. was: For now ToParentBlockJoinQuery accepts only child documents. Before (LUCENE-4968) passing parent document to TPBJQ lead to undefined behavior and garbage in results, unfortunatelly it also affects TPBJQ.advance(). After pointed patch IllegalStateException is thrown when this occurs. So we must always take parent-child relations into account while writing queries. At most of time it is necessary when writing a query, but sometimes, filters can be independent of data model (for example, ACL filters: +TPBJQ +allowed:user). TPBJQ shall returns parent doc if parent doc is passed to TPBJQ.advance() or returned from childScorer.advance(). This change doesn't break anything: results will be absolutely the same for parent-child orthogonal result. In few words: Document matching parent filter should be parent of itself. 
ToParentBlockJoinQuery deothogonalization - Key: LUCENE-5816 URL: https://issues.apache.org/jira/browse/LUCENE-5816 Project: Lucene - Core Issue Type: Improvement Components: modules/join Affects Versions: 4.9 Reporter: Nikolay Khitrin Attachments: LUCENE-5816.patch For now ToParentBlockJoinQuery accepts only child documents. Before (LUCENE-4968) passing parent document to TPBJQ lead to undefined behavior and garbage in results, unfortunatelly it also affects TPBJQ.advance(). After pointed patch IllegalStateException is thrown when this occurs. So we must always take parent-child relations into account while writing queries. At most of time it is necessary when writing a query, but sometimes, filters can be independent of data model (for example, ACL filters: +TPBJQ +allowed:user). TPBJQ shall returns parent doc if parent doc is passed to TPBJQ.advance() or returned from childScorer.advance(). This change doesn't break anything: results will be absolutely the same for parent-child orthogonal queries. In few words: Document matching parent filter should be parent of itself. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5816) ToParentBlockJoinQuery deothogonalization
[ https://issues.apache.org/jira/browse/LUCENE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolay Khitrin updated LUCENE-5816: Description: For now ToParentBlockJoinQuery accepts only child documents. Before (LUCENE-4968) passing parent document to TPBJQ lead to undefined behavior and garbage in results, unfortunately it also affects TPBJQ.advance(). After pointed patch IllegalStateException is thrown when this occurs. So we must always take parent-child relations into account while writing queries. At most of time it is necessary when writing a query, but sometimes, filters can be independent of data model (for example, ACL filters: +TPBJQ +allowed:user). TPBJQ shall returns parent doc if parent doc is passed to TPBJQ.advance() or returned from childScorer.advance(). This change doesn't break anything: results will be absolutely the same for parent-child orthogonal queries. In few words: Document matching parent filter should be parent of itself. was: For now ToParentBlockJoinQuery accepts only child documents. Before (LUCENE-4968) passing parent document to TPBJQ lead to undefined behavior and garbage in results, unfortunatelly it also affects TPBJQ.advance(). After pointed patch IllegalStateException is thrown when this occurs. So we must always take parent-child relations into account while writing queries. At most of time it is necessary when writing a query, but sometimes, filters can be independent of data model (for example, ACL filters: +TPBJQ +allowed:user). TPBJQ shall returns parent doc if parent doc is passed to TPBJQ.advance() or returned from childScorer.advance(). This change doesn't break anything: results will be absolutely the same for parent-child orthogonal queries. In few words: Document matching parent filter should be parent of itself. 
ToParentBlockJoinQuery deothogonalization - Key: LUCENE-5816 URL: https://issues.apache.org/jira/browse/LUCENE-5816 Project: Lucene - Core Issue Type: Improvement Components: modules/join Affects Versions: 4.9 Reporter: Nikolay Khitrin Attachments: LUCENE-5816.patch For now ToParentBlockJoinQuery accepts only child documents. Before (LUCENE-4968) passing parent document to TPBJQ lead to undefined behavior and garbage in results, unfortunately it also affects TPBJQ.advance(). After pointed patch IllegalStateException is thrown when this occurs. So we must always take parent-child relations into account while writing queries. At most of time it is necessary when writing a query, but sometimes, filters can be independent of data model (for example, ACL filters: +TPBJQ +allowed:user). TPBJQ shall returns parent doc if parent doc is passed to TPBJQ.advance() or returned from childScorer.advance(). This change doesn't break anything: results will be absolutely the same for parent-child orthogonal queries. In few words: Document matching parent filter should be parent of itself. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6238) Specialized test case for leader recovery scenario
[ https://issues.apache.org/jira/browse/SOLR-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058786#comment-14058786 ] Mark Miller commented on SOLR-6238: --- bq. If the leader lost its connection with ZK then it should've rejoined election on reconnect. If so, why was an add request on this (old) leader successful? The only thing I can reason so far is:
Leader - doc gets past zk check
Leader - Lost connection with ZK
Replica - Became leader
Leader (old) - add document is successful. Forwards it to the replica
Replica - add document is unsuccessful as it is the leader and the request says it is coming from a leader
Leader (old) - reconnects to ZK, peer syncs with Replica and succeeds because it's not behind.
Specialized test case for leader recovery scenario -- Key: SOLR-6238 URL: https://issues.apache.org/jira/browse/SOLR-6238 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 4.10 Attachments: SOLR-6238.patch A scenario which could happen at least before the addition of LeaderInitiatedRecoveryThread, I think. Also, this can happen only if one is using a non-cloud-aware client (which might be quite a few users) given that we have only SolrJ. Events are in chronological order:
Leader - Lost connection with ZK
Replica - Became leader
Leader - add document is successful. Forwards it to the replica
Replica - add document is unsuccessful as it is the leader and the request says it is coming from a leader
So as of now the Replica (new leader) won't have the doc but the leader (old leader) will have the document.
[jira] [Commented] (SOLR-5473) Split clusterstate.json per collection and watch states selectively
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058817#comment-14058817 ] Noble Paul commented on SOLR-5473: -- bq. There should at least be an option to listen to collections of choice so that you don't have to fetch them for each request Are you talking about Solr nodes? Watching nodes is error prone unless we have clear rules on how to watch/unwatch. The strategy should be similar to SolrJ, where it watches nothing but caches everything. SolrDispatchFilter will be enhanced to use caching similar to CloudSolrServer. Split clusterstate.json per collection and watch states selectively Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Fix For: 5.0 Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74_POC.patch, SOLR-5473-configname-fix.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log As defined in the parent issue, store the states of each collection 
under /collections/collectionname/state.json node
[jira] [Updated] (LUCENE-5817) hunspell buggy zero-affix handling
[ https://issues.apache.org/jira/browse/LUCENE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5817: Attachment: LUCENE-5817.patch patch with a simple test hunspell buggy zero-affix handling -- Key: LUCENE-5817 URL: https://issues.apache.org/jira/browse/LUCENE-5817 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5817.patch This only partially works today. But zero-affixes are used heavily by many dictionaries (e.g. i found a good number of bugs in czech and latvian just experimenting). The fix is easy: we just have to look for 0 in the affix portion as well as the strip portion, as indicated by the manual page: Zero stripping or affix are indicated by zero. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5817) hunspell buggy zero-affix handling
Robert Muir created LUCENE-5817: --- Summary: hunspell buggy zero-affix handling Key: LUCENE-5817 URL: https://issues.apache.org/jira/browse/LUCENE-5817 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5817.patch This only partially works today. But zero-affixes are used heavily by many dictionaries (e.g. i found a good number of bugs in czech and latvian just experimenting). The fix is easy: we just have to look for 0 in the affix portion as well as the strip portion, as indicated by the manual page: Zero stripping or affix are indicated by zero. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
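The fix the issue describes can be sketched in Python (a simplified illustration of hunspell .aff rule fields, not the actual Lucene parser): a literal "0" must be treated as empty in both the strip field and the affix field, not just the strip field.

```python
def parse_affix_fields(strip, affix):
    """Per the hunspell man page, a literal "0" in an affix rule means
    "empty". The bug was handling "0" only in the strip field; the fix
    applies the same rule to the affix field."""
    strip = "" if strip == "0" else strip
    affix = "" if affix == "0" else affix
    return strip, affix

def apply_suffix(word, strip, affix):
    """Apply one suffix rule: remove `strip` from the end of `word`
    (if present), then append `affix`. Returns None if the rule does
    not apply to this word."""
    strip, affix = parse_affix_fields(strip, affix)
    if strip and not word.endswith(strip):
        return None
    return word[:len(word) - len(strip)] + affix
```

With both fields handled, a zero-strip rule like `SFX A 0 s .` and a zero-affix rule like `SFX A e 0 e` both behave as the dictionaries (e.g. Czech, Latvian) expect.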
[jira] [Updated] (LUCENE-5816) ToParentBlockJoinQuery deothogonalization
[ https://issues.apache.org/jira/browse/LUCENE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolay Khitrin updated LUCENE-5816: Attachment: LUCENE-5816.patch Removed validation test due to removing exception from TPBJQ. ToParentBlockJoinQuery deothogonalization - Key: LUCENE-5816 URL: https://issues.apache.org/jira/browse/LUCENE-5816 Project: Lucene - Core Issue Type: Improvement Components: modules/join Affects Versions: 4.9 Reporter: Nikolay Khitrin Attachments: LUCENE-5816.patch, LUCENE-5816.patch For now ToParentBlockJoinQuery accepts only child documents. Before (LUCENE-4968) passing parent document to TPBJQ lead to undefined behavior and garbage in results, unfortunately it also affects TPBJQ.advance(). After pointed patch IllegalStateException is thrown when this occurs. So we must always take parent-child relations into account while writing queries. At most of time it is necessary when writing a query, but sometimes, filters can be independent of data model (for example, ACL filters: +TPBJQ +allowed:user). TPBJQ shall returns parent doc if parent doc is passed to TPBJQ.advance() or returned from childScorer.advance(). This change doesn't break anything: results will be absolutely the same for parent-child orthogonal queries. In few words: Document matching parent filter should be parent of itself. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock
[ https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter updated SOLR-6136: - Attachment: SOLR-6136.patch Here's a patch based largely on Brandon's original patch, using wait / notifyAll instead of the spin lock in blockUntilFinished. As mentioned above, VisualVM shows good evidence of this improvement in that the amount of CPU spent in the block method is negligible with this patch (and very noticeable without it). I've also included the first cut at a unit test for CUSS. There are probably more things we can do to exercise the logic in CUSS, so let me know if you have any other ideas for the unit test. Brandon - please try this patch out in your environment if possible. I'll plan to commit this to trunk and backport to the 4x branch in a few days after keeping an eye on things in Jenkins. ConcurrentUpdateSolrServer includes a Spin Lock --- Key: SOLR-6136 URL: https://issues.apache.org/jira/browse/SOLR-6136 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1 Reporter: Brandon Chapman Assignee: Timothy Potter Priority: Critical Attachments: SOLR-6136.patch, wait___notify_all.patch ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This causes an extremely high amount of CPU to be used on the Cloud Leader during indexing. Here is a summary of our system testing. Importing data on Solr4.5.0: Throughput gets as high as 240 documents per second. [tomcat@solr-stg01 logs]$ uptime 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java Importing data on Solr4.7.0 with no replicas: Throughput peaks at 350 documents per second. 
[tomcat@solr-stg01 logs]$ uptime 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java Importing data on Solr4.7.0 with replicas: Throughput peaks at 30 documents per second because the Solr machine is out of CPU. [tomcat@solr-stg01 logs]$ uptime 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java
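The wait/notify approach the patch takes can be sketched in Python as a hypothetical analogue (not the actual ConcurrentUpdateSolrServer code): a thread in blockUntilFinished blocks on a condition variable that runner threads signal when they finish, instead of burning CPU in a polling loop.

```python
import threading

class Runners:
    """Sketch of CUSS-style runner tracking: blockUntilFinished waits
    on a condition variable instead of spinning, so it uses no CPU
    while runner threads are still draining the queue."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0

    def runner_started(self):
        with self._cond:
            self._active += 1

    def runner_finished(self):
        with self._cond:
            self._active -= 1
            self._cond.notify_all()  # wake any blockUntilFinished waiters

    def block_until_finished(self):
        with self._cond:
            # Loop guards against spurious wakeups; wait() releases the
            # lock while blocked, unlike a spin loop that re-polls state.
            while self._active > 0:
                self._cond.wait()
```

This mirrors why VisualVM showed negligible CPU in the block method with the patch applied: the waiter is descheduled until notified rather than repeatedly re-checking.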
[jira] [Commented] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock
[ https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058858#comment-14058858 ] Timothy Potter commented on SOLR-6136: -- btw - I decided to not mess with the threadCount stuff Mark and I were discussing here as that should be handled under another improvement ticket after doing some benchmarking to show if it even helps. ConcurrentUpdateSolrServer includes a Spin Lock --- Key: SOLR-6136 URL: https://issues.apache.org/jira/browse/SOLR-6136 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1 Reporter: Brandon Chapman Assignee: Timothy Potter Priority: Critical Attachments: SOLR-6136.patch, wait___notify_all.patch ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This causes an extremely high amount of CPU to be used on the Cloud Leader during indexing. Here is a summary of our system testing. Importing data on Solr4.5.0: Throughput gets as high as 240 documents per second. [tomcat@solr-stg01 logs]$ uptime 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java Importing data on Solr4.7.0 with no replicas: Throughput peaks at 350 documents per second. [tomcat@solr-stg01 logs]$ uptime 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java Importing data on Solr4.7.0 with replicas: Throughput peaks at 30 documents per second because the Solr machine is out of CPU. 
[tomcat@solr-stg01 logs]$ uptime 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java
[jira] [Commented] (LUCENE-5817) hunspell buggy zero-affix handling
[ https://issues.apache.org/jira/browse/LUCENE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058867#comment-14058867 ] ASF subversion and git services commented on LUCENE-5817: - Commit 1609723 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1609723 ] LUCENE-5817: Fix hunspell zero-affix handling hunspell buggy zero-affix handling -- Key: LUCENE-5817 URL: https://issues.apache.org/jira/browse/LUCENE-5817 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5817.patch This only partially works today. But zero-affixes are used heavily by many dictionaries (e.g. i found a good number of bugs in czech and latvian just experimenting). The fix is easy: we just have to look for 0 in the affix portion as well as the strip portion, as indicated by the manual page: Zero stripping or affix are indicated by zero. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5817) hunspell buggy zero-affix handling
[ https://issues.apache.org/jira/browse/LUCENE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058874#comment-14058874 ] ASF subversion and git services commented on LUCENE-5817: - Commit 1609725 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1609725 ] LUCENE-5817: Fix hunspell zero-affix handling hunspell buggy zero-affix handling -- Key: LUCENE-5817 URL: https://issues.apache.org/jira/browse/LUCENE-5817 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5817.patch This only partially works today. But zero-affixes are used heavily by many dictionaries (e.g. i found a good number of bugs in czech and latvian just experimenting). The fix is easy: we just have to look for 0 in the affix portion as well as the strip portion, as indicated by the manual page: Zero stripping or affix are indicated by zero. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5817) hunspell buggy zero-affix handling
[ https://issues.apache.org/jira/browse/LUCENE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5817. - Resolution: Fixed Fix Version/s: 4.10 5.0 hunspell buggy zero-affix handling -- Key: LUCENE-5817 URL: https://issues.apache.org/jira/browse/LUCENE-5817 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0, 4.10 Attachments: LUCENE-5817.patch This only partially works today. But zero-affixes are used heavily by many dictionaries (e.g. i found a good number of bugs in czech and latvian just experimenting). The fix is easy: we just have to look for 0 in the affix portion as well as the strip portion, as indicated by the manual page: Zero stripping or affix are indicated by zero. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5495) Recovery strategy for leader partitioned from replica case.
[ https://issues.apache.org/jira/browse/SOLR-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058879#comment-14058879 ] Mark Miller commented on SOLR-5495: --- I did a quick review of the code and read your comments above more thoroughly. I did not do a low level review. From that mid level though, this looks like a great change and even if there are any issues, the changes look like good improvements and we should just work through anything that comes up as a result of them. As I work on anything in that area, I'll look at some parts more closely. Recovery strategy for leader partitioned from replica case. --- Key: SOLR-5495 URL: https://issues.apache.org/jira/browse/SOLR-5495 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Timothy Potter Fix For: 4.9 Attachments: SOLR-5495.patch, SOLR-5495.patch, SOLR-5495.patch We need to work out a strategy for the case of: Leader and replicas can still talk to ZooKeeper, Leader cannot talk to replica. We punted on this in initial design, but I'd like to get something in. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5473) Split clusterstate.json per collection and watch states selectively
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058880#comment-14058880 ] Ramkumar Aiyengar commented on SOLR-5473: - Got it. I wouldn't say I am still convinced that caching till you fail is the same as watching. There are still cases around (unfortunately not reproducible enough to be fixed) where the cluster state tells you for sure that some node is off limits, but the actual request doesn't fail fast enough. More generally, ideally I would like to be able to inform that for some reason a node shouldn't be used for whatever environmental reason even though it physically is up (maybe it is doing something weird and I would like for it to be up and available for debugging while not affecting queries). Currently that's not possible, but something we might work to get added. It would be good to have the true ZK state instead of lazily updating it on error. bq. watching nodes is error prone unless we have clear rules on how to watch/unwatch That's why I am saying that at least in the simplistic case this should be left to configuration -- watch none, all, or selected. That would at least open the doors for more sophisticated logic on making the selection smarter, but we shouldn't shut it out and require only caching to be used. Use the watch when you have been told to, else cache. 
Split clusterstate.json per collection and watch states selectively Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Fix For: 5.0 Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74_POC.patch, SOLR-5473-configname-fix.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
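The two strategies under discussion can be sketched in Python; everything here is a hypothetical model (class and function names are invented), not Solr code. Collections in a configured watch list stay fresh (modeled as a fetch on every read, standing in for a ZK watch), while everything else is cached lazily and only refetched after a request failure, CloudSolrServer-style.

```python
class CollectionStateCache:
    """Hypothetical model of per-collection state handling.

    fetch_state(name) stands in for reading the collection's
    /collections/<name>/state.json from ZooKeeper. Collections in
    `watched` are always read fresh (modeling a ZK watch keeping them
    current); all others are cached until a request failure suggests
    the cached state is stale."""

    def __init__(self, fetch_state, watched=()):
        self.fetch_state = fetch_state
        self.watched = set(watched)
        self.cache = {}

    def get(self, name):
        if name in self.watched or name not in self.cache:
            self.cache[name] = self.fetch_state(name)
        return self.cache[name]

    def on_request_failed(self, name):
        # Stale state may have routed us to a dead node: invalidate so
        # the next get() refetches from ZooKeeper.
        self.cache.pop(name, None)
```

The trade-off in the thread is visible here: the cache-until-failure path never sees a node go off limits until a request actually fails, while a watched collection pays a read per request (in real ZK, a watch avoids even that).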
[jira] [Commented] (SOLR-3617) Consider adding start scripts.
[ https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058889#comment-14058889 ] Varun Thacker commented on SOLR-3617: - bq. Also, I've built-in Shawn H's famous GC tuning options for Solr, which I've found to be good for many Solr workflows. In general, I think the start script should take as much of the thinking out of running Solr as possible, which includes baking in best practices. +1 for baking in the best practices. Should we have warnings for, say, too few file handles, using a buggy Java version, etc.? In one of Mark's earlier comments he had mentioned that we could have a start-dev and a start-prod. These warnings would make sense in the start-prod script. Not sure if it's a good idea to have them if we have only one start script. Consider adding start scripts. -- Key: SOLR-3617 URL: https://issues.apache.org/jira/browse/SOLR-3617 Project: Solr Issue Type: New Feature Reporter: Mark Miller Attachments: SOLR-3617.patch I've always found that starting Solr with java -jar start.jar is a little odd if you are not a java guy, but I think there are bigger pros than looking less odd in shipping some start scripts. Not only do you get a cleaner start command: sh solr.sh or solr.bat or something But you also can do a couple other little nice things: * it becomes fairly obvious for a new casual user to see how to start the system without reading doc. * you can make the working dir the location of the script - this lets you call the start script from another dir and still have all the relative dir setup work. * have an out of the box place to save startup params like -Xmx. * we could have multiple start scripts - say solr-dev.sh that logged to the console and default to sys default for RAM - and also solr-prod which was fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) etc. 
You would still of course be able to make the java cmd directly - and that is probably what you would do when it's time to run as a service - but these could be good starter scripts to get people on the right track and improve the initial user experience.
[jira] [Commented] (SOLR-5495) Recovery strategy for leader partitioned from replica case.
[ https://issues.apache.org/jira/browse/SOLR-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1405#comment-1405 ] Timothy Potter commented on SOLR-5495: -- Hi Mark, Awesome, thanks for the review ... there's one area in the CoreAdminHandler waitForState that could use your review. // TODO: This is funky but I've seen this in testing where the replica asks the // leader to be in recovery? Need to track down how that happens ... in the meantime, // this is a safeguard boolean leaderDoesNotNeedRecovery = (onlyIfLeader != null && onlyIfLeader && core.getName().equals(nodeProps.getStr("core")) && ZkStateReader.RECOVERING.equals(waitForState) && ZkStateReader.ACTIVE.equals(localState) && ZkStateReader.ACTIVE.equals(state)); Basically, at some point, I was seeing replicas ask active leaders to recover, which I didn't think was a valid thing to do. I actually haven't seen this occur in any of my testing so maybe I was just confused. We can definitely remove that code if it's not valid, but wanted to make you aware that I had it in there ;-) Recovery strategy for leader partitioned from replica case. --- Key: SOLR-5495 URL: https://issues.apache.org/jira/browse/SOLR-5495 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Timothy Potter Fix For: 4.9 Attachments: SOLR-5495.patch, SOLR-5495.patch, SOLR-5495.patch We need to work out a strategy for the case of: Leader and replicas can still talk to ZooKeeper, Leader cannot talk to replica. We punted on this in initial design, but I'd like to get something in. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock
[ https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058910#comment-14058910 ] Mark Miller commented on SOLR-6136: --- {noformat} -final UpdateRequest updateRequest = queue.poll(250, -TimeUnit.MILLISECONDS); +final UpdateRequest updateRequest = +queue.poll(pollQueueTime, TimeUnit.MILLISECONDS); if (updateRequest == null) {noformat} Know when that bug was introduced? If it went out in 4.9, that is a pretty severe performance bug if you are not streaming or batching big. ConcurrentUpdateSolrServer includes a Spin Lock --- Key: SOLR-6136 URL: https://issues.apache.org/jira/browse/SOLR-6136 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1 Reporter: Brandon Chapman Assignee: Timothy Potter Priority: Critical Attachments: SOLR-6136.patch, wait___notify_all.patch ConcurrentUpdateSolrServer.blockUntilFinished() includes a Spin Lock. This causes an extremely high amount of CPU to be used on the Cloud Leader during indexing. Here is a summary of our system testing. Importing data on Solr4.5.0: Throughput gets as high as 240 documents per second. [tomcat@solr-stg01 logs]$ uptime 09:53:50 up 310 days, 23:52, 1 user, load average: 3.33, 3.72, 5.43 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9547 tomcat 21 0 6850m 1.2g 16m S 86.2 5.0 1:48.81 java Importing data on Solr4.7.0 with no replicas: Throughput peaks at 350 documents per second. [tomcat@solr-stg01 logs]$ uptime 10:03:44 up 311 days, 2 min, 1 user, load average: 4.57, 2.55, 4.18 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9728 tomcat 23 0 6859m 2.2g 28m S 62.3 9.0 2:20.20 java Importing data on Solr4.7.0 with replicas: Throughput peaks at 30 documents per second because the Solr machine is out of CPU. 
[tomcat@solr-stg01 logs]$ uptime 09:40:04 up 310 days, 23:38, 1 user, load average: 30.54, 12.39, 4.79 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9190 tomcat 17 0 7005m 397m 15m S 198.5 1.6 7:14.87 java -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
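The diff quoted above replaces a hard-coded 250 ms poll with a configurable pollQueueTime. A minimal, self-contained sketch of the underlying pattern (names are illustrative, not the actual ConcurrentUpdateSolrServer code): block on the queue with a timeout rather than spinning, so the runner thread burns no CPU while idle.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PollSketch {
    // Wait up to pollQueueTimeMs for the next request; a null return means
    // "timed out, re-check loop conditions" -- no busy-wait, no hard-coded 250 ms.
    static String drainOne(BlockingQueue<String> queue, long pollQueueTimeMs)
            throws InterruptedException {
        return queue.poll(pollQueueTimeMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.offer("update-1");
        System.out.println(drainOne(queue, 100)); // prints update-1
        System.out.println(drainOne(queue, 10));  // prints null (timed out)
    }
}
```

The point of making the timeout configurable is that callers who stream continuously can afford a long poll, while batchy callers can shorten it to return from blockUntilFinished() promptly.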
[jira] [Created] (LUCENE-5818) Fix hunspell zero-string overgeneration
Robert Muir created LUCENE-5818: --- Summary: Fix hunspell zero-string overgeneration Key: LUCENE-5818 URL: https://issues.apache.org/jira/browse/LUCENE-5818 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Currently, it's allowed to strip suffixes/prefixes all the way down to the empty string. But this is not really allowed, and it creates overgeneration in some cases (especially where endings can be standalone ... typically these are stopwords, so it causes a lot of damage). An example is Czech 'už', which should just stem to itself, but today also stems to 'úžit' because it has a flag compatible with that. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
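The overgeneration can be pictured with a tiny affix-stripping sketch (purely illustrative, not the Lucene hunspell implementation): the fix amounts to rejecting any candidate whose stripped stem would be the empty string.

```java
public class StripSketch {
    // Strip a suffix only if the remaining stem is non-empty; returning null
    // rejects the candidate, which is the behavior the fix enforces.
    static String stripSuffix(String word, String suffix) {
        if (!word.endsWith(suffix) || word.length() == suffix.length()) {
            return null; // no match, or stripping would leave a zero-length stem
        }
        return word.substring(0, word.length() - suffix.length());
    }

    public static void main(String[] args) {
        System.out.println(stripSuffix("walking", "ing")); // prints walk
        System.out.println(stripSuffix("ing", "ing"));     // prints null
    }
}
```

Without the length check, a standalone word identical to a suffix entry strips down to "" and can then match unrelated dictionary stems, which is exactly the 'už' / 'úžit' case described above.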
[jira] [Updated] (LUCENE-5818) Fix hunspell zero-string overgeneration
[ https://issues.apache.org/jira/browse/LUCENE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5818: Attachment: LUCENE-5818.patch Simple patch with some tests. This might be a bug I introduced when cutting over to FST, because we had no test for it before. Fix hunspell zero-string overgeneration --- Key: LUCENE-5818 URL: https://issues.apache.org/jira/browse/LUCENE-5818 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5818.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock
[ https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058916#comment-14058916 ] Timothy Potter commented on SOLR-6136: -- not sure about that one ... just something I caught while working on this issue ConcurrentUpdateSolrServer includes a Spin Lock --- Key: SOLR-6136 URL: https://issues.apache.org/jira/browse/SOLR-6136 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1 Reporter: Brandon Chapman Assignee: Timothy Potter Priority: Critical Attachments: SOLR-6136.patch, wait___notify_all.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3365) Data import using local time to mark last_index_time
[ https://issues.apache.org/jira/browse/SOLR-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichiro Abe updated SOLR-3365: - Attachment: SOLR-3365.patch Simple patch for trunk. It would be nice if we could configure the time zone when the database server's time zone differs from the Solr server's, because currently we have to add '-Duser.timezone=foobar' when starting Solr as a workaround. e.g. {code:xml} <propertyWriter type="SimplePropertiesWriter" timezone="Asia/Tokyo" /> Or <propertyWriter type="SimplePropertiesWriter" timezone="Etc/GMT-9" /> Or <propertyWriter type="SimplePropertiesWriter" timezone="Etc/GMT" /> {code} Data import using local time to mark last_index_time Key: SOLR-3365 URL: https://issues.apache.org/jira/browse/SOLR-3365 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Environment: 1 mysql data source server 2 solr servers Reporter: Bartosz Cembor Attachments: SOLR-3365.patch Class org.apache.solr.handler.dataimport.DataImporter setIndexStartTime(new Date()); When there is a difference in time between the servers (mysql and solr), some documents are not indexed. I think DataImporter should take the time from the mysql database (SELECT NOW()) and use it to mark start_index_time -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
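The proposed timezone attribute boils down to formatting the last_index_time timestamp in an explicitly configured zone instead of the JVM default. A hedged sketch of that idea (not the actual SimplePropertiesWriter code); note the POSIX sign inversion in the Etc area, so Etc/GMT-9 means nine hours ahead of UTC:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimezoneSketch {
    // Format a timestamp in an explicit zone instead of relying on the
    // JVM default (the -Duser.timezone workaround mentioned above).
    static String format(Date date, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // 1970-01-01T00:00:00Z
        System.out.println(format(epoch, "Etc/GMT"));   // prints 1970-01-01 00:00:00
        // POSIX sign inversion: Etc/GMT-9 is nine hours AHEAD of UTC.
        System.out.println(format(epoch, "Etc/GMT-9")); // prints 1970-01-01 09:00:00
    }
}
```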
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #656: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/656/ 2 tests failed. FAILED: org.apache.solr.handler.TestReplicationHandlerBackup.org.apache.solr.handler.TestReplicationHandlerBackup Error Message: 1 thread leaked from SUITE scope at org.apache.solr.handler.TestReplicationHandlerBackup: 1) Thread[id=8779, name=Thread-4102, state=RUNNABLE, group=TGRP-TestReplicationHandlerBackup] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) at java.net.URL.openStream(URL.java:1037) at org.apache.solr.handler.TestReplicationHandlerBackup$BackupThread.run(TestReplicationHandlerBackup.java:314) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.handler.TestReplicationHandlerBackup: 1) Thread[id=8779, name=Thread-4102, state=RUNNABLE, group=TGRP-TestReplicationHandlerBackup] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at 
java.net.Socket.connect(Socket.java:528) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) at java.net.URL.openStream(URL.java:1037) at org.apache.solr.handler.TestReplicationHandlerBackup$BackupThread.run(TestReplicationHandlerBackup.java:314) at __randomizedtesting.SeedInfo.seed([BDF25AD236E158BA]:0) FAILED: org.apache.solr.handler.TestReplicationHandlerBackup.org.apache.solr.handler.TestReplicationHandlerBackup Error Message: There are still zombie threads that couldn't be terminated: 1) Thread[id=8779, name=Thread-4102, state=RUNNABLE, group=TGRP-TestReplicationHandlerBackup] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) at java.net.URL.openStream(URL.java:1037) at org.apache.solr.handler.TestReplicationHandlerBackup$BackupThread.run(TestReplicationHandlerBackup.java:314) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie threads that couldn't be terminated: 1) Thread[id=8779, name=Thread-4102, state=RUNNABLE, 
group=TGRP-TestReplicationHandlerBackup] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at
[jira] [Resolved] (LUCENE-5818) Fix hunspell zero-string overgeneration
[ https://issues.apache.org/jira/browse/LUCENE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5818. - Resolution: Fixed Fix Version/s: 4.10 5.0 Fix hunspell zero-string overgeneration --- Key: LUCENE-5818 URL: https://issues.apache.org/jira/browse/LUCENE-5818 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0, 4.10 Attachments: LUCENE-5818.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5818) Fix hunspell zero-string overgeneration
[ https://issues.apache.org/jira/browse/LUCENE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059042#comment-14059042 ] Hoss Man commented on LUCENE-5818: -- For those keeping score at home... http://svn.apache.org/r1609738 http://svn.apache.org/r1609739 Fix hunspell zero-string overgeneration --- Key: LUCENE-5818 URL: https://issues.apache.org/jira/browse/LUCENE-5818 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0, 4.10 Attachments: LUCENE-5818.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5473) Split clusterstate.json per collection and watch states selectively
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059045#comment-14059045 ] Mark Miller commented on SOLR-5473: --- bq. I wouldn't say I am still convinced that caching till you fail is the same as watching. I don't believe it is either. I like it less where you don't need thousands of collections. Perhaps we consider making it an optional optimization on CloudSolrServer? Given the scaling gains for collections though, this issue overall does seem worth any tradeoffs and it seems like improvements and options can be made where appropriate. Split clusterstate.json per collection and watch states selectively Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Fix For: 5.0 Attachments: SOLR-5473-74 .patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74_POC.patch, SOLR-5473-configname-fix.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log As defined in the parent issue, store the 
states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6136) ConcurrentUpdateSolrServer includes a Spin Lock
[ https://issues.apache.org/jira/browse/SOLR-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059082#comment-14059082 ] Mark Miller commented on SOLR-6136: --- Ah, I see, it's a different poll call. I only had this take effect on one of the poll calls because that was enough to relieve the performance issue in my benchmarks. I think it makes sense to expand it to this other poll call as well. +1 on the patch, looks okay to me, tests pass locally. I don't want to think about testing CUSS at the moment, but nice work on a new test for it. It will be great to have it grow - this has been a troublesome class to stabilize over the years. ConcurrentUpdateSolrServer includes a Spin Lock --- Key: SOLR-6136 URL: https://issues.apache.org/jira/browse/SOLR-6136 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1 Reporter: Brandon Chapman Assignee: Timothy Potter Priority: Critical Attachments: SOLR-6136.patch, wait___notify_all.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
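The wait___notify_all.patch attachment name suggests replacing the spin with monitor-based waiting. A minimal sketch of that idiom, assuming a simple in-flight runner count (these are not the actual CUSS fields): blockUntilFinished() sleeps on the monitor until notified instead of polling in a loop.

```java
public class WaitNotifySketch {
    private final Object lock = new Object();
    private int runners = 0;

    public void runnerStarted() {
        synchronized (lock) { runners++; }
    }

    public void runnerFinished() {
        synchronized (lock) {
            runners--;
            lock.notifyAll(); // wake anyone parked in blockUntilFinished()
        }
    }

    // Sleeps on the monitor instead of spinning, so no CPU is consumed
    // while runners are still in flight.
    public void blockUntilFinished() throws InterruptedException {
        synchronized (lock) {
            while (runners > 0) {
                lock.wait();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WaitNotifySketch cuss = new WaitNotifySketch();
        cuss.runnerStarted();
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            cuss.runnerFinished();
        }).start();
        cuss.blockUntilFinished();
        System.out.println("all runners finished"); // prints all runners finished
    }
}
```

The wait() inside a while loop guards against spurious wakeups, which is the standard monitor idiom.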
[jira] [Commented] (SOLR-6203) cast exception while searching with sort function and result grouping
[ https://issues.apache.org/jira/browse/SOLR-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059132#comment-14059132 ] Hoss Man commented on SOLR-6203: I haven't looked into this, but I remember similar issues were problematic with normal distributed sorting in older versions of Solr; this should have largely been resolved by SOLR-5354 -- see in particular this comment & subsequent commit... https://issues.apache.org/jira/browse/SOLR-5354?focusedCommentId=13835891&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13835891 https://svn.apache.org/r1547430 ...apparently we're still missing a place in the grouping code that should be looking at SortSpec.getSchemaFields() and isn't. cast exception while searching with sort function and result grouping - Key: SOLR-6203 URL: https://issues.apache.org/jira/browse/SOLR-6203 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7, 4.8 Reporter: Nathan Dire Attachments: SOLR-6203-unittest.patch After upgrading from 4.5.1 to 4.7+, a schema including a {{*}} dynamic field as text gets a cast exception when using a sort function and result grouping.
Repro (with example config): # Add {{*}} dynamic field as a {{TextField}}, eg: {noformat} <dynamicField name="*" type="text_general" multiValued="true" /> {noformat} # Create sharded collection {noformat} curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=2&maxShardsPerNode=2' {noformat} # Add example docs (query must have some results) # Submit query which sorts on a function result and uses result grouping: {noformat} { "responseHeader": { "status": 500, "QTime": 50, "params": { "sort": "sqrt(popularity) desc", "indent": "true", "q": "*:*", "_": "1403709010008", "group.field": "manu", "group": "true", "wt": "json" } }, "error": { "msg": "java.lang.Double cannot be cast to org.apache.lucene.util.BytesRef", "code": 500 } } {noformat} Source exception from log: {noformat} ERROR - 2014-06-25 08:10:10.055; org.apache.solr.common.SolrException; java.lang.ClassCastException: java.lang.Double cannot be cast to org.apache.lucene.util.BytesRef at org.apache.solr.schema.FieldType.marshalStringSortValue(FieldType.java:981) at org.apache.solr.schema.TextField.marshalSortValue(TextField.java:176) at org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.serializeSearchGroup(SearchGroupsResultTransformer.java:125) at org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.transform(SearchGroupsResultTransformer.java:65) at org.apache.solr.search.grouping.distributed.shardresultserializer.SearchGroupsResultTransformer.transform(SearchGroupsResultTransformer.java:43) at org.apache.solr.search.grouping.CommandHandler.processResult(CommandHandler.java:193) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:340) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) ... 
{noformat} It looks like {{serializeSearchGroup}} is matching the sort expression as the {{*}} dynamic field, which is a TextField in the repro. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
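The failure mode above can be reproduced in miniature without any Solr classes (names below are illustrative): the grouping serializer picks a marshaller based on the matched field type, and that marshaller assumes a string-like sort value while the function query actually produced a Double.

```java
public class MarshalSketch {
    // Stand-in for a text field's sort-value marshaller: it assumes the sort
    // value is a String (the real code assumes a BytesRef).
    static String marshalAsString(Object sortValue) {
        return (String) sortValue; // ClassCastException if a Double arrives
    }

    public static void main(String[] args) {
        Object functionSortValue = Double.valueOf(3.0); // e.g. sqrt(popularity)
        try {
            marshalAsString(functionSortValue);
        } catch (ClassCastException e) {
            // Mirrors "java.lang.Double cannot be cast to ..." from the log.
            System.out.println("cast failed");
        }
    }
}
```

The fix direction Hoss describes is to resolve the sort via SortSpec.getSchemaFields() so a function-sourced value is never routed through a schema field's marshaller in the first place.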
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0_05) - Build # 10669 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10669/ Java: 64bit/jdk1.8.0_05 -XX:-UseCompressedOops -XX:+UseSerialGC All tests passed Build Log: [...truncated 58917 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:467: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:406: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:87: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:179: The following files are missing svn:eol-style (or binary svn:mime-type): * ./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.aff * ./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.dic Total time: 79 minutes 35 seconds Build step 'Invoke Ant' marked build as failure [description-setter] Description set: Java: 64bit/jdk1.8.0_05 -XX:-UseCompressedOops -XX:+UseSerialGC Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6235) SyncSliceTest fails on jenkins with no live servers available error
[ https://issues.apache.org/jira/browse/SOLR-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059139#comment-14059139 ] Mark Miller commented on SOLR-6235: --- Right. I think a minimum, doing nothing is probably preferable in most cases. Perhaps a retry or two? Or perhaps we should look at sending to leaders to originate. We would still want to commit everywhere in parallel though, and I'm not sure we can do anything that is that much better. The current situation doesn't seem good though. SyncSliceTest fails on jenkins with no live servers available error --- Key: SOLR-6235 URL: https://issues.apache.org/jira/browse/SOLR-6235 Project: Solr Issue Type: Bug Components: SolrCloud, Tests Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.10 {code} 1 tests failed. FAILED: org.apache.solr.cloud.SyncSliceTest.testDistribSearch Error Message: No live SolrServers available to handle this request Stack Trace: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request at __randomizedtesting.SeedInfo.seed([685C57B3F25C854B:E9BAD9AB8503E577]:0) at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:317) at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:659) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1149) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1118) at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:236) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:865) {code} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: 
dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0_05) - Build # 10669 - Still Failing!
Rob fixed. Mike McCandless http://blog.mikemccandless.com On Fri, Jul 11, 2014 at 2:20 PM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10669/ Java: 64bit/jdk1.8.0_05 -XX:-UseCompressedOops -XX:+UseSerialGC All tests passed Build Log: [...truncated 58917 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:467: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:406: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:87: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:179: The following files are missing svn:eol-style (or binary svn:mime-type): * ./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.aff * ./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.dic Total time: 79 minutes 35 seconds Build step 'Invoke Ant' marked build as failure [description-setter] Description set: Java: 64bit/jdk1.8.0_05 -XX:-UseCompressedOops -XX:+UseSerialGC Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_60) - Build # 10788 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10788/ Java: 64bit/jdk1.7.0_60 -XX:-UseCompressedOops -XX:+UseSerialGC All tests passed Build Log: [...truncated 59435 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:467: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:406: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:87: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:179: The following files are missing svn:eol-style (or binary svn:mime-type): * ./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.aff * ./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.dic Total time: 82 minutes 37 seconds Build step 'Invoke Ant' marked build as failure [description-setter] Description set: Java: 64bit/jdk1.7.0_60 -XX:-UseCompressedOops -XX:+UseSerialGC Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 1705 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1705/ Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseSerialGC All tests passed Build Log: [...truncated 58831 lines...] BUILD FAILED /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/build.xml:467: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/build.xml:406: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/extra-targets.xml:87: The following error occurred while executing this line: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/extra-targets.xml:179: The following files are missing svn:eol-style (or binary svn:mime-type): * ./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.aff * ./lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/strange-overgeneration.dic Total time: 138 minutes 28 seconds Build step 'Invoke Ant' marked build as failure [description-setter] Description set: Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseSerialGC Archiving artifacts Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Core admin merge indexes, should it trigger merge policy?
I think I've become aware of an edge case that I'm wondering is worth a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of indexes and add them one by one to the running Solr node via merge indexes. The mergeFactor appears to be ignored in this scenario. Indeed, I suspect (without proof) that the entire merge policy is never referenced at all. Historically this hasn't mattered, since merging indexes was (1) a rare operation and (2) the merge policy _does_ kick in when the index has more documents added to it via the normal (not merge indexes) path, so things would be cleaned up. All that said, the MapReduceIndexerTool is a scenario where we may be merging multiple times without ever indexing documents any other way. Seems like the core admin API should trigger the merge policy logic somehow. The problem here is that the number of segments can grow without bound. Worth a JIRA? Erick
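The failure mode described above can be pictured with a toy Python simulation. The function names and the simplistic "collapse whenever the segment count exceeds the factor" rule are purely illustrative assumptions, not Solr's or Lucene's actual merge logic; the point is only that one path consults the policy and the other does not:

```python
MERGE_FACTOR = 2  # stand-in for the mergeFactor from solrconfig.xml

def apply_merge_policy(segments):
    """Toy policy: collapse segments whenever the count exceeds the factor."""
    while len(segments) > MERGE_FACTOR:
        merged = segments.pop() + segments.pop()
        segments.append(merged)
    return segments

def merge_indexes(segments, incoming_docs):
    """Core-admin-style index merge: appends a segment, never runs the policy."""
    segments.append(incoming_docs)
    return segments

def normal_indexing(segments, docs):
    """Normal update path: adds a segment, then runs the merge policy."""
    segments.append(docs)
    return apply_merge_policy(segments)

segments = []
for _ in range(15):                      # run "MRIT" 15 times
    segments = merge_indexes(segments, 100)
print(len(segments))                     # 15 segments: growth is unbounded

segments = normal_indexing(segments, 1)  # one regular update...
print(len(segments))                     # 2: the policy finally merges down
```

This mirrors the reproduction later in the thread: repeated merges pile up segments, and a single normal update collapses them back to the merge factor.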
[jira] [Commented] (SOLR-3617) Consider adding start scripts.
[ https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059346#comment-14059346 ] Timothy Potter commented on SOLR-3617: -- Hi Varun, Thanks for the feedback. Good idea about checking the Java version, but it's tough to know how many file handles is sufficient. Also, I'm favoring one script to rule them all ;-) Although I'm willing to be convinced of the start-dev / start-prod split, I like bin/solr because it's simple and intuitive. In addition: 1) One script to maintain / document; well, actually two scripts (one for *nix and the other for Windows) 2) I'm turning off prod options when you enable an example using the -e flag 3) There's going to be a lot of overlap in the logic between the two scripts anyway. I'm cooking up the Windows version today and have some updates to the *nix one, the main one being the restart option described above, as I convinced myself while writing the initial comment that restart is a more standard approach. Another patch coming soon ... Consider adding start scripts. -- Key: SOLR-3617 URL: https://issues.apache.org/jira/browse/SOLR-3617 Project: Solr Issue Type: New Feature Reporter: Mark Miller Attachments: SOLR-3617.patch I've always found that starting Solr with java -jar start.jar is a little odd if you are not a java guy, but I think there are bigger pros than looking less odd in shipping some start scripts. Not only do you get a cleaner start command: sh solr.sh or solr.bat or something But you also can do a couple other little nice things: * it becomes fairly obvious for a new casual user to see how to start the system without reading doc. * you can make the working dir the location of the script - this lets you call the start script from another dir and still have all the relative dir setup work. * have an out of the box place to save startup params like -Xmx. 
* we could have multiple start scripts - say solr-dev.sh that logged to the console and default to sys default for RAM - and also solr-prod which was fully configured for logging, pegged Xms and Xmx at some larger value (1GB?) etc. You would still of course be able to make the java cmd directly - and that is probably what you would do when it's time to run as a service - but these could be good starter scripts to get people on the right track and improve the initial user experience. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SOLR-6193) using facet.* parameters as local params inside of facet.field causes problems in distributed search
[ https://issues.apache.org/jira/browse/SOLR-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-6193: --- Description: The distributed request logic for faceting (which has to clone & modify requests to individual shards for dealing with things like facet.mincount, facet.sort, facet.limit, facet.offset so that the distributed aggregation is correct) doesn't properly take into account localparams contained in each of the facet params and how they should affect the initial shard requests and the subsequent refinement requests. {panel:title=Initial problem example reported by user} When a distributed search contains multiselect faceting the per-field faceting options are not honored for alternate selections of the field. For example with a query like: {noformat} facet.field=blah&facet.field={!key=myblah facet.offset=10}blah&f.blah.facet.offset=20 {noformat} The returned facet results for both blah and myblah will use an offset of 20 as opposed to a standard search returning myblah with an offset of 10. {panel} was: When a distributed search contains multiselect faceting the per-field faceting options are not honored for alternate selections of the field. For example with a query like: {noformat} facet.field=blah&facet.field={!key=myblah facet.offset=10}blah&f.blah.facet.offset=20 {noformat} The returned facet results for both blah and myblah will use an offset of 20 as opposed to a standard search returning myblah with an offset of 10. 
Summary: using facet.* parameters as local params inside of facet.field causes problems in distributed search (was: Distributed search with multiselect faceting ignores the facet.offset local parameter) using facet.* parameters as local params inside of facet.field causes problems in distributed search Key: SOLR-6193 URL: https://issues.apache.org/jira/browse/SOLR-6193 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.8.1 Environment: OS X 10.9.3 Apache Tomcat 7.0.41 Debian Apache Tomcat 7 Reporter: John Gibson Attachments: bad_facet_offset_test_4_8_x.patch
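For readers less familiar with the local params syntax in the example query, a rough sketch of how a facet.field value such as {!key=myblah facet.offset=10}blah decomposes into a field name plus per-field overrides. parse_facet_field is a hypothetical helper written for illustration, not Solr's actual parser (which handles quoting, dereferencing, and more):

```python
import re

def parse_facet_field(value):
    """Split '{!k1=v1 k2=v2}field' into (field, {k1: v1, k2: v2}).

    Simplified: assumes space-separated key=value pairs with no quoting.
    """
    m = re.match(r'\{!(.*?)\}(.*)', value)
    if not m:
        return value, {}  # no local params present
    params = dict(p.split('=', 1) for p in m.group(1).split())
    return m.group(2), params

field, local = parse_facet_field('{!key=myblah facet.offset=10}blah')
print(field)   # blah
print(local)   # {'key': 'myblah', 'facet.offset': '10'}
```

In the bug report, the distributed code honors f.blah.facet.offset=20 but loses the facet.offset=10 carried in the local params, which is exactly the per-field information a helper like this would surface.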
[jira] [Commented] (SOLR-3617) Consider adding start scripts.
[ https://issues.apache.org/jira/browse/SOLR-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059389#comment-14059389 ] Mark Miller commented on SOLR-3617: --- Yeah, a flag for a more friendly dev mode is just as good as another script. Consider adding start scripts. -- Key: SOLR-3617 URL: https://issues.apache.org/jira/browse/SOLR-3617 Project: Solr Issue Type: New Feature Reporter: Mark Miller Attachments: SOLR-3617.patch
[jira] [Commented] (SOLR-6193) using facet.* parameters as local params inside of facet.field causes problems in distributed search
[ https://issues.apache.org/jira/browse/SOLR-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059391#comment-14059391 ] Hoss Man commented on SOLR-6193: The crux of the problem is that the distributed facet logic, and the way the shard sub-requests are generated, pre-dates the support for using local params in {{facet.field}} and is built upon the previous work of using per-field overrides (ie: {{f.myFieldName.facet.mincount=5}}). As a result, from my quick review of the code, there seem to be 2 different types of mistakes that can pop up in the logic for building the shard requests: * not recognizing the local params when computing shard request values (ie: ignoring a localparam facet.limit when deciding what the overrequest values should be for a given field) * propagating the localparam values to the shards in addition to the synthetically generated f.foo.param equivalent for the shard (ie: sending the original localparams, which might include facet.mincount, even when the distributed logic is trying to force a mincount of 0 for the initial top-N computation). Adding to the complications is that, off the top of my head, I can't remember what sort of decisions were made when the localparam support was added regarding the precedence between a general local param vs a top level per-field param -- ie: what should the effective limit be here: {{f.foo.facet.limit=99&facet.field=\{!facet.limit=44\}foo}} --- I think in general we should overhaul the way the distributed requests modify the per-field facet params to instead put all of those per-field modifications directly in the local params of the shard requests -- among other things, this will help eliminate collisions in some of the computed facet params when faceting on the same field multiple ways. Before we tackle this though, we need a lot more comprehensive tests for some of these more complex situations -- beyond just the minimal distrib test that compares with the control collection. 
We need to assert that we get the specific expected responses; otherwise we could break the existing single-node behavior in a lot of cases and never notice, so long as the distrib behavior breaks in the same way. using facet.* parameters as local params inside of facet.field causes problems in distributed search Key: SOLR-6193 URL: https://issues.apache.org/jira/browse/SOLR-6193 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.8.1 Environment: OS X 10.9.3 Apache Tomcat 7.0.41 Debian Apache Tomcat 7 Reporter: John Gibson Attachments: bad_facet_offset_test_4_8_x.patch
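The precedence question raised in the comment above can be made concrete with a small sketch. Note the hedge: the comment's whole point is that the real precedence is undocumented in memory, so this lookup chain (local param beats f.&lt;field&gt;.&lt;param&gt;, which beats the global param) is one plausible ordering, not a statement of what Solr actually does:

```python
def effective_param(name, field, local_params, request_params):
    """Resolve a facet parameter for one field, assuming the ordering:
    local param > f.<field>.<param> > global param. Illustrative only."""
    if name in local_params:
        return local_params[name]
    per_field = 'f.%s.%s' % (field, name)
    if per_field in request_params:
        return request_params[per_field]
    return request_params.get(name)

# Hoss's example: f.foo.facet.limit=99&facet.field={!facet.limit=44}foo
request = {'f.foo.facet.limit': '99', 'facet.limit': '10'}
print(effective_param('facet.limit', 'foo', {'facet.limit': '44'}, request))  # 44
print(effective_param('facet.limit', 'foo', {}, request))                     # 99
```

Whatever ordering the project settles on, tests like the ones called for above would pin these cases down explicitly.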
Re: Core admin merge indexes, should it trigger merge policy?
I think you would probably want to control the number of segments with the MapReduceIndexerTool before doing the merge initially, and if you find you have too many segments over time as you add more and more data, use a force merge call to reduce the number of segments, either manually or scheduled. -- Mark Miller about.me/markrmiller
Re: Core admin merge indexes, should it trigger merge policy?
bq: I think you would probably want to control the number of segments with the MapReduceIndexerTool before doing the merge initially

This isn't the nub of the issue. Assuming that the number of segments in the index merged in via MRIT is 1 each time, once that index gets merged into the live Solr node, the segments don't get merged no matter how many times another index is merged. I'm aware of an "in the wild" situation where, over 6 months, there are over 600 segments. All updates were via MRIT.

run MRIT once, 1 segment
run MRIT a second time, 2 segments
.
.
.
run MRIT the Nth time, N segments (N > 600 in this case)

So running MRIT N times results in N segments on the Solr node since merge _indexes_ doesn't trigger _segment_ merging AFAIK. This has been masked in the past, I'd guess, because subsequent regular indexing via SolrJ, post.jar, whatever _does_ then trigger segment merging. But we haven't seen the situation reported before where the _only_ way the index gets updated is via index merging. Index merging is done via MRIT in this case, although this has nothing to do with MRIT and everything to do with the core admin mergeindexes command. MRIT is only relevant here since it's pretty much the first tool that conveniently allowed the only updates to be via mergeindexes. I reproduced this locally without MRIT by just taking a stock Solr, copying the index somewhere else, setting mergeFactor=2, then merging (and committing) again and again. Stopped at 15 segments or so. Then sent a couple of updates up via cURL and the segment count dropped back to 2. Whether the right place to fix this is the Solr core Admin API MERGEINDEXES or the lower-level Lucene call, I don't have a strong opinion. 
Of course one work-around is to periodically issue an optimize even though Uwe cringes every time that gets mentioned ;)
[jira] [Assigned] (LUCENE-5672) Addindexes does not call maybeMerge
[ https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned LUCENE-5672: --- Assignee: Robert Muir Addindexes does not call maybeMerge --- Key: LUCENE-5672 URL: https://issues.apache.org/jira/browse/LUCENE-5672 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir I don't know why this was removed, but this is buggy and just asking for trouble.
[jira] [Commented] (LUCENE-5672) Addindexes does not call maybeMerge
[ https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059435#comment-14059435 ] Robert Muir commented on LUCENE-5672: - I don't agree with the argument that "this lets the caller decide how expensive addIndexes should be." The user can freely configure this with MergePolicy. It's no different from any other index operation. This is a bug. There is lots of confusion, including a current discussion on the ML. Addindexes does not call maybeMerge --- Key: LUCENE-5672 URL: https://issues.apache.org/jira/browse/LUCENE-5672 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir I don't know why this was removed, but this is buggy and just asking for trouble.
Re: Core admin merge indexes, should it trigger merge policy?
You encouraged me to fix it :)
Re: Core admin merge indexes, should it trigger merge policy?
On July 11, 2014 at 6:10:07 PM, Erick Erickson (erickerick...@gmail.com) wrote: bq: I think you would probably want to control the number of segments with the MapReduceIndexerTool before doing the merge initially This isn't the nub of the issue. The rest of the sentence is required: ", and if you find you have too many segments over time as you add more and more data, use a force merge call to reduce the number of segments, either manually or scheduled." Of course one work-around is to periodically issue an optimize even though Uwe cringes every time that gets mentioned ;) You don’t need a full optimize, you just need to occasionally force merge down to N segments. You could trigger it after running the mapreduce tool, cron could trigger it, or whatever. That’s how you have to handle it currently. -- Mark Miller about.me/markrmiller
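The periodic force merge described above could be scripted along these lines. The host, core name, and the choice of 4 segments are placeholder assumptions; the sketch builds the request URL for Solr's update handler with optimize and maxSegments parameters rather than firing it, so it stands alone without a running server:

```python
from urllib.parse import urlencode

# Build a force-merge (bounded optimize) request to run after each
# mergeindexes / MapReduceIndexerTool pass, manually or from cron.
# Host, core name, and maxSegments=4 are illustrative placeholders.
params = urlencode({'optimize': 'true', 'maxSegments': 4, 'waitSearcher': 'false'})
url = 'http://localhost:8983/solr/collection1/update?' + params
print(url)
# In a scheduled job you would then issue this request, e.g. with
# urllib.request.urlopen(url) or curl, after the batch merge completes.
```

This keeps the segment count bounded without a full single-segment optimize, which is exactly the distinction Mark draws.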
Re: Core admin merge indexes, should it trigger merge policy?
Ah, OK. It's late on Friday and I missed that. Erick
Re: Core admin merge indexes, should it trigger merge policy?
It's been a whole hour, you're slowing down. I promised the original reporter that there would be a JIRA he could track, got one? Erick
[jira] [Updated] (LUCENE-5672) Addindexes does not call maybeMerge
[ https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5672: Attachment: LUCENE-5672.patch Addindexes does not call maybeMerge --- Key: LUCENE-5672 URL: https://issues.apache.org/jira/browse/LUCENE-5672 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-5672.patch I don't know why this was removed, but this is buggy and just asking for trouble.
Re: Core admin merge indexes, should it trigger merge policy?
https://issues.apache.org/jira/browse/LUCENE-5672

: Date: Fri, 11 Jul 2014 15:22:40 -0700
: From: Erick Erickson erickerick...@gmail.com
: Reply-To: dev@lucene.apache.org
: To: dev@lucene.apache.org
: Subject: Re: Core admin merge indexes, should it trigger merge policy?
:
: It's been a whole hour, you're slowing down.
:
: I promised the original reporter that there would be a JIRA he could track, got one?
:
: Erick
:
: On Fri, Jul 11, 2014 at 3:17 PM, Robert Muir rcm...@gmail.com wrote:
: You encouraged me to fix it :)
:
: On Fri, Jul 11, 2014 at 6:09 PM, Erick Erickson erickerick...@gmail.com wrote:
: bq: I think you would probably want to control the number of segments
: with the MapReduceIndexerTool before doing the merge initially
:
: This isn't the nub of the issue. Assuming that the number of segments in the index merged in via MRIT is 1 each time, once that index gets merged into the live Solr node, the segments don't get merged no matter how many times another index is merged. I'm aware of an "in the wild" situation where, over 6 months, there are over 600 segments. All updates were via MRIT.
:
: run MRIT once, 1 segment
: run MRIT a second time, 2 segments
: ...
: run MRIT the Nth time, N segments (N > 600 in this case)
:
: So running MRIT N times results in N segments on the Solr node since merge _indexes_ doesn't trigger _segment_ merging AFAIK.
:
: This has been masked in the past, I'd guess, because subsequent "regular" indexing via SolrJ, post.jar, whatever _does_ then trigger segment merging. But we haven't seen the situation reported before where the _only_ way the index gets updated is via index merging. Index merging is done via MRIT in this case, although this has nothing to do with MRIT and everything to do with the core admin mergeindexes command. MRIT is only relevant here since it's pretty much the first tool that conveniently allowed the only updates to be via mergeindexes.
:
: I reproduced this locally without MRIT by just taking a stock Solr, copying the index somewhere else, setting mergeFactor=2, then merging (and committing) again and again. Stopped at 15 segments or so. Then sent a couple of updates up via cURL and the segment count dropped back to 2.
:
: Whether the right place to fix this is the Solr core Admin API MERGEINDEXES or the lower-level Lucene call, I don't have a strong opinion about.
:
: Of course one work-around is to periodically issue an optimize, even though Uwe cringes every time that gets mentioned ;)
:
: On Fri, Jul 11, 2014 at 2:42 PM, Mark Miller markrmil...@gmail.com wrote:
: I think you would probably want to control the number of segments with the MapReduceIndexerTool before doing the merge initially, and if you find you have too many segments over time as you add more and more data, use a force merge call to reduce the number of segments, either manually or scheduled.
:
: --
: Mark Miller
: about.me/markrmiller
:
: On July 11, 2014 at 4:36:22 PM, Erick Erickson (erickerick...@gmail.com) wrote:
: I think I've become aware of an edge case that I'm wondering is worth a JIRA. Say I have a mergeFactor of 2. Now say I create a bunch of indexes and add them one by one to the running Solr node via merge indexes. The mergeFactor appears to be ignored in this scenario. Indeed, I suspect (without proof) that the entire merge policy is never referenced at all.
:
: Historically this hasn't mattered, since merging indexes was
: 1> a rare operation
: 2> the merge policy _does_ kick in when the index has more documents added to it via the normal (not merge indexes) policy, so things would be cleaned up.
:
: All that said, the MapReduceIndexerTool is a scenario where we may be merging multiple times without ever indexing documents any other way. Seems like the core admin API should trigger the merge policy logic somehow. The problem here is that the number of segments can grow without bound.
: Worth a JIRA?
:
: Erick

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
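The arithmetic of the trap in this thread can be sketched with a toy model. This is illustrative Python only — `add_index` and the mergeFactor-style policy below are invented stand-ins for this sketch, not Lucene or Solr APIs: each mergeindexes call adds one segment, and nothing ever shrinks the count unless a merge policy runs afterwards.

```python
# Toy model of the segment-count trap (illustrative only; these names
# are invented for this sketch and are NOT Lucene/Solr APIs).

def add_index(segments, run_merge_policy, merge_factor=2):
    """Simulate one mergeindexes call: append one incoming segment,
    then optionally apply a crude mergeFactor-style merge policy."""
    segments.append(1)  # one new segment per merged-in index
    if run_merge_policy:
        # Stand-in for maybeMerge(): collapse segments whenever the
        # count reaches merge_factor.
        while len(segments) >= merge_factor:
            segments.append(segments.pop() + segments.pop())
    return segments

without_policy = []  # reported behavior: merge policy never consulted
with_policy = []     # behavior if maybeMerge() ran after each call

for _ in range(600):
    add_index(without_policy, run_merge_policy=False)
    add_index(with_policy, run_merge_policy=True)

print(len(without_policy))  # 600 -- one segment per run, unbounded growth
print(len(with_policy))     # 1   -- the policy keeps the count bounded
```

This matches Erick's observation that 600 MRIT runs left 600 segments, and that a couple of ordinary updates (which do consult the merge policy) immediately collapsed the count.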
[jira] [Commented] (LUCENE-5672) Addindexes does not call maybeMerge
[ https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059449#comment-14059449 ]

Mark Miller commented on LUCENE-5672:
-------------------------------------

bq. The user can freely configure this with MergePolicy. Its no different from any other index operation. This is a bug.

I was leaning towards Shai's argument at first, but after a bit of deeper thought, I agree with Robert. I don't know that having an option to not use the merge policy will add any confusion if the default is right, but it does seem the merge policy itself is sufficient for cases I can think of. I don't know that you need this extra way to control merges.

Addindexes does not call maybeMerge
-----------------------------------

Key: LUCENE-5672
URL: https://issues.apache.org/jira/browse/LUCENE-5672
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
Attachments: LUCENE-5672.patch

I don't know why this was removed, but this is buggy and just asking for trouble.

--
This message was sent by Atlassian JIRA (v6.2#6252)
Re: Core admin merge indexes, should it trigger merge policy?
Yes, I opened that issue a while ago because I have seen people accidentally create hundreds/thousands of segments (just with the Lucene API) due to this same trap. This should not happen to you by default. If you want to create hundreds or thousands of segments for your index, it should be because you (mis)configured your merge policy intentionally to create such a situation.

On Fri, Jul 11, 2014 at 6:24 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: https://issues.apache.org/jira/browse/LUCENE-5672
[jira] [Commented] (LUCENE-5672) Addindexes does not call maybeMerge
[ https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059455#comment-14059455 ]

Robert Muir commented on LUCENE-5672:
-------------------------------------

FYI: this is the third time i've heard of this trap hitting people and creating hundreds or thousands of index segments: once was from coworkers at a past job, twice was the lucene user list discussion "Merger performance degradation on 3.6.1", thrice was Erick's recent ML post.

For people that don't want merging they have NoMergePolicy. maybeMerge() is even documented as expert, and its javadoc says "Explicit calls to maybeMerge() are usually not necessary. The most common case is when merge policy parameters have changed." So requiring the user to manually invoke this after index operations to prevent segment explosion is wrong IMO.
[jira] [Updated] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-2894:
---------------------------
    Attachment: SOLR-2894.patch

bq. Quick note on PivotFacetHelper's retrieve method ...

I haven't really been aware of those other issues until now (although SOLR-3583 may explain some of the unused code i pruned from PivotListEntry a few patches ago) but i agree with your assessment: if/when enhancements to distributed pivots start dealing with adding optional data to each level of the pivot, the approach currently used will have to change. (Personally: I'm not emotionally ready to put any serious thought into that level of implementation detail in future pivot improvements - i want to focus on getting the basics of distrib pivots solid & released first)

Updated patch with most of the tests i had in mind that i mentioned before (although i'd still like to add some more facet.missing tests)...

* TestCloudPivotFacet
** randomize overrequest amounts
** randomize facet.mincount usage & assert it is never exceeded
** randomize facet.missing usage & assert that null values are only ever last in list of values
*** make the odds of docs missing a field more randomized (across test runs)
** add in the possibility of trying to pivot on a field that is in 0 docs
** dial back some constants to reduce OOM risk when running -Dtests.nightly=true
** example refine count failure from the facet.missing problem (unless there's another bug that looks really similar) with these changes:
*** {{ant test -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=98C12D5256897A09 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=sr -Dtests.timezone=America/Louisville -Dtests.file.encoding=UTF-8}}
* DistributedFacetPivotLongTailTest
** some data tweaks & an additional assertion to ensure refinement is happening
* DistributedFacetPivotSmallTest
** s/honda/null/g - help test that the 4 character string "null" isn't triggering any special behavior, or getting confused with a missing value in docs
* DistributedFacetPivotLargeTest
** comment & assert noting that a shard is left empty (helps with edge case testing of result merging & refinement)
** added assertPivot helper method & did a bit of refactoring
** added test of 2 diff pivots in the same request (swap field order)
** added test of same bi-level pivot with & w/o a tagged fq exclusion in the same request
** added test variants of facet.limit & facet.index used as localparam
*** currently commented out because it doesn't work -- see SOLR-6193

The problem noted above with using {{facet.*}} params as local params in {{facet.pivot}} is something i discovered earlier this week while writing up these tests. I initially set it aside to keep working on tests, with the intention of looking into a fix once i had better coverage of the problem -- but then when i came back to revisit it yesterday and looked to the existing {{facet.field}} shard request logic for guidance, i discovered that didn't seem to work the way i expected either, and realized John Gibson recently filed SOLR-6193 because {{facet.field}} _does_ have the exact same problem. i don't think we should let this block adding distributed facet.pivot; let's tackle it holistically for all faceting in SOLR-6193.

Andrew/Brett: have you guys had a chance to look into the refinement bug when {{facet.missing}} is used?

(BTW: my updated patch only affected test files, so hopefully there's no collision with anything you guys have been working on -- but if there is, feel free to just post whatever patch you guys come up with and i'll handle the merge)

Implement distributed pivot faceting
------------------------------------

Key: SOLR-2894
URL: https://issues.apache.org/jira/browse/SOLR-2894
Project: Solr
Issue Type: Improvement
Reporter: Erik Hatcher
Assignee: Hoss Man
Fix For: 4.9, 5.0
Attachments: SOLR-2894-mincount-minification.patch, SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894_cloud_test.patch, dateToObject.patch, pivot_mincount_problem.sh

Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.

-- This message was sent by
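The TestCloudPivotFacet invariants listed above (facet.mincount respected, a facet.missing bucket only ever last in the list of values) can be illustrated with a small stand-alone check. This is a hypothetical Python sketch, not the actual Solr test code: `check_pivot_buckets` and the `(value, count)` tuples are invented here, with `None` standing in for the facet.missing bucket.

```python
# Hypothetical sketch of the pivot-bucket invariants described above
# (not Solr code): counts must honor mincount, and a missing-value
# bucket (value=None) may only appear as the last entry.

def check_pivot_buckets(buckets, mincount):
    """buckets: list of (value, count) pairs for one pivot level."""
    counts_ok = all(count >= mincount for _, count in buckets)
    # positions of facet.missing-style buckets (value is None)
    missing_positions = [i for i, (value, _) in enumerate(buckets)
                         if value is None]
    missing_ok = all(i == len(buckets) - 1 for i in missing_positions)
    return counts_ok and missing_ok

print(check_pivot_buckets([("honda", 5), ("ford", 3), (None, 2)],
                          mincount=1))  # True: missing bucket is last
print(check_pivot_buckets([(None, 2), ("ford", 3)],
                          mincount=1))  # False: missing bucket not last
```

A randomized test in the spirit described above would generate buckets under random mincount/missing settings and assert these two properties on every pivot level of the merged response.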
[jira] [Commented] (LUCENE-5672) Addindexes does not call maybeMerge
[ https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059530#comment-14059530 ]

Erick Erickson commented on LUCENE-5672:
----------------------------------------

Agreed, we _rely_ on segment merging in the normal state; to have it fail in this case is trappy. Commit it I say.
[jira] [Commented] (SOLR-5746) solr.xml parsing of str vs int vs bool is brittle; fails silently; expects odd type for shareSchema
[ https://issues.apache.org/jira/browse/SOLR-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059550#comment-14059550 ]

Hoss Man commented on SOLR-5746:
--------------------------------

Hi Maciej, I glanced over the patch a bit more today:

1) Can you explain the need for the new {{DOMUtil.readNamedChildrenAsNamedList}} method that you added instead of just using the existing {{DOMUtil.childNodesToNamedList}} (which delegates to {{addToNamedList}} and immediately validates that element text conforms to the stated type)? I realize that using {{DOMUtil.childNodesToNamedList}} won't magically help parse/validate the config options in the backcompat cases like {{<str name="shareSchema">true</str>}} -- but that's where things like your {{storeConfigPropertyAsInt}} and {{storeConfigPropertyAsBoolean}} can be in charge of doing the cast if the raw value is still a string. (I want to make sure we aren't introducing a redundant method in {{DOMUtil}}.)

2) Speaking of which: what's the purpose exactly of configAsSolrParams if the original NamedList is still being passed to the {{storeConfigPropertyAs*}} methods - why not just get the values directly from there?

3) One piece of validation that i believe is still missing here is to throw an error if/when a config value is specified multiple times -- if I remember the behavior of NamedList correctly, i think the way you have things now it will silently just use the first one, and then remove both. We should definitely have an error check that any of these single valued config options is in fact only specified once in the NamedList -- so people don't try to add a setting they've read about in the docs w/o realizing it's already defined higher up in the file (we've seen that happen with settings in solrconfig.xml many times before we locked that down and made it an error case)

solr.xml parsing of str vs int vs bool is brittle; fails silently; expects odd type for shareSchema
---------------------------------------------------------------------------------------------------

Key: SOLR-5746
URL: https://issues.apache.org/jira/browse/SOLR-5746
Project: Solr
Issue Type: Bug
Affects Versions: 4.3, 4.4, 4.5, 4.6
Reporter: Hoss Man
Attachments: SOLR-5746.patch, SOLR-5746.patch

A comment in the ref guide got me looking at ConfigSolrXml.java and noticing that the parsing of solr.xml options here is very brittle and confusing. In particular:

* if a boolean option foo is expected along the lines of {{<bool name="foo">true</bool>}} it will silently ignore {{<str name="foo">true</str>}}
* likewise for an int option {{<int name="bar">32</int>}} vs {{<str name="bar">32</str>}}

... this is inconsistent with the way solrconfig.xml is parsed. In solrconfig.xml, the xml nodes are parsed into a NamedList, and the above options will work in either form, but an invalid value such as {{<bool name="foo">NOT A BOOLEAN</bool>}} will generate an error earlier (when parsing config) than {{<str name="foo">NOT A BOOLEAN</str>}} (attempt to parse the string as a bool the first time the config value is needed).

In addition, i notice this really confusing line...

{code}
propMap.put(CfgProp.SOLR_SHARESCHEMA, doSub("solr/str[@name='shareSchema']"));
{code}

shareSchema is used internally as a boolean option, but as written the parsing code will ignore it unless the user explicitly configures it as a {{<str/>}}

--
This message was sent by Atlassian JIRA (v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
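The duplicate-option check Hoss asks for in (3), together with the lenient str-to-bool backcompat coercion from (1), can be outlined as follows. This is a hypothetical Python sketch of the logic only — `validate_single_valued` and `as_bool` are invented names; the real code would live in ConfigSolr/DOMUtil and operate on a Java NamedList.

```python
# Hypothetical sketch (not the actual Solr code) of two validations
# discussed above, over NamedList-style (name, value) pairs parsed
# from solr.xml.

from collections import Counter

def validate_single_valued(pairs, single_valued_keys):
    """Raise if any single-valued config option appears more than once,
    instead of silently using one copy."""
    counts = Counter(name for name, _ in pairs)
    dupes = [k for k in single_valued_keys if counts[k] > 1]
    if dupes:
        raise ValueError(
            "Config option(s) specified more than once in solr.xml: %s" % dupes)

def as_bool(raw):
    """Accept a real boolean (from <bool>) or a back-compat string
    (from <str>), but reject garbage instead of failing silently."""
    if isinstance(raw, bool):
        return raw
    if isinstance(raw, str) and raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    raise ValueError("invalid boolean config value: %r" % (raw,))

pairs = [("shareSchema", "true"), ("coreLoadThreads", 3)]
validate_single_valued(pairs, ["shareSchema", "coreLoadThreads"])
print(as_bool("true"))   # True -- <str>true</str> coerced for back-compat
```

The key design point in the sketch mirrors the JIRA discussion: the type-specific accessor does the coercion (so both `<bool>` and legacy `<str>` forms work), while an invalid value or a duplicated single-valued option fails loudly at config-parse time rather than silently.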
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 1706 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1706/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseParallelGC

1 tests failed.

REGRESSION: org.apache.lucene.index.TestIndexSplitter.test

Error Message:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/test/J0/temp/lucene.index.TestIndexSplitter-C26121833CA2DB9E-001/TestIndexSplitter-001/_2.si

Stack Trace:
java.nio.file.NoSuchFileException: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/test/J0/temp/lucene.index.TestIndexSplitter-C26121833CA2DB9E-001/TestIndexSplitter-001/_2.si
    at __randomizedtesting.SeedInfo.seed([C26121833CA2DB9E:4A351E59925EB666]:0)
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
    at java.nio.channels.FileChannel.open(FileChannel.java:287)
    at java.nio.channels.FileChannel.open(FileChannel.java:334)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:196)
    at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:106)
    at org.apache.lucene.codecs.lucene46.Lucene46SegmentInfoReader.read(Lucene46SegmentInfoReader.java:49)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:447)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:787)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:633)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:443)
    at org.apache.lucene.index.IndexSplitter.<init>(IndexSplitter.java:95)
    at org.apache.lucene.index.TestIndexSplitter.test(TestIndexSplitter.java:69)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at
[jira] [Commented] (LUCENE-5672) Addindexes does not call maybeMerge
[ https://issues.apache.org/jira/browse/LUCENE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059621#comment-14059621 ]

David Smiley commented on LUCENE-5672:
--------------------------------------

bq. Commit it I say.

+1; this is a bug.
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_60) - Build # 10673 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10673/
Java: 32bit/jdk1.7.0_60 -server -XX:+UseParallelGC

1 tests failed.

REGRESSION: org.apache.lucene.index.TestBinaryDocValuesUpdates.testManyReopensAndFields

Error Message:
MockDirectoryWrapper: cannot close: there are still open files: {_r.fdt=1, _r_MockVariableIntBlock_0.tib=1, _r_1_Asserting_0.dvd=1, _r_2_Memory_0.mdvd=1, _r_Asserting_0.dvd=1, _r_Lucene49_0.dvd=1, _r_MockVariableIntBlock_0.doc=1, _r_MockVariableIntBlock_0.skp=1, _r_SimpleText_0.dat=1, _r_Memory_0.mdvd=1}

Stack Trace:
java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_r.fdt=1, _r_MockVariableIntBlock_0.tib=1, _r_1_Asserting_0.dvd=1, _r_2_Memory_0.mdvd=1, _r_Asserting_0.dvd=1, _r_Lucene49_0.dvd=1, _r_MockVariableIntBlock_0.doc=1, _r_MockVariableIntBlock_0.skp=1, _r_SimpleText_0.dat=1, _r_Memory_0.mdvd=1}
    at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:669)
    at org.apache.lucene.util.IOUtils.close(IOUtils.java:77)
    at org.apache.lucene.index.TestBinaryDocValuesUpdates.testManyReopensAndFields(TestBinaryDocValuesUpdates.java:741)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
    at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
    at