[jira] [Created] (SOLR-5613) Upgrade Apache Commons Codec to version 1.9 in order to improve performance of BeiderMorseFilter

2014-01-07 Thread Thomas Champagne (JIRA)
Thomas Champagne created SOLR-5613:
----------------------------------

 Summary: Upgrade Apache Commons Codec to version 1.9 in order to 
improve performance of BeiderMorseFilter
 Key: SOLR-5613
 URL: https://issues.apache.org/jira/browse/SOLR-5613
 Project: Solr
  Issue Type: Improvement
  Components: Rules, Schema and Analysis, search
Affects Versions: 4.6, 4.5.1, 4.5, 4.4, 4.3.1, 4.3, 4.2.1, 4.2, 4.1, 4.0, 
3.6.2, 3.6.1, 3.6
Reporter: Thomas Champagne


In version 1.9 of the commons-codec project, there are a lot of optimizations in 
the Beider-Morse encoder, which is used by the BeiderMorseFilter in Solr. 
Do you think it is possible to upgrade this dependency?
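
For context, BeiderMorseFilter wraps the Beider-Morse encoder from commons-codec. A 
minimal sketch of exercising that encoder directly (commons-codec 1.x API; the input 
name is just an example):

{code:java}
import org.apache.commons.codec.EncoderException;
import org.apache.commons.codec.language.bm.BeiderMorseEncoder;

public class BeiderMorseDemo {
    public static void main(String[] args) throws EncoderException {
        BeiderMorseEncoder encoder = new BeiderMorseEncoder();
        // encode() returns a pipe-separated set of phonetic variants;
        // generating these variants is where the reported optimizations apply
        System.out.println(encoder.encode("Champagne"));
    }
}
{code}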






[jira] [Commented] (SOLR-5288) Delta import is calling applyTranformer() during deltaQuerry and causing ScriptException

2014-01-07 Thread Daniele Baldi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864057#comment-13864057
 ] 

Daniele Baldi commented on SOLR-5288:
-------------------------------------

Hi,
I found this error while experimenting with delta import using TemplateTransformer:

 WARN : TemplateTransformer : Unable to resolve variable: variableName 
while parsing expression: ${variableName}

This error is thrown because Solr tries to apply transformers to the deltaQuery, too. 
I also think transformation is not required for the deltaQuery. 

Thanks
Daniele

 Delta import is calling applyTranformer() during deltaQuerry and causing 
 ScriptException
 -------------------------------------------------------------------------

 Key: SOLR-5288
 URL: https://issues.apache.org/jira/browse/SOLR-5288
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.4
Reporter: Balaji Manoharan
Priority: Critical

 While experimenting with delta import, I was getting a ScriptException such as 
 'toString()' is not found on null.
 These are the queries that I am using:
 a) Query: SELECT PK_FIELD, JOIN_DATE, USER_NAME FROM USERS
 b) Delta Query: SELECT PK_FIELD FROM USERS WHERE LAST_MODIFIED_DATE > 
 '${dih.last_index_time}'
 c) Delta Import Query: SELECT PK_FIELD, JOIN_DATE, USER_NAME FROM USERS 
 WHERE PK_FIELD = '${dih.delta.PK_FIELD}'
 I have a script transformer as below:
 function dynamicData(){
   var joinDt = row.get('JOIN_DATE');
   var dtDisplay = joinDt.toString();  // e.g., to show that I am not doing a 
 null check, since JOIN_DATE is a not-null field
   ...
   ...
   return row;
 }
 <entity name="user" transformer="script:dynamicData" ...>
 ...
 </entity>
 Problem: While performing delta import, I was getting an exception from the Rhino 
 engine on the script line 'joinDt.toString()'.
 The exception trace is as follows:
 Caused by: javax.script.ScriptException: 
 sun.org.mozilla.javascript.internal.EcmaError: TypeError: Cannot call method 
 t
 oString of null (Unknown source#4) in Unknown source at line number 4
 at 
 com.sun.script.javascript.RhinoScriptEngine.invoke(RhinoScriptEngine.java:300)
 at 
 com.sun.script.javascript.RhinoScriptEngine.invokeFunction(RhinoScriptEngine.java:258)
 at 
 org.apache.solr.handler.dataimport.ScriptTransformer.transformRow(ScriptTransformer.java:56)
 ... 8 more
 Root Cause: Since I know join_date cannot be null, I explored the Solr 
 source code and noticed that applyTransformer() is called during deltaQuery, 
 and at that time join_date will not be available.
 Reference: EntityProcessorWrapper.nextModifiedRowKey()
 I think transformation is not required for deltaQuery, since it is mainly 
 designed to retrieve the primary keys of the modified rows. Further, the 
 output of deltaQuery is used only in another SQL query.
 Workaround:
 I just added a null check as below:
 function dynamicData(){
   var joinDt = row.get('JOIN_DATE');
   if(joinDt == null){
   return row;
   }
   ...
   ...
   return row;
 }
 I don't have much knowledge about Solr, so my suggestion could be 
 invalid when looking at the main use cases.
 Please validate my comments.
 Thanks
 Balaji
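
For illustration, the same defensive null check can also be written as a Java DIH 
transformer instead of a script. A minimal sketch (the transformRow signature comes 
from the DataImportHandler Transformer base class; the class and output field names 
are made up):

{code:java}
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class JoinDateTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object joinDt = row.get("JOIN_DATE");
    // Rows produced by deltaQuery carry only the primary key, so JOIN_DATE
    // is absent there; pass such rows through instead of dereferencing null.
    if (joinDt == null) {
      return row;
    }
    row.put("joinDateDisplay", joinDt.toString());
    return row;
  }
}
{code}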






[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica

2014-01-07 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864141#comment-13864141
 ] 

Markus Jelsma commented on SOLR-4260:
-------------------------------------

Ok, I followed all the great work here and in the related tickets, and yesterday I 
had the time to rebuild Solr and check for this issue. I didn't see it 
yesterday, but it is right in front of me again, using a fresh build from 
January 6th.

Leader has Num Docs: 379659
Replica has Num Docs: 379661

 Inconsistent numDocs between leader and replica
 -----------------------------------------------

 Key: SOLR-4260
 URL: https://issues.apache.org/jira/browse/SOLR-4260
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
 Environment: 5.0.0.2013.01.04.15.31.51
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Critical
 Fix For: 5.0, 4.7

 Attachments: 192.168.20.102-replica1.png, 
 192.168.20.104-replica2.png, clusterstate.png


 After wiping all cores and reindexing some 3.3 million docs from Nutch using 
 CloudSolrServer, we see inconsistencies between the leader and replica for 
 some shards.
 Each core holds about 3.3k documents. For some reason 5 out of 10 shards have 
 a small deviation in the number of documents. The leader and replica deviate 
 by roughly 10-20 documents, not more.
 Results hopping ranks in the result set for identical queries got my 
 attention: there were small IDF differences for exactly the same record, 
 causing the record to shift positions in the result set. During those tests no 
 records were indexed. Consecutive catch-all queries also return different 
 values for numDocs.
 We're running a 10-node test cluster with 10 shards and a replication factor 
 of two, and we frequently reindex using a fresh build from trunk. I hadn't seen 
 this issue for quite some time until a few days ago.






[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-07 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864147#comment-13864147
 ] 

Markus Jelsma commented on SOLR-5379:
-------------------------------------

How does this patch handle boosts?  Are the synonym and the original keywords 
boosted equally?

 Query-time multi-word synonym expansion
 ---------------------------------------

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonyms at query time, Solr fails to work with multi-word 
 synonyms for a couple of reasons:
 - First, the Lucene query parser tokenizes the user query by space, so it splits a 
 multi-word term into separate terms before feeding them to the synonym filter; the 
 synonym filter therefore can't recognize the multi-word term to do expansion.
 - Second, if the synonym filter expands into multiple terms which contain a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle synonyms, but MultiPhraseQuery doesn't work with terms that have different 
 numbers of words.
 For the first, we can quote all multi-word synonyms in the user query so that the 
 Lucene query parser doesn't split them. There is a JIRA task related to this one: 
 https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with an appropriate BooleanQuery of 
 SHOULD clauses containing multiple PhraseQuery objects in case the token stream 
 has a multi-word synonym.
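
As a rough sketch of that second point, the expansion could be built with the 
Lucene 4.x query API along these lines (the field name and synonym pair are 
invented for the example):

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class SynonymExpansionSketch {
  // Expand "usa" with the multi-word synonym "united states of america" as a
  // BooleanQuery of SHOULD clauses; unlike MultiPhraseQuery, each alternative
  // here may contain a different number of words.
  public static Query expand() {
    BooleanQuery synonyms = new BooleanQuery();
    synonyms.add(new TermQuery(new Term("text", "usa")), Occur.SHOULD);
    PhraseQuery phrase = new PhraseQuery();
    for (String word : new String[] {"united", "states", "of", "america"}) {
      phrase.add(new Term("text", word));
    }
    synonyms.add(phrase, Occur.SHOULD);
    return synonyms;
  }
}
{code}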






[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-07 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864149#comment-13864149
 ] 

Ahmet Arslan commented on SOLR-5379:
------------------------------------

Assume the synonyms are {code}usa, united states of america{code}. What happens 
if I fire the following sloppy phrase query: *"president usa"~5* ?

 Query-time multi-word synonym expansion
 ---------------------------------------

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonyms at query time, Solr fails to work with multi-word 
 synonyms for a couple of reasons:
 - First, the Lucene query parser tokenizes the user query by space, so it splits a 
 multi-word term into separate terms before feeding them to the synonym filter; the 
 synonym filter therefore can't recognize the multi-word term to do expansion.
 - Second, if the synonym filter expands into multiple terms which contain a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle synonyms, but MultiPhraseQuery doesn't work with terms that have different 
 numbers of words.
 For the first, we can quote all multi-word synonyms in the user query so that the 
 Lucene query parser doesn't split them. There is a JIRA task related to this one: 
 https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with an appropriate BooleanQuery of 
 SHOULD clauses containing multiple PhraseQuery objects in case the token stream 
 has a multi-word synonym.






[jira] [Created] (SOLR-5614) Boost documents using map and query functions

2014-01-07 Thread Anca Kopetz (JIRA)
Anca Kopetz created SOLR-5614:
------------------------------

 Summary: Boost documents using map and query functions
 Key: SOLR-5614
 URL: https://issues.apache.org/jira/browse/SOLR-5614
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Anca Kopetz


We want to boost documents that contain specific search terms in their fields. 

We tried the following simplified query: 
http://localhost:8983/solr/collection1/select?q=ipod belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power

And we get the following error: 
org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 
'power'
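
Presumably the nested {!edismax} parser picks up the top-level bf parameter again, 
which itself references $qq, hence the recursion. If so, one thing to try (an 
untested sketch, not a confirmed fix) is blanking out bf in the local params of the 
nested query:

{code}
bf=map(query($qq),0,0,0,100.0)&qq={!edismax bf=''}power
{code}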

And the stacktrace :

ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
Infinite Recursion detected parsing query 'power'
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected 
parsing query 'power'
at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
at org.apache.solr.search.QParser.subQuery(QParser.java:200)
at 
org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
at 
org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.search.FunctionQParser.parseNestedQuery(FunctionQParser.java:236)
at 
org.apache.solr.search.ValueSourceParser$19.parse(ValueSourceParser.java:270)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
at 
org.apache.solr.search.ValueSourceParser$13.parse(ValueSourceParser.java:198)
at 

[jira] [Updated] (SOLR-5614) Boost documents using map and query functions

2014-01-07 Thread Anca Kopetz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anca Kopetz updated SOLR-5614:
------------------------------

Description: 
We want to boost documents that contain specific search terms in their fields. 

We tried the following simplified query: 
http://localhost:8983/solr/collection1/select?q=ipod%20belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power

And we get the following error: 
org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 
'power'

And the stacktrace :

ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
Infinite Recursion detected parsing query 'power'
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected 
parsing query 'power'
at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
at org.apache.solr.search.QParser.subQuery(QParser.java:200)
at 
org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
at 
org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.search.FunctionQParser.parseNestedQuery(FunctionQParser.java:236)
at 
org.apache.solr.search.ValueSourceParser$19.parse(ValueSourceParser.java:270)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223)
at 
org.apache.solr.search.ValueSourceParser$13.parse(ValueSourceParser.java:198)
at 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352)
at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68)
at 


[jira] [Commented] (SOLR-5609) Don't let cores create slices/named replicas

2014-01-07 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864241#comment-13864241
 ] 

Noble Paul commented on SOLR-5609:
----------------------------------

It makes sense to have an omnibus property like "legacyCloudMode" rather than 
having specific properties for each behavior.


 Don't let cores create slices/named replicas
 ---------------------------------------------

 Key: SOLR-5609
 URL: https://issues.apache.org/jira/browse/SOLR-5609
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
 Fix For: 5.0, 4.7


 In SolrCloud, it is possible for a core to come up on any node and register 
 itself with an arbitrary slice/coreNodeName. This is a legacy requirement, and 
 we would like to make it possible only for the Overseer to initiate creation of 
 slices/replicas.
 We plan to introduce cluster-level properties at the top level:
 /cluster-props.json
 {code:javascript}
 {
 "noSliceOrReplicaByCores":true
 }
 {code}
 If this property is set to true, cores won't be able to send STATE commands 
 with an unknown slice/coreNodeName. Those commands will fail at the Overseer. This 
 is useful for SOLR-5310 / SOLR-5311, where a core/replica is deleted by a 
 command and then comes up later and tries to recreate the replica/slice.






[jira] [Updated] (SOLR-5476) Overseer Role for nodes

2014-01-07 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5476:
-----------------------------

Attachment: SOLR-5476.patch

 Overseer Role for nodes
 ---

 Key: SOLR-5476
 URL: https://issues.apache.org/jira/browse/SOLR-5476
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-5476.patch, SOLR-5476.patch, SOLR-5476.patch, 
 SOLR-5476.patch


 In a very large cluster the Overseer is likely to be overloaded. If the same 
 node is serving a few other shards, it can lead to the Overseer getting slowed 
 down due to GC pauses, or simply too much work. If the cluster is really 
 large, it is possible to dedicate high-end hardware for Overseers.
 It works as a new collection admin command:
 command=addrole&role=overseer&node=192.168.1.5:8983_solr
 This results in the creation of an entry in /roles.json in ZK which would 
 look like the following:
 {code:javascript}
 {
 "overseer" : ["192.168.1.5:8983_solr"]
 }
 {code}
 If a node is designated for overseer, it gets preference over others when 
 overseer election takes place. If no designated servers are available, another 
 random node becomes the Overseer.
 Later on, if one of the designated nodes is brought up, it takes over the 
 Overseer role from the current Overseer to become the Overseer of the system.






[jira] [Created] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)
Ramkumar Aiyengar created SOLR-5615:
-----------------------------------

 Summary: Deadlock while trying to recover after a ZK session expiry
 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.6, 4.5, 4.4
Reporter: Ramkumar Aiyengar


The sequence of events which might trigger this is as follows:

 - Leader of a shard, say OL, has a ZK expiry
 - The new leader, NL, starts the election process
 - NL, through Overseer, clears the current leader (OL) for the shard from the 
cluster state
 - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
 - OL marks itself down
 - OL sets up watches for cluster state, and then retrieves it (with no leader 
for this shard)
 - NL, through Overseer, updates cluster state to mark itself leader for the 
shard
 - OL tries to register itself as a replica, and waits till the cluster state 
is updated
   with the new leader from event thread
 - ZK sends a watch update to OL, but it is blocked on the event thread waiting 
for it.

Oops. This finally breaks out only when the attempt to register itself as a 
replica times out, after 20 mins.
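
To make the hang concrete, here is a minimal, self-contained sketch of the pattern 
(not Solr code): a handler running on the single event-delivery thread blocks 
waiting for a signal that only a later event on that same thread could deliver, so 
the wait can only end by timeout (shortened here from the 20 minutes in the report).

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class EventThreadDeadlockDemo {
  public static void main(String[] args) throws InterruptedException {
    final CountDownLatch leaderSeen = new CountDownLatch(1);
    Thread eventThread = new Thread(new Runnable() {
      public void run() {
        try {
          // The reconnect handler runs on the event thread and waits for a
          // "leader elected" update -- but that update would arrive as a
          // watch event on this very thread, which is blocked right here.
          boolean sawLeader = leaderSeen.await(2, TimeUnit.SECONDS);
          System.out.println("saw leader before timeout: " + sawLeader);
        } catch (InterruptedException ignored) {
        }
      }
    }, "main-EventThread");
    eventThread.start();
    eventThread.join();
    // The countDown() that would release the latch never runs: it is queued
    // behind the blocked handler on the same thread.
  }
}
{code}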






lucene-solr pull request: Allow ConnectionManager.process to run from multi...

2014-01-07 Thread andyetitmoves
GitHub user andyetitmoves opened a pull request:

https://github.com/apache/lucene-solr/pull/13

Allow ConnectionManager.process to run from multiple threads

One potential fix for SOLR-5615. I'm hardly sure whether this is the correct way 
to go about this, but it's a start, I guess.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andyetitmoves/lucene-solr 
on-recovery-deadlock-4x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/13.patch


commit ad7ac506bc614d43f391aaad7ab25d9b426421c4
Author: Ramkumar Aiyengar raiyen...@bloomberg.net
Date:   2014-01-07T11:57:25Z

Allow ConnectionManager.process to run from multiple threads







[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864250#comment-13864250
 ] 

Ramkumar Aiyengar commented on SOLR-5615:
-----------------------------------------

Submitted https://github.com/apache/lucene-solr/pull/13 for one possible 
solution, though I am not sure if this is the right way to go about this..

 Deadlock while trying to recover after a ZK session expiry
 ----------------------------------------------------------

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar

 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits till the cluster state 
 is updated
with the new leader from event thread
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out after trying to register itself as replica 
 times out after 20 mins.






[jira] [Commented] (LUCENE-5354) Blended score in AnalyzingInfixSuggester

2014-01-07 Thread Remi Melisson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864328#comment-13864328
 ] 

Remi Melisson commented on LUCENE-5354:
---------------------------------------

Hi, any news about this feature?
Is there anything else I could do?

 Blended score in AnalyzingInfixSuggester
 ----------------------------------------

 Key: LUCENE-5354
 URL: https://issues.apache.org/jira/browse/LUCENE-5354
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Affects Versions: 4.4
Reporter: Remi Melisson
Priority: Minor
  Labels: suggester
 Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch


 I'm working on a custom suggester derived from the AnalyzingInfix. I require 
 what is called a blended score (//TODO ln.399 in AnalyzingInfixSuggester) 
 to transform the suggestion weights depending on the position of the searched 
 term(s) in the text.
 Right now, I'm using an easy solution:
 If I want 10 suggestions, then I search against the current ordered index for 
 the first 100 results and transform the weights:
 bq. a) by using the term position in the text (found with TermVector and 
 DocsAndPositionsEnum)
 or
 bq. b) by multiplying the weight by the score of a SpanQuery that I add when 
 searching
 and return the 10 most heavily weighted of the updated suggestions.
 Since we usually don't need to suggest so many things, the bigger search + 
 rescoring overhead is not so significant, but I agree that this is not the 
 most elegant solution.
 We could include this factor (here the position of the term) directly in 
 the index.
 So, I can contribute to this if you think it's worth adding.
 Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a 
 dedicated class?
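
For what it's worth, a minimal, self-contained sketch of option (a)'s shape: 
over-fetch candidates, discount each weight by the matched term's position, and 
keep the top N. The Suggestion type and the 1/(1+position) discount are stand-ins 
for illustration, not the AnalyzingInfixSuggester API:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class BlendedRescoreSketch {
  static class Suggestion {
    final String text;
    final long weight;
    Suggestion(String text, long weight) { this.text = text; this.weight = weight; }
  }

  // Rescore ~100 candidates down to n: the earlier the searched term occurs
  // in the suggestion text, the less its weight is discounted.
  static List<Suggestion> rescore(List<Suggestion> candidates, String term, int n) {
    List<Suggestion> rescored = new ArrayList<Suggestion>();
    for (Suggestion s : candidates) {
      int pos = s.text.toLowerCase().indexOf(term.toLowerCase());
      double factor = pos < 0 ? 0.0 : 1.0 / (1 + pos);
      rescored.add(new Suggestion(s.text, (long) (s.weight * factor)));
    }
    Collections.sort(rescored, new Comparator<Suggestion>() {
      public int compare(Suggestion a, Suggestion b) {
        return Long.compare(b.weight, a.weight); // heaviest first
      }
    });
    return rescored.subList(0, Math.min(n, rescored.size()));
  }
}
{code}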






[jira] [Updated] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5615:
------------------------------

Attachment: SOLR-5615.patch

Not sure given the info, but the patch doesn't seem crazy to me. I've made a 
few adjustments in this patch.

 Deadlock while trying to recover after a ZK session expiry
 ----------------------------------------------------------

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits till the cluster state 
 is updated
with the new leader from event thread
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out after trying to register itself as replica 
 times out after 20 mins.






[jira] [Commented] (SOLR-5613) Upgrade Apache Commons Codec to version 1.9 in order to improve performance of BeiderMorseFilter

2014-01-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864364#comment-13864364
 ] 

Shawn Heisey commented on SOLR-5613:
------------------------------------

I upgraded commons-codec to 1.9 on an up-to-date branch_4x checkout and found 
that all tests (both Lucene and Solr) passed.  This was on a Linux machine.  I 
wasn't too surprised by this.  I think we can accommodate this request easily.

Just for giggles, I went even further and upgraded all commons.apache.org 
components to the newest versions I could find via ivy.  All tests *still* 
passed.  This was on a Windows 8 machine.  With so many upgrades, I was really 
surprised it passed.

{code}
Index: lucene/ivy-versions.properties
===================================================================
--- lucene/ivy-versions.properties  (revision 1555313)
+++ lucene/ivy-versions.properties  (working copy)
@@ -19,16 +19,16 @@
 /com.ibm.icu/icu4j = 52.1
 /com.spatial4j/spatial4j = 0.3
 /com.sun.jersey/jersey-core = 1.16
-/commons-beanutils/commons-beanutils = 1.7.0
+/commons-beanutils/commons-beanutils = 1.9.0
 /commons-cli/commons-cli = 1.2
-/commons-codec/commons-codec = 1.7
+/commons-codec/commons-codec = 1.9
 /commons-collections/commons-collections = 3.2.1
-/commons-configuration/commons-configuration = 1.6
-/commons-digester/commons-digester = 2.0
-/commons-fileupload/commons-fileupload = 1.2.1
-/commons-io/commons-io = 2.1
+/commons-configuration/commons-configuration = 1.10
+/commons-digester/commons-digester = 2.1
+/commons-fileupload/commons-fileupload = 1.3
+/commons-io/commons-io = 2.4
 /commons-lang/commons-lang = 2.6
-/commons-logging/commons-logging = 1.1.1
+/commons-logging/commons-logging = 1.1.3
 /de.l3s.boilerpipe/boilerpipe = 1.1.0
 /dom4j/dom4j = 1.6.1
 /edu.ucar/netcdf = 4.2-min
{code}

I'm not advocating that we upgrade all the components at once, but it looks 
like we can indeed upgrade them all eventually.  I only ran the basic tests, so 
additional tests (nightly, weekly, etc) need to be done.


 Upgrade Apache Commons Codec to version 1.9 in order to improve performance 
 of BeiderMorseFilter
 ----------------------------------------------------------------------------

 Key: SOLR-5613
 URL: https://issues.apache.org/jira/browse/SOLR-5613
 Project: Solr
  Issue Type: Improvement
  Components: Rules, Schema and Analysis, search
Affects Versions: 3.6, 3.6.1, 3.6.2, 4.0, 4.1, 4.2, 4.2.1, 4.3, 4.3.1, 
 4.4, 4.5, 4.5.1, 4.6
Reporter: Thomas Champagne
  Labels: codec, commons, commons-codec, phonetic, search

 In version 1.9 of the commons-codec project, there are a lot of optimizations in 
 the Beider-Morse encoder, which is used by the BeiderMorseFilter in Solr. 
 Do you think it is possible to upgrade this dependency?






[jira] [Updated] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-01-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5463:
---------------------------

Description: 
I'd like to revisit a solution to the problem of deep paging in Solr, 
leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works at 
the Lucene level: require the clients to provide back a token indicating the 
sort values of the last document seen on the previous page.  This is similar 
to the cursor model I've seen in several other REST APIs that support 
pagination over large sets of results (notably the Twitter API and its 
since_id param), except that we'll want something that works with arbitrary 
multi-level sort criteria that can be either ascending or descending.

SOLR-1726 laid some initial groundwork here and was committed quite a while 
ago, but the key bit of argument parsing to leverage it was commented out due 
to some problems (see comments in that issue).  It's also somewhat out of date 
at this point: at the time it was committed, IndexSearcher only supported 
searchAfter for simple scores, not arbitrary field sorts; and the params added 
in SOLR-1726 suffer from this limitation as well.

---

I think it would make sense to start fresh with a new issue with a focus on 
ensuring that we have deep paging which:

* supports arbitrary field sorts in addition to sorting by score
* works in distributed mode

{panel:title=Basic Usage}
* send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
** sort can be anything, but must include the uniqueKey field (as a tie 
breaker) 
** N can be any number you want per page
** start must be 0
** \* denotes you want to use a cursor starting at the beginning mark
* parse the response body and extract the (String) {{nextCursorMark}} value
* Replace the \* value in your initial request params with the 
{{nextCursorMark}} value from the response in the subsequent request
* repeat until the {{nextCursorMark}} value stops changing, or you have 
collected as many docs as you need
{panel}
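
A minimal SolrJ sketch of that loop (the cursorMark/nextCursorMark names come from 
the usage above; the core URL, sort fields, and page size are placeholders):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CursorWalk {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    String cursorMark = "*"; // the beginning mark
    while (true) {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(100);
      q.setSort("timestamp", SolrQuery.ORDER.asc);
      q.addSort("id", SolrQuery.ORDER.asc); // uniqueKey tie-breaker is required
      q.set("cursorMark", cursorMark);
      QueryResponse rsp = solr.query(q);
      // ... process rsp.getResults() ...
      String next = (String) rsp.getResponse().get("nextCursorMark");
      if (cursorMark.equals(next)) {
        break; // an unchanged mark means no more pages
      }
      cursorMark = next;
    }
    solr.shutdown();
  }
}
{code}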





 Provide cursor/token based searchAfter support that works with arbitrary 
 sorting (ie: deep paging)
 --------------------------------------------------------------------------

 Key: SOLR-5463
 URL: https://issues.apache.org/jira/browse/SOLR-5463
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0

 Attachments: SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man__MissingStringLastComparatorSource.patch


 I'd like to revisit a solution to the problem of deep paging in Solr, 
 leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works 
 at the Lucene level: require the clients to provide back a token indicating 
 the sort values of the last document seen on the previous page.  This is 
 similar to the cursor model I've seen in several other REST APIs that 
 support pagination over large sets of results (notably the Twitter API and 
 its since_id param), except that we'll want something that works with 
 arbitrary multi-level sort criteria that can be either ascending or descending.
 SOLR-1726 laid some initial groundwork here and was committed quite a while 
 ago, but 

[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864389#comment-13864389
 ] 

Ramkumar Aiyengar commented on SOLR-5615:
-----------------------------------------

Here's a log trace from an actual occurrence; it might help in understanding the 
scenario above:

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover relationship with ZooKeeper...

// ..

2014-01-06 06:22:12,529 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:103] Connection with ZooKeeper reestablished.

// ..

2014-01-06 06:22:36,573 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:989] publishing core=collection_20131120_shard205_replica2 
state=down

// ..

2014-01-06 06:28:01,479 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:199] Updating cluster state from ZooKeeper... 
2014-01-06 06:28:01,487 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:651] Register node as live in 
ZooKeeper:/live_nodes/host5:10750_solr

// See trace above, it directly got cluster state from ZK and successfully 
found the leader, so there is actually a leader at this point contrary to what 
it finds below

2014-01-06 06:28:01,567 INFO [main-EventThread] o.a.s.c.c.SolrZkClient 
[SolrZkClient.java:378] makePath: /live_nodes/host5:10750_solr
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard241_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard241
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1maxConnectionsPerHost=20connTimeout=3socketTimeout=3retry=false

// nothing much after this on main-EventThread for 20 mins..

2014-01-06 06:54:01,786 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard241

// Then goes on to the next replica ..

2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard209_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard209
2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1maxConnectionsPerHost=20connTimeout=3socketTimeout=3retry=false

// waits another twenty mins (by which time I ordered a shutdown, so things 
started erroring out sooner after that)

2014-01-06 07:19:21,656 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard209

// After trying to register all other replicas, these failed fast because we 
had ordered a shutdown already..

2014-01-06 07:19:21,693 INFO [main-EventThread] 
o.a.s.c.c.DefaultConnectionStrategy [DefaultConnectionStrategy.java:48] 
Reconnected to ZooKeeper
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:130] Connected:true

// And immediately, *now* it fires all the events it was waiting for!

2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:72] Watcher 
org.apache.solr.common.cloud.ConnectionManager@2467da0a 
name:ZooKeeperConnection Watcher:host1:11600,host2:11600,host3:11600 got event 
WatchedEvent state:Disconnected type:None path:null path:null type:None
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.z.ClientCnxn 
[ClientCnxn.java:509] EventThread shut down
{code}


 Deadlock while trying to recover after a ZK session expiry
 ----------------------------------------------------------

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for 

[jira] [Comment Edited] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864389#comment-13864389
 ] 

Ramkumar Aiyengar edited comment on SOLR-5615 at 1/7/14 5:02 PM:
-----------------------------------------------------------------

Here's a log trace from an actual occurrence; it might help in understanding the 
scenario above:

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover relationship with ZooKeeper...

// ..

2014-01-06 06:22:12,529 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:103] Connection with ZooKeeper reestablished.

// ..

2014-01-06 06:22:36,573 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:989] publishing core=collection_20131120_shard205_replica2 
state=down

// ..

2014-01-06 06:28:01,479 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:199] Updating cluster state from ZooKeeper... 
2014-01-06 06:28:01,487 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:651] Register node as live in 
ZooKeeper:/live_nodes/host5:10750_solr

// See trace above, it directly got leader props from ZK successfully, so there 
is actually a leader at this point contrary to what it finds below

2014-01-06 06:28:01,567 INFO [main-EventThread] o.a.s.c.c.SolrZkClient 
[SolrZkClient.java:378] makePath: /live_nodes/host5:10750_solr
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard241_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard241
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1maxConnectionsPerHost=20connTimeout=3socketTimeout=3retry=false

// nothing much after this on main-EventThread for 20 mins..

2014-01-06 06:54:01,786 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard241

// Then goes on to the next replica ..

2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard209_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard209
2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1maxConnectionsPerHost=20connTimeout=3socketTimeout=3retry=false

// waits another twenty mins (by which time I ordered a shutdown, so things 
started erroring out sooner after that)

2014-01-06 07:19:21,656 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard209

// After trying to register all other replicas, these failed fast because we 
had ordered a shutdown already..

2014-01-06 07:19:21,693 INFO [main-EventThread] 
o.a.s.c.c.DefaultConnectionStrategy [DefaultConnectionStrategy.java:48] 
Reconnected to ZooKeeper
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:130] Connected:true

// And immediately, *now* it fires all the events it was waiting for!

2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:72] Watcher 
org.apache.solr.common.cloud.ConnectionManager@2467da0a 
name:ZooKeeperConnection Watcher:host1:11600,host2:11600,host3:11600 got event 
WatchedEvent state:Disconnected type:None path:null path:null type:None
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.z.ClientCnxn 
[ClientCnxn.java:509] EventThread shut down
{code}




[jira] [Created] (SOLR-5616) Make grouping code use response builder needDocList

2014-01-07 Thread Steven Bower (JIRA)
Steven Bower created SOLR-5616:
------------------------------

 Summary: Make grouping code use response builder needDocList
 Key: SOLR-5616
 URL: https://issues.apache.org/jira/browse/SOLR-5616
 Project: Solr
  Issue Type: Bug
Reporter: Steven Bower


Right now the grouping code does this to check if it needs to generate a 
docList for grouped results:

{code}
if (rb.doHighlights || rb.isDebug() || params.getBool(MoreLikeThisParams.MLT, 
false) ){
...
}
{code}

This is ugly because any new component that needs a docList from grouped 
results will need to modify QueryComponent to add a check to this if. Ideally 
this should just use the rb.isNeedDocList() flag...

Coincidentally, this boolean is currently never really used; for non-grouped 
results the docList always gets generated.






[jira] [Comment Edited] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864389#comment-13864389
 ] 

Ramkumar Aiyengar edited comment on SOLR-5615 at 1/7/14 5:04 PM:
-----------------------------------------------------------------

Here's a log trace from an actual occurrence; it might help in understanding the 
scenario above:

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover relationship with ZooKeeper...

// ..

2014-01-06 06:22:12,529 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:103] Connection with ZooKeeper reestablished.

// ..

2014-01-06 06:22:36,573 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:989] publishing core=collection_20131120_shard205_replica2 
state=down

// ..

2014-01-06 06:28:01,479 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:199] Updating cluster state from ZooKeeper... 
2014-01-06 06:28:01,487 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:651] Register node as live in 
ZooKeeper:/live_nodes/host5:10750_solr

// See trace above, it directly got leader props from ZK successfully, so there 
is actually a leader at this point contrary to what it finds below

2014-01-06 06:28:01,567 INFO [main-EventThread] o.a.s.c.c.SolrZkClient 
[SolrZkClient.java:378] makePath: /live_nodes/host5:10750_solr
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard241_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard241
2014-01-06 06:28:01,669 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1maxConnectionsPerHost=20connTimeout=3socketTimeout=3retry=false

// nothing much after this on main-EventThread for 20 mins..

2014-01-06 06:54:01,786 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard241

// Then goes on to the next replica ..

2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:757] Register replica - 
core:collection_20131120_shard209_replica2 address:http://host5:10750/solr 
collection:collection_20131120 shard:shard209
2014-01-06 06:54:01,786 INFO [main-EventThread] o.a.s.c.s.i.HttpClientUtil 
[HttpClientUtil.java:103] Creating new http client, 
config:maxConnections=1&maxConnectionsPerHost=20&connTimeout=3&socketTimeout=3&retry=false

// waits another twenty mins (by which time I ordered a shutdown, so things 
started erroring out sooner after that)

2014-01-06 07:19:21,656 ERROR [main-EventThread] o.a.s.c.ZkController 
[ZkController.java:869] Error getting leader from zk
org.apache.solr.common.SolrException: No registered leader was found, 
collection:collection_20131120 slice:shard209

// After trying to register all other replicas, these failed fast because we 
had ordered a shutdown already..

2014-01-06 07:19:21,693 INFO [main-EventThread] 
o.a.s.c.c.DefaultConnectionStrategy [DefaultConnectionStrategy.java:48] 
Reconnected to ZooKeeper
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:130] Connected:true

// And immediately, *now* it fires all the events it was waiting for!

2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:72] Watcher 
org.apache.solr.common.cloud.ConnectionManager@2467da0a 
name:ZooKeeperConnection Watcher:host1:11600,host2:11600,host3:11600 got event 
WatchedEvent state:Disconnected type:None path:null path:null type:None
2014-01-06 07:19:21,693 INFO [main-EventThread] o.a.z.ClientCnxn 
[ClientCnxn.java:509] EventThread shut down

// many more such disc events, and then the watches

2014-01-06 07:19:21,694 WARN [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:281] ZooKeeper watch triggered, but Solr cannot talk to ZK
2014-01-06 07:19:21,694 INFO [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:210] A cluster state change: WatchedEvent 
state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred 
- updating... (live nodes size: 112)
2014-01-06 07:19:21,694 WARN [main-EventThread] o.a.s.c.c.ZkStateReader 
[ZkStateReader.java:234] ZooKeeper watch triggered, but Solr cannot talk to ZK

{code}



was (Author: andyetitmoves):
Here's some log trace which actually happened, might help understand the 
scenario above..

{code}
2014-01-06 06:22:03,867 INFO [main-EventThread] o.a.s.c.c.ConnectionManager 
[ConnectionManager.java:88] Our previous ZooKeeper session was expired. 
Attempting to reconnect to recover 

[jira] [Updated] (SOLR-5616) Make grouping code use response builder needDocList

2014-01-07 Thread Steven Bower (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Bower updated SOLR-5616:
---

Attachment: SOLR-5616.patch

Here is a patch that makes this change. It's against trunk but should easily 
apply to older versions. Ideally this would get into a 4.x release..

 Make grouping code use response builder needDocList
 ---

 Key: SOLR-5616
 URL: https://issues.apache.org/jira/browse/SOLR-5616
 Project: Solr
  Issue Type: Bug
Reporter: Steven Bower
 Attachments: SOLR-5616.patch


 Right now the grouping code does this to check if it needs to generate a 
 docList for grouped results:
 {code}
 if (rb.doHighlights || rb.isDebug() || params.getBool(MoreLikeThisParams.MLT, 
 false)) {
 ...
 }
 {code}
 This is ugly because any new component that needs a docList from grouped 
 results has to modify QueryComponent and add its own check to this if 
 statement. Ideally this should just use the rb.isNeedDocList() flag...
 Coincidentally, this boolean is currently never consulted at all; for 
 non-grouped results the docList always gets generated anyway.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864401#comment-13864401
 ] 

Mark Miller commented on SOLR-5615:
---

Thanks, perfect.

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2014-01-07 Thread Nolan Lawson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864402#comment-13864402
 ] 

Nolan Lawson commented on SOLR-5379:


[~markus17]: They're boosted equally.  It was the subject of [a 
bug|https://github.com/healthonnet/hon-lucene-synonyms/issues/31].

[~iorixxx]: I just tested it out now.  I got:

{code}
(+(DisjunctionMaxQuery((text:"president usa"~5)) 
(((+DisjunctionMaxQuery((text:"president united states of 
america"~5)))/no_coord/no_coord // parsedQuery
+((text:"president usa"~5) ((+(text:"president united states of america"~5 
// parsedQuery.toString()
{code}

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Tien Nguyen Manh
  Labels: multi-word, queryparser, synonym
 Fix For: 4.7

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonyms at query time, Solr fails to work with multi-word 
 synonyms for two reasons:
 - First, the Lucene query parser tokenizes the user query by space, so it 
 splits a multi-word term into separate terms before feeding them to the 
 synonym filter; the synonym filter therefore can't recognize the multi-word 
 term in order to expand it.
 - Second, if the synonym filter expands into multiple terms which contain a 
 multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery doesn't work with terms that have 
 different numbers of words.
 For the first one, we can quote all multi-word synonyms in the user query so 
 that the Lucene query parser doesn't split them. There is a JIRA task related 
 to this one: https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery with a BooleanQuery of SHOULD 
 clauses which contains multiple PhraseQuery instances, in case the token 
 stream has a multi-word synonym. (See the sketch below.)
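
For illustration, a minimal sketch of that second proposal against the Lucene 
4.x query API (the field name and the synonym pair are made-up examples, not 
taken from the attached patches):

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;

// One PhraseQuery per synonym variant, combined as SHOULD clauses, instead of
// a single MultiPhraseQuery (which cannot mix variants of different lengths).
Query buildSynonymQuery() {
  PhraseQuery usa = new PhraseQuery();
  usa.add(new Term("text", "usa"));

  PhraseQuery expanded = new PhraseQuery();
  for (String word : new String[] {"united", "states", "of", "america"}) {
    expanded.add(new Term("text", word));
  }

  BooleanQuery synonyms = new BooleanQuery(true); // true: disable coord
  synonyms.add(usa, Occur.SHOULD);
  synonyms.add(expanded, Occur.SHOULD);
  return synonyms;
}
{code}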



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864434#comment-13864434
 ] 

Mark Miller commented on SOLR-5615:
---

Okay, now it's clearer to me. I think we need to run onReconnect in a 
background thread.

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864446#comment-13864446
 ] 

Ramkumar Aiyengar commented on SOLR-5615:
-

That, incidentally, was my first attempt at a fix! (Should have a diff..) 
However, onReconnect in any case runs in the event thread of the expired ZK 
session, which won't receive events after that, so it's effectively 
backgrounded? It should still work as a solution, I guess..

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864460#comment-13864460
 ] 

Mark Miller commented on SOLR-5615:
---

bq. However, onReconnect in any case runs in the event thread of the expired ZK 
session, which won't receive events after that, so it's effectively 
backgrounded?

But it holds the ConnectionManager's {{this}} lock while it runs, right? I 
think we just don't want to hold that lock while it runs.

I think the other changes are likely okay too - I'm playing around with a 
combination of the two.

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5615:
--

Attachment: SOLR-5615.patch

Another rev.

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
 Attachments: SOLR-5615.patch, SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5615:
--

Fix Version/s: 4.6.1
   4.7
   5.0
 Assignee: Mark Miller

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 5.0, 4.7, 4.6.1

 Attachments: SOLR-5615.patch, SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Iterating BinaryDocValues

2014-01-07 Thread Mikhail Khludnev
Joel,

I tried to hack it straightforwardly, but found no free gain there. The
only thing I can suggest is to try to reuse the bytes in
https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
Right now it allocates bytes every time, which besides GC pressure can also
hurt memory access locality. Could you try fixing the memory waste and
repeating the performance test?

Have a good hack!


On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein joels...@gmail.com wrote:


 Hi,

 I'm looking for a faster way to perform large scale docId -> bytesRef
 lookups for BinaryDocValues.

 I'm finding that I can't get the performance that I need from the random
 access seek in the BinaryDocValues interface.

 I'm wondering if sequentially scanning the docValues would be a faster
 approach. I have a BitSet of matching docs, so if I sequentially moved
 through the docValues I could test each one against that bitset.

 Wondering if that approach would be faster for bulk extracts and how
 tricky it would be to add an iterator to the BinaryDocValues interface?

 Thanks,
 Joel




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com
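
For reference, a minimal sketch of the sequential-scan idea from Joel's mail, 
written against the Lucene 4.x per-segment API (the field name, reader, and 
bitset are placeholders, not code from either mail):

{code}
import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.util.BytesRef;

void bulkExtract(AtomicReader reader, BitSet matches) throws IOException {
  BinaryDocValues values = reader.getBinaryDocValues("field");
  if (values == null) return; // field has no binary doc values
  BytesRef scratch = new BytesRef();
  // Walk matching docids in increasing order so pages are touched sequentially.
  for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
    values.get(doc, scratch); // fills scratch with this doc's bytes
    // ... consume scratch ...
  }
}
{code}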


[jira] [Resolved] (SOLR-5614) Boost documents using map and query functions

2014-01-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-5614.


Resolution: Invalid

please don't file a bug just because you've been waiting 24 hours for an answer 
to a question on the solr-user mailing list - sometimes it takes longer than 
that for people to answer.

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201312.mbox/%3c52c17579.30...@kelkoo.com%3E

 Boost documents using map and query functions
 -

 Key: SOLR-5614
 URL: https://issues.apache.org/jira/browse/SOLR-5614
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Anca Kopetz

 We want to boost documents that contain specific search terms in their fields. 
 We tried the following simplified query: 
 http://localhost:8983/solr/collection1/select?q=ipod 
 belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power
 And we get the following error: 
 org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 
 'power'
 And the stack trace:
 ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; 
 org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: 
 Infinite Recursion detected parsing query 'power'
 at 
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at 
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at 
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at 
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at 
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
 at 
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at 
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at 
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at 
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected 
 parsing query 'power'
 at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
 at org.apache.solr.search.QParser.subQuery(QParser.java:200)
 at 
 org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
 at 
 org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
 at org.apache.solr.search.QParser.getQuery(QParser.java:142)
 

[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864475#comment-13864475
 ] 

Mark Miller commented on SOLR-5615:
---

Even with the other changes, I like the idea of using a background thread, 
because I don't think it's right that we do that whole reconnect process before 
we record that we are connected to ZK and get out of the connection manager. I 
really don't think that process should hold up the connection manager at all - 
it's only meant to trigger it.
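
Roughly, the idea under discussion looks like this (a paraphrase, not the 
attached patch; field and method names are simplified):

{code}
// Sketch: mark the connection live and release the ConnectionManager's
// monitor first, then run the heavy recovery work on its own thread so
// neither the monitor nor ZooKeeper's event thread is held while it runs.
synchronized (this) {
  connected = true;  // we are connected to ZK again
  notifyAll();       // wake anyone waiting on the connection state
}
new Thread(new Runnable() {
  @Override
  public void run() {
    onReconnect.command(); // mark down, register cores, join elections, ...
  }
}, "OnReconnect").start();
{code}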

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 5.0, 4.7, 4.6.1

 Attachments: SOLR-5615.patch, SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-5617:
--

 Summary: Default classloader restrictions may be too tight
 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
 Fix For: 5.0, 4.7


SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but it also causes resources in ${solr.solr.home}/lib to fail to load.  In 
order to get those jars to work, I must turn off all SOLR-4882 safety checking.

I can understand not wanting to load resources from an arbitrary path, but 
${solr.solr.home} and its children should be about as trustworthy as 
instanceDir.

Ideally I'd like to have ${solr.solr.home}/lib trusted automatically, since it 
is searched automatically.  If I need to define a system property to make this 
happen, I'm OK with that -- as long as I don't have to turn off the safety 
checking entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864491#comment-13864491
 ] 

Ramkumar Aiyengar commented on SOLR-5615:
-

Fair enough. Would that allow multiple onReconnect.command() invocations to 
run simultaneously, and is that fine? (I'm on mobile, so my reading of the 
patch could be wrong.) What if we were in the process of recovering when we 
were unfortunate enough to get a second expiry, thereby bringing all nodes 
down?

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 5.0, 4.7, 4.6.1

 Attachments: SOLR-5615.patch, SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5244) Full Search Result Export

2014-01-07 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864496#comment-13864496
 ] 

Mikhail Khludnev commented on SOLR-5244:


bq. 1) Add a special cache that speeds up the docId -> bytesRef lookup. This 
would be a segment level cache of the top N terms (by frequency) in the index. 
The cache would be a simple int to BytesRef hashmap, mapping the segment level 
ord to the bytesRef

that's exactly what you've got in FieldCache.DEFAULT.getTerms() for an indexed 
field without docvalues enabled. See FieldCacheImpl.BinaryDocValuesCache.

 Full Search Result Export
 -

 Key: SOLR-5244
 URL: https://issues.apache.org/jira/browse/SOLR-5244
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5244.patch


 It would be great if Solr could efficiently export entire search result sets 
 without scoring or ranking documents. This would allow external systems to 
 perform rapid bulk imports from Solr. It also provides a possible platform 
 for exporting results to support distributed join scenarios within Solr.
 This ticket provides a patch that has two pluggable components:
 1) ExportQParserPlugin: which is a post filter that gathers a BitSet with 
 document results and does not delegate to ranking collectors. Instead it puts 
 the BitSet on the request context.
  2) BinaryExportWriter: is an output writer that iterates the BitSet and 
  writes the entire result as a binary stream. A header is provided at the 
  beginning of the stream so external clients can self-configure.
 Note:
 These two components will be sufficient for a non-distributed environment. 
 For distributed export a new Request handler will need to be developed.
 After applying the patch and building the dist or example, you can register 
 the components through the following changes to solrconfig.xml
  Register export contrib libraries:
  <lib dir="../../../dist/" regex="solr-export-\d.*\.jar" />
   
  Register the export queryParser with the following line:
   
  <queryParser name="export" 
  class="org.apache.solr.export.ExportQParserPlugin"/>
   
  Register the xbin writer:
   
  <queryResponseWriter name="xbin" 
  class="org.apache.solr.export.BinaryExportWriter"/>
   
  The following query will perform the export:
  {code}
  http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
  {code}
 Initial patch supports export of four data-types:
 1) Single value trie int, long and float
 2) Binary doc values.
 The numerics are currently exported from the FieldCache and the Binary doc 
 values can be in memory or on disk.
 Since this is designed to export very large result sets efficiently, stored 
 fields are not used for the export.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5617:
---

Description: 
SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but it also causes resources in $\{solr.solr.home\}/lib to fail to load.  In 
order to get those jars to work, I must turn off all SOLR-4882 safety checking.

I can understand not wanting to load resources from an arbitrary path, but the 
solr home and its children should be about as trustworthy as instanceDir.

Ideally I'd like to have $\{solr.solr.home\}/lib trusted automatically, since 
it is searched automatically.  If I need to define a system property to make 
this happen, I'm OK with that -- as long as I don't have to turn off the safety 
checking entirely.

  was:
SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but it also causes resources in ${solr.solr.home}/lib to fail to load.  In 
order to get those jars to work, I must turn off all SOLR-4882 safety checking.

I can understand not wanting to load resources from an arbitrary path, but 
${solr.solr.home} and its children should be about as trustworthy as 
instanceDir.

Ideally I'd like to have ${solr.solr.home}/lib trusted automatically, since it 
is searched automatically.  If I need to define a system property to make this 
happen, I'm OK with that -- as long as I don't have to turn off the safety 
checking entirely.


 Default classloader restrictions may be too tight
 -

 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
  Labels: security
 Fix For: 5.0, 4.7


 SOLR-4882 introduced restrictions for the Solr class loader that cause 
 resources outside the instanceDir to fail to load.  This is a very good goal, 
 but it also causes resources in $\{solr.solr.home\}/lib to fail to load.  In 
 order to get those jars to work, I must turn off all SOLR-4882 safety 
 checking.
 I can understand not wanting to load resources from an arbitrary path, but 
 the solr home and its children should be about as trustworthy as instanceDir.
 Ideally I'd like to have $\{solr.solr.home\}/lib trusted automatically, since 
 it is searched automatically.  If I need to define a system property to make 
 this happen, I'm OK with that -- as long as I don't have to turn off the 
 safety checking entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864502#comment-13864502
 ] 

Mark Miller commented on SOLR-5615:
---

Yeah, I've been considered the same thing. My inclination was it was okay, but 
we may have to add something to cancel our leader election before joining the 
election to be sure.

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 5.0, 4.7, 4.6.1

 Attachments: SOLR-5615.patch, SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864505#comment-13864505
 ] 

Shawn Heisey commented on SOLR-5617:


I will have to double-check, but I probably have the specifics of what required 
me to turn off the safety checking wrong.  It may have been configuration 
components gathered via xinclude, not jarfiles.  Either way, I am sure that 
everything is under the solr home.


 Default classloader restrictions may be too tight
 -

 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
  Labels: security
 Fix For: 5.0, 4.7


 SOLR-4882 introduced restrictions for the Solr class loader that cause 
 resources outside the instanceDir to fail to load.  This is a very good goal, 
 but it also causes resources in $\{solr.solr.home\}/lib to fail to load.  In 
 order to get those jars to work, I must turn off all SOLR-4882 safety 
 checking.
 I can understand not wanting to load resources from an arbitrary path, but 
 the solr home and its children should be about as trustworthy as instanceDir.
 Ideally I'd like to have $\{solr.solr.home\}/lib trusted automatically, since 
 it is searched automatically.  If I need to define a system property to make 
 this happen, I'm OK with that -- as long as I don't have to turn off the 
 safety checking entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-5611) When documents are uniformly distributed over shards, enable returning approximated results in distributed query

2014-01-07 Thread Isaac Hebsh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isaac Hebsh closed SOLR-5611.
-

Resolution: Not A Problem

Oops. I missed the {{shards.rows}} parameter.

 When documents are uniformly distributed over shards, enable returning 
 approximated results in distributed query
 

 Key: SOLR-5611
 URL: https://issues.apache.org/jira/browse/SOLR-5611
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Isaac Hebsh
  Labels: distributed_search, shard, solrcloud
 Fix For: 4.7


 A query with rows=1000, which is sent to a collection of 100 shards (shard key 
 behaviour is the default - based on a hash of the unique key), will generate 
 100 requests of rows=1000, one on each shard.
 This results in a total of rows*numShards unique keys being retrieved. This 
 behaviour gets worse as numShards grows.
 If the documents are uniformly distributed over the shards, the expected 
 number of documents per shard should be ~ rows/numShards. Obviously, there 
 might be extreme cases, when all of the top X documents are in a specific 
 shard.
 I suggest adding an optional parameter, say approxResults=true, which decides 
 whether we should limit the rows in the shard requests to rows/numShards or 
 not. Moreover, we can add a numeric parameter which increases the limit, to 
 be more accurate.
 For example, the query {{approxResults=true&approxResults.factor=1.5}} will 
 retrieve 1.5*rows/numShards from each shard. In the case of 100 shards and 
 rows=1000, each shard will return 15 documents.
 Furthermore, this can reduce the problem of deep paging, because the same 
 thing can be applied there: when start=10 is requested, Solr creates shard 
 requests with start=0 and rows=START+ROWS. In the approximated approach, the 
 start parameter (in the shard requests) can be set to 10/numShards. The idea 
 of the approxResults.factor creates some difficulties here, though.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5560) Enable LocalParams without escaping the query

2014-01-07 Thread Isaac Hebsh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864553#comment-13864553
 ] 

Isaac Hebsh commented on SOLR-5560:
---

Hi [~ryancutter], thank you very much!
I'm not familiar with parser states (thank god), so I can't review the patch.

What action should be performed in order to get this patch committed? (into 4.7?)

 Enable LocalParams without escaping the query
 -

 Key: SOLR-5560
 URL: https://issues.apache.org/jira/browse/SOLR-5560
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.6
Reporter: Isaac Hebsh
 Fix For: 4.7, 4.6.1

 Attachments: SOLR-5560.patch


 This query should be a legit syntax:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND {!lucene df=text}(TERM2 TERM3 TERM4 TERM5)
 currently it isn't, because the LocalParams can be specified on a single term 
 only.
 [~billnbell] thinks it is a bug.
 From the mailing list:
 {quote}
 We want to set a LocalParam on a nested query. When querying with the v inline 
 parameter, it works fine:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND {!lucene df=text v="TERM2 TERM3 \"TERM4 TERM5\""}
 the parsedquery_toString is
 +id:TERM1 +(text:term2 text:term3 text:"term4 term5")
 A query using the _query_ syntax also works fine:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND _query_:"{!lucene df=text}TERM2 TERM3 \"TERM4 TERM5\""
 (the parsedquery is exactly the same).
 Obviously, there is the option of an external parameter ({... 
 v=$nestedq}&nestedq=...)
 This is a good solution, but it is not practical when having a lot of such 
 nested queries.
 BUT, when trying to put the nested query in place, it yields a syntax error:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND {!lucene df=text}(TERM2 TERM3 TERM4 TERM5)
 org.apache.solr.search.SyntaxError: Cannot parse '(TERM2'
 The previous options are less preferred because of the escaping that has to 
 be done on the nested query.
 {quote}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry

2014-01-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5615:
--

Attachment: SOLR-5615.patch

Another rev that adds what I think is a decent change anyway - before joining 
an election, cancel any known previous election participation.

 Deadlock while trying to recover after a ZK session expiry
 --

 Key: SOLR-5615
 URL: https://issues.apache.org/jira/browse/SOLR-5615
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 5.0, 4.7, 4.6.1

 Attachments: SOLR-5615.patch, SOLR-5615.patch, SOLR-5615.patch


 The sequence of events which might trigger this is as follows:
  - Leader of a shard, say OL, has a ZK expiry
  - The new leader, NL, starts the election process
  - NL, through Overseer, clears the current leader (OL) for the shard from 
 the cluster state
  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
  - OL marks itself down
  - OL sets up watches for cluster state, and then retrieves it (with no 
 leader for this shard)
  - NL, through Overseer, updates cluster state to mark itself leader for the 
 shard
  - OL tries to register itself as a replica, and waits (from the event 
 thread) till the cluster state is updated with the new leader
  - ZK sends a watch update to OL, but it is blocked on the event thread 
 waiting for it.
 Oops. This finally breaks out only after the attempt to register as a 
 replica times out after 20 mins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings

2014-01-07 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864579#comment-13864579
 ] 

Hoss Man commented on SOLR-5594:


* Aren't there other parser classes that will need similar changes? 
(PrefixQParserPlugin, SimpleQParserPlugin at a minimum i think)
* I think your new FieldType.getPrefixQuery method has a back compat break for 
any existing FieldTypes that people might be using because it now calls 
readableToIndexed ... that smells like it could break things for some 
FieldTypes ... but maybe i'm missing something?
* FieldType.getPrefixQuery has lots of bogus cut/pasted javadocs from 
getRangeQuery
* Can't your MyIndexedBinaryField just subclass BinaryField to reduce some 
code?  for that matter: is there any reason why we shouldn't just make 
BinaryField implement prefix queries in the way your MyIndexedBinaryField does?
* i'm not sure i understand why you need BinaryTokenStream for the test (see 
previous comment about just extending/improving BinaryField) but if so perhaps 
it should be moved from lucene/core to lucene/test-framework?

 Enable using extended field types with prefix queries for non-default encoded 
 strings
 -

 Key: SOLR-5594
 URL: https://issues.apache.org/jira/browse/SOLR-5594
 Project: Solr
  Issue Type: Improvement
  Components: query parsers, Schema and Analysis
Affects Versions: 4.6
Reporter: Anshum Gupta
Assignee: Anshum Gupta
Priority: Minor
 Attachments: SOLR-5594-branch_4x.patch, SOLR-5594.patch


 Enable users to more easily use prefix queries with custom field types that 
 apply non-default encoding/decoding to query strings, e.g. having a custom 
 field work with base64 encoded query strings.
 Currently, the workaround for it is to have the override at getRewriteMethod 
 level. Perhaps having the prefixQuery also use the calling FieldType's 
 readableToIndexed method would work better.
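
For illustration, something along these lines (a sketch of the suggestion, not 
the attached patch; the getPrefixQuery signature and the readableToIndexed 
helper are assumed from Solr 4.x's FieldType):

{code}
// Decode the user-supplied prefix with the field type's own conversion before
// building the PrefixQuery, so e.g. base64-encoded input works out of the box.
@Override
public Query getPrefixQuery(QParser parser, SchemaField sf, String termStr) {
  BytesRef indexed = new BytesRef();
  readableToIndexed(termStr, indexed); // field-type specific decoding
  return new PrefixQuery(new Term(sf.getName(), indexed));
}
{code}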



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5361) FVH throws away some boosts

2014-01-07 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864583#comment-13864583
 ] 

Adrien Grand commented on LUCENE-5361:
--

Thanks Nik, your fix looks good! I don't think cloning the queries is an issue, 
it happens all the time when doing rewrites, and it's definitely better than 
modifying those queries in-place.

I'll commit it tomorrow if there is no objection.

 FVH throws away some boosts
 ---

 Key: LUCENE-5361
 URL: https://issues.apache.org/jira/browse/LUCENE-5361
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5361.patch


 The FVH's FieldQuery throws away some boosts when flattening queries, 
 including DisjunctionMaxQuery and BooleanQuery queries.   Fragments generated 
 against queries containing boosted boolean queries don't end up sorted 
 correctly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Pull requests versus JIRA

2014-01-07 Thread Benson Margulies
Further adventures in token streams have motivated me to play tech
writer some more.

Options:

1. just create github pull requests.
2. reopen prior jira
3. make new jira

preference?

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Pull requests versus JIRA

2014-01-07 Thread Robert Muir
I think 1 or 3 is best.

The downside of 2 is just the confusion, since the other doc was good,
i dont think we have to reopen it.

i cant imagine anyone worried about having too many jiras with
documentation fixes!

On Tue, Jan 7, 2014 at 3:21 PM, Benson Margulies bimargul...@gmail.com wrote:
 Further adventures in token streams have motivated me to play tech
 writer some more.

 Options:

 1. just create github pull requests.
 2. reopen prior jira
 3. make new jira

 preference?

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Pull requests versus JIRA

2014-01-07 Thread Benson Margulies
OK. Hopefully this time I'll remember to watch my own JIRA so that I
don't ignore Uwe.

On Tue, Jan 7, 2014 at 3:24 PM, Robert Muir rcm...@gmail.com wrote:
 I think 1 or 3 is best.

 The downside of 2 is just the confusion, since the other doc was good,
 i dont think we have to reopen it.

 i cant imagine anyone worried about having too many jiras with
 documentation fixes!

 On Tue, Jan 7, 2014 at 3:21 PM, Benson Margulies bimargul...@gmail.com 
 wrote:
 Further adventures in token streams have motivated me to play tech
 writer some more.

 Options:

 1. just create github pull requests.
 2. reopen prior jira
 3. make new jira

 preference?

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: The Old Git Discussion

2014-01-07 Thread Lajos
I've followed this thread with interest, and although I'm (sadly) a 
lapsed Apache committer (not Lucene/Solr), I finally had to comment as 
I've just gone through the pain of learning git after many happy years 
with svn.


In my long experience in IT I've learned one incontrovertible fact: most 
times, the technical merits of one technology over another are not 
nearly as important as everyone thinks. It is really all about how WELL 
you use a given technology to get the job done. The stuff I do in git 
now, I could do in SVN, and vice versa. I'd wager I could do the same in 
CVS or even older technologies. It's like Ant versus Maven versus Gradle. 
I can do the same in each of these. Each has its own good and bad 
points. I'll stick with Ant and SVN to the end but hey, if a client 
works only with Gradle and Git and XYZ technology and has an 
intellectual investment there, I'm not gonna argue the point on 
technical merits.


That being said, I think the worst argument one could make about 
anything is that we should move to it because everyone else is. People 
will flock to fads as much as (I could argue: more than) to genuine 
technical improvements (anyone remember the 70s? 80s? 90s?). Git feels a 
bit faddish to me, and is definitely immature. I get some of the 
advantages, but I don't think I should have to be a gitk expert to use 
the damn software - it's over-engineered and actually opens up the door 
to more convoluted development processes.


Whether Git is a fad or not, the issue, as pointed out below, is 
supporting the way contributors work. The win-win situation would be to 
keep the core based on SVN but support git contributions (as I know 
someone else suggested). SVN is a technology that is stable and which 
all core committers know like the back of their hands - no sense in 
wasting time learning git when people are donating time and that time is 
better spent on JIRAs. What I don't know is how this GIT integration 
would work, but I'd hope it could be done.


Just to push home the point, I'll bet most of us who have been around a 
while have plenty of stories of IT shops moving from one technology to 
another ... and then in a few years to another ... and then to another - 
all because some manager got a burr up his rear or was wined and dined 
by a vendor. Why? Why hurt productivity for the sake of keep up with the 
times? How about setting an example of sticking with what works despite 
the made rush to github?


My €.02.

Lajos Moczar




On 06/01/2014 17:01, Robert Muir wrote:

On Sun, Jan 5, 2014 at 12:07 PM, Mark Miller markrmil...@gmail.com wrote:

My point here is not really to discuss the merits of Git VS SVN on a feature
/ interface basis. We might as well talk about MySQL vs Postgres.

Personally, I prefer GIT. It feels good when I use it. SVN feels like crap.
That doesn't make me want to move. I've used SVN for years with Lucene/Solr,
and like everyone, it's pretty much second nature.

The problem is the world is moving. It may not be clear to everyone yet, but
give it a bit more time and it will be.

Git already owns the open source world. It rivals SVN by most guesses in the
proprietary world. This is a strong hard trend. The same trend that saw SVN
eat CVS. I think clearly, a distributed version control system will
dominate. And clearly Git has won.

I'm not ready to call a vote, because I don't think it's critical we switch
yet. But I wanted to continue the discussion, as obviously, plenty of it
will be needed over time before we made such a switch.

It's not about one thing being better than the other. It's about using what
everyone else uses so you don't provide a barrier to contribution. It's
about the post I linked to when I started this thread.

I personally don't care about pull requests and Github. I don't think any of
its features are that great, other than it acts as a central repo. Git is
not good because of Github IMO. But Git and Github are eating the world.

Most of the patches I have processed now are made against Git. Jumping from
SVN to Git and back is very annoying IMO though. There are plenty of tools
and workflows for it and they all suck.

Anyway, as the trend continues, it will become even more obvious that
Lucene/Solr will start looking stale on SVN. We have enough image problems
in terms of being modern at Apache. We will need to manage the ones we can.

We should not choose the tools that simply make us fuzzy and comfortable.
We should choose the tools that are best for the project and future
contributions in the long term.

- Mark




The idea that this has anything to do with contributors is misleading.

Today contributors can use either SVN or GIT. They have their choice.
How can it be any better than that for contributors?

As demonstrated over the weekend, it's also possible today for
contributors to use an svn+jira or git+pull request workflow.

As i said earlier, why not spend our time trying to make it easier on
contributors and support 

Re: Iterating BinaryDocValues

2014-01-07 Thread Michael McCandless
Going sequentially should help, if the pages are not hot (in the OS's IO cache).

You can also use a different DVFormat, e.g. Direct, but this holds all
bytes in RAM.
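
For illustration, opting a single field into a RAM-resident format could look 
like this (a sketch assuming the "Direct" docvalues format from the codecs 
module and Lucene 4.6 APIs; the field name is a placeholder):

{code}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.lucene46.Lucene46Codec;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

IndexWriterConfig directConfig(Analyzer analyzer) {
  IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, analyzer);
  iwc.setCodec(new Lucene46Codec() {
    @Override
    public DocValuesFormat getDocValuesFormatForField(String field) {
      // Keep this field's doc values entirely in RAM; others use the default.
      return "myBinaryField".equals(field)
          ? DocValuesFormat.forName("Direct")
          : super.getDocValuesFormatForField(field);
    }
  });
  return iwc;
}
{code}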

Mike McCandless

http://blog.mikemccandless.com


On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 Joel,

 I tried to hack it straightforwardly, but found no free gain there. The only
 thing I can suggest is to try to reuse the bytes in
 https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401
 Right now it allocates bytes every time, which besides GC pressure can also
 hurt memory access locality. Could you try fixing the memory waste and
 repeating the performance test?

 Have a good hack!


 On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein joels...@gmail.com wrote:


 Hi,

  I'm looking for a faster way to perform large scale docId -> bytesRef
 lookups for BinaryDocValues.

 I'm finding that I can't get the performance that I need from the random
 access seek in the BinaryDocValues interface.

 I'm wondering if sequentially scanning the docValues would be a faster
 approach. I have a BitSet of matching docs, so if I sequentially moved
 through the docValues I could test each one against that bitset.

 Wondering if that approach would be faster for bulk extracts and how
 tricky it would be to add an iterator to the BinaryDocValues interface?

 Thanks,
 Joel




 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5354) Blended score in AnalyzingInfixSuggester

2014-01-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864683#comment-13864683
 ] 

Michael McCandless commented on LUCENE-5354:


Woops, sorry, this fell below the event horizon of my TODO list.  I'll look at 
your new patch soon.

There is an existing performance test, LookupBenchmarkTest, but it's a bit 
tricky to run.  See the comment on LUCENE-5030: 
https://issues.apache.org/jira/browse/LUCENE-5030?focusedCommentId=13689155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13689155

 Blended score in AnalyzingInfixSuggester
 

 Key: LUCENE-5354
 URL: https://issues.apache.org/jira/browse/LUCENE-5354
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Affects Versions: 4.4
Reporter: Remi Melisson
Priority: Minor
  Labels: suggester
 Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch


 I'm working on a custom suggester derived from the AnalyzingInfix. I require 
 what is called a blended score (see the //TODO at line 399 in AnalyzingInfixSuggester) 
 to transform the suggestion weights depending on the position of the searched 
 term(s) in the text.
 Right now, I'm using an easy solution:
 If I want 10 suggestions, I search the current ordered index for 
 the first 100 results and transform the weight:
 bq. a) by using the term position in the text (found with TermVector and 
 DocsAndPositionsEnum)
 or
 bq. b) by multiplying the weight by the score of a SpanQuery that I add when 
 searching
 and return the updated 10 most heavily weighted suggestions.
 Since we usually don't need to suggest so many things, the bigger search + 
 rescoring overhead is not so significant, but I agree that this is not the 
 most elegant solution.
 We could include this factor (here, the position of the term) directly in 
 the index.
 So, I can contribute this if you think it's worth adding.
 Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a 
 dedicated class?
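
A rough sketch of that oversample-and-rescore loop (illustrative only;
{{positionFactor()}} stands in for the TermVector/SpanQuery scoring described
above and is not an existing API, and the 4.x-era {{Lookup.lookup()}} signature
is assumed):

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.search.suggest.Lookup;

// Oversample, blend each weight with a position-based factor, keep the top 10.
List<Lookup.LookupResult> blend(Lookup suggester, CharSequence query) throws IOException {
  List<Lookup.LookupResult> raw = suggester.lookup(query, false, 100); // oversample
  List<Lookup.LookupResult> blended = new ArrayList<Lookup.LookupResult>();
  for (Lookup.LookupResult r : raw) {
    long w = (long) (r.value * positionFactor(r.key, query)); // blend by term position
    blended.add(new Lookup.LookupResult(r.key, w));
  }
  Collections.sort(blended, new Comparator<Lookup.LookupResult>() {
    public int compare(Lookup.LookupResult a, Lookup.LookupResult b) {
      return Long.compare(b.value, a.value); // heaviest first
    }
  });
  return blended.subList(0, Math.min(10, blended.size()));
}

// Stub: a real implementation would derive this from TermVector positions
// or a SpanQuery score, as described in the issue.
double positionFactor(CharSequence suggestion, CharSequence query) {
  return 1.0;
}
{code}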



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5244) Full Search Result Export

2014-01-07 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864689#comment-13864689
 ] 

Joel Bernstein commented on SOLR-5244:
--

I'll do some testing of the performance of this. Unless I'm missing something, 
though, it looks like you have to go through a PagedBytes.Reader and a 
PackedInts.Reader to get the BytesRef. I think it would perform with similar 
performance to the in-memory BinaryDocValues I was using for my initial test.

The cache I was thinking of building would be backed by an hppc 
IntObjectOpenHashMap, with which I should be able to do 10 million+ read 
operations per second.
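
A minimal sketch of such a cache (assuming the hppc {{IntObjectOpenHashMap}}
API of that era; primitive int keys, no boxing on the read path):

{code}
import com.carrotsearch.hppc.IntObjectOpenHashMap;
import org.apache.lucene.util.BytesRef;

// Build the cache once; deep-copy on insert because the producer may
// reuse the bytes behind the BytesRef it hands back.
IntObjectOpenHashMap<BytesRef> cache = new IntObjectOpenHashMap<BytesRef>();

void put(int docId, BytesRef ref) {
  cache.put(docId, BytesRef.deepCopyOf(ref));
}

// Read path: a single open-addressed hash probe per docID.
BytesRef lookup(int docId) {
  return cache.get(docId);
}
{code}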

 Full Search Result Export
 -

 Key: SOLR-5244
 URL: https://issues.apache.org/jira/browse/SOLR-5244
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5244.patch


 It would be great if Solr could efficiently export entire search result sets 
 without scoring or ranking documents. This would allow external systems to 
 perform rapid bulk imports from Solr. It also provides a possible platform 
 for exporting results to support distributed join scenarios within Solr.
 This ticket provides a patch that has two pluggable components:
 1) ExportQParserPlugin: a post filter that gathers a BitSet with 
 document results and does not delegate to ranking collectors. Instead it puts 
 the BitSet on the request context.
 2) BinaryExportWriter: an output writer that iterates the BitSet and writes 
 the entire result as a binary stream. A header is provided at the beginning 
 of the stream so external clients can self-configure.
 Note:
 These two components will be sufficient for a non-distributed environment. 
 For distributed export a new Request handler will need to be developed.
 After applying the patch and building the dist or example, you can register 
 the components through the following changes to solrconfig.xml
 Register export contrib libraries:
 <lib dir="../../../dist/" regex="solr-export-\d.*\.jar" />
  
 Register the export queryParser with the following line:
  
 <queryParser name="export" 
 class="org.apache.solr.export.ExportQParserPlugin"/>
  
 Register the xbin writer:
  
 <queryResponseWriter name="xbin" 
 class="org.apache.solr.export.BinaryExportWriter"/>
  
 The following query will perform the export:
 {code}
 http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
 {code}
 Initial patch supports export of four data-types:
 1) Single value trie int, long and float
 2) Binary doc values.
 The numerics are currently exported from the FieldCache and the Binary doc 
 values can be in memory or on disk.
 Since this is designed to export very large result sets efficiently, stored 
 fields are not used for the export.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: The Old Git Discussion

2014-01-07 Thread Mark Miller
I don’t really buy the fad argument, but as I’ve said, I’m willing to wait a 
little longer for others to catch on. I try to follow the stats and reports 
and articles on this pretty closely.

As I mentioned early in the thread, by all appearances, the shift from SVN to 
Git looks much like the shift from CVS to SVN. This was not a fad change, nor 
is the next mass movement likely to be.

Just like no one starts a project on CVS anymore, we are almost already to the 
point where new projects start exclusively on Git - especially open source.

I’m happy to sit back and watch the trend continue though. The number of Git 
users on the committee and among the committers only grows every time the 
discussion comes up.

If this were 2009, 2010, 2011 … who knows, perhaps I would buy the fad 
argument. But it just doesn’t jibe in 2014.

- Mark

On Jan 7, 2014, at 3:33 PM, Lajos la...@protulae.com wrote:

 I've followed this thread with interest, and although I'm (sadly) a lapsed 
 Apache committer (not Lucene/Solr), I finally had to comment as I've just 
 gone through the pain of learning git after many happy years with svn.
 
 In my long experience in IT I've learned one incontrovertible fact: most 
 times, the technical merits of one technology over another are not nearly as 
 important as everyone thinks. It is really all about how WELL you use a given 
 technology to get the job done. The stuff I do in git now, I could do in SVN, 
 and vice versa. I'd wager I could do the same in CVS or even older 
 technologies. It's like Ant versus Maven versus Gradle. I can do the same in 
 each of these. Each has its own good and bad points. I'll stick with Ant 
 and SVN to the end but hey, if a client works only with Gradle and Git and 
 XYZ technology and has an intellectual investment there, I'm not gonna argue 
 the point on technical merits.
 
 That being said, I think the worst argument one could make about anything is 
 that we should move to it because everyone else is. People will flock to 
 fads as much as (I could argue: more than) to genuine technical improvements 
 (anyone remember the 70s? 80s? 90s?). Git feels a bit faddish to me, and is 
 definitely immature. I get some of the advantages, but I don't think I should 
 have to be a gitk expert to use the damn software - it's over-engineered and 
 actually opens up the door to more convoluted development processes.
 
 Whether Git is a fad or not, the issue, as pointed out below, is supporting 
 the way contributors work. The win-win situation would be to keep the core 
 based on SVN but support git contributions (as I know someone else 
 suggested). SVN is a technology that is stable and which all core committers 
 know like the back of their hands - no sense in wasting time learning git 
 when people are donating time and that time is better spent on JIRAs. What I 
 don't know is how this Git integration would work, but I'd hope it could be 
 done.
 
 Just to push home the point, I'll bet most of us who have been around a while 
 have plenty of stories of IT shops moving from one technology to another ... 
 and then in a few years to another ... and then to another - all because some 
 manager got a burr up his rear or was wined and dined by a vendor. Why? Why 
 hurt productivity for the sake of keeping up with the times? How about setting 
 an example of sticking with what works despite the mad rush to GitHub?
 
 My €.02.
 
 Lajos Moczar
 
 
 
 
 On 06/01/2014 17:01, Robert Muir wrote:
 On Sun, Jan 5, 2014 at 12:07 PM, Mark Miller markrmil...@gmail.com wrote:
 My point here is not really to discuss the merits of Git VS SVN on a feature
 / interface basis. We might as well talk about MySQL vs Postgres.
 
 Personally, I prefer Git. It feels good when I use it; SVN feels like crap.
 That alone doesn't make me want to move. I've used SVN for years with Lucene/Solr,
 and like everyone else, it's pretty much second nature.
 
 The problem is the world is moving. It may not be clear to everyone yet, but
 give it a bit more time and it will be.
 
 Git already owns the open source world, and by most estimates it rivals SVN in
 the proprietary world. This is a strong, hard trend, the same trend that saw SVN
 eat CVS. I think it's clear that a distributed version control system will
 dominate, and clearly Git has won.
 
 I'm not ready to call a vote, because I don't think it's critical we switch
 yet. But I wanted to continue the discussion, as obviously, plenty of it
 will be needed over time before we make such a switch.
 
 It's not about one thing being better than the other. It's about using what
 everyone else uses so you don't provide a barrier to contribution. It's
 about the post I linked to when I started this thread.
 
 I personally don't care about pull requests and GitHub. I don't think any of
 its features are that great, other than that it acts as a central repo. Git is
 not good because of GitHub, IMO. But Git and GitHub are eating the world.
 
 Most of the patches I have processed now are 

[jira] [Updated] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5617:
---

Description: 
SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but what if you have common resources like included config files that are 
outside instanceDir but are still fully inside the solr home?

I can understand not wanting to load resources from an arbitrary path, but the 
solr home and its children should be about as trustworthy as instanceDir.

Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted 
automatically.  If I need to define a system property to make this happen, I'm 
OK with that -- as long as I don't have to turn off the safety checking 
entirely.

  was:
SOLR-4882 introduced restrictions for the Solr class loader that cause 
resources outside the instanceDir to fail to load.  This is a very good goal, 
but it also causes resources in $\{solr.solr.home\}/lib to fail to load.  In 
order to get those jars to work, I must turn off all SOLR-4882 safety checking.

I can understand not wanting to load resources from an arbitrary path, but the 
solr home and its children should be about as trustworthy as instanceDir.

Ideally I'd like to have $\{solr.solr.home\}/lib trusted automatically, since 
it is searched automatically.  If I need to define a system property to make 
this happen, I'm OK with that -- as long as I don't have to turn off the safety 
checking entirely.


 Default classloader restrictions may be too tight
 -

 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
  Labels: security
 Fix For: 5.0, 4.7


 SOLR-4882 introduced restrictions for the Solr class loader that cause 
 resources outside the instanceDir to fail to load.  This is a very good goal, 
 but what if you have common resources like included config files that are 
 outside instanceDir but are still fully inside the solr home?
 I can understand not wanting to load resources from an arbitrary path, but 
 the solr home and its children should be about as trustworthy as instanceDir.
 Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted 
 automatically.  If I need to define a system property to make this happen, 
 I'm OK with that -- as long as I don't have to turn off the safety checking 
 entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864505#comment-13864505
 ] 

Shawn Heisey edited comment on SOLR-5617 at 1/7/14 9:44 PM:


Here's a stacktrace from my attempted start on 4.6.0 without the option to 
allow unsafe resource loading.  The solr home is /index/solr4:

{noformat}
ERROR - 2014-01-07 14:37:05.493; org.apache.solr.common.SolrException; 
null:org.apache.solr.common.SolrException: SolrCore 's1build' is not available 
due to init failure: Could not load config file 
/index/solr4/cores/s1_0/solrconfig.xml
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:825)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:293)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1476)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at 
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at 
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Could not load config file 
/index/solr4/cores/s1_0/solrconfig.xml
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:532)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:599)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:245)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
Caused by: org.apache.solr.common.SolrException: org.xml.sax.SAXParseException; 
systemId: solrres:/solrconfig.xml; lineNumber: 7; columnNumber: 70; An include 
with href '../../../config/common/luceneMatchVersion.xml' failed, and no 
fallback element was found.
at org.apache.solr.core.Config.init(Config.java:148)
at org.apache.solr.core.Config.init(Config.java:86)
at org.apache.solr.core.SolrConfig.init(SolrConfig.java:129)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:529)
... 11 more
Caused by: org.xml.sax.SAXParseException; systemId: solrres:/solrconfig.xml; 
lineNumber: 7; columnNumber: 70; An include with href 
'../../../config/common/luceneMatchVersion.xml' failed, 

[jira] [Created] (LUCENE-5388) Eliminate construction over readers for Tokenizer

2014-01-07 Thread Benson Margulies (JIRA)
Benson Margulies created LUCENE-5388:


 Summary: Eliminate construction over readers for Tokenizer
 Key: LUCENE-5388
 URL: https://issues.apache.org/jira/browse/LUCENE-5388
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Benson Margulies


In the modern world, Tokenizers are intended to be reusable, with input 
supplied via #setReader. The constructors that take Reader are a vestige. Worse 
yet, they invite people to make mistakes in handling the reader that tangle 
them up with the state machine in Tokenizer. The sensible thing is to eliminate 
these ctors, and force setReader usage.
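
For reference, the reuse pattern being pushed toward looks roughly like this
(a sketch against the 4.x API; WhitespaceTokenizer and the empty consume loop
are just stand-ins):

{code}
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// One Tokenizer instance, many inputs. The Reader passed to the
// constructor is the vestige this issue wants to remove.
void tokenizeAll(String[] docs) throws IOException {
  Tokenizer tok = new WhitespaceTokenizer(Version.LUCENE_46, new StringReader(""));
  for (String doc : docs) {
    tok.setReader(new StringReader(doc)); // the real way input arrives
    tok.reset();
    while (tok.incrementToken()) {
      // consume attributes here
    }
    tok.end();
    tok.close();
  }
}
{code}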




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5389) Even more doc for construction of TokenStream components

2014-01-07 Thread Benson Margulies (JIRA)
Benson Margulies created LUCENE-5389:


 Summary: Even more doc for construction of TokenStream components
 Key: LUCENE-5389
 URL: https://issues.apache.org/jira/browse/LUCENE-5389
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Benson Margulies


There are more useful things to tell would-be authors of tokenizers. Let's tell 
them.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5170) Spatial multi-value distance sort via DocValues

2014-01-07 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-5170:
--

Attachment: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt

Adds recipDistance scoring, lat/long is one param.

 Spatial multi-value distance sort via DocValues
 ---

 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
 SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt


 The attached patch implements spatial multi-value distance sorting.  In other 
 words, a document can have more than one point per field, and using a 
 provided function query, it will return the distance to the closest point.  
 The data goes into binary DocValues, and as-such it's pretty friendly to 
 realtime search requirements, and it only uses 8 bytes per point.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2014-01-07 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864738#comment-13864738
 ] 

Jeff Wartes commented on SOLR-5170:
---

I've been using this patch with some minor tweaks and solr 4.3.1 in production 
for about six months now. Since I was applying it again against 4.6 this 
morning, I figured I should attach my tweaks, and mention it passes tests 
against 4.6.

This does NOT address the design issues David raises in the initial comment. 
The changes vs. the initial patch file allow it to be applied against a greater 
range of Solr versions, and bring it a little closer to feeling the same as 
geofilt's params.

 Spatial multi-value distance sort via DocValues
 ---

 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch


 The attached patch implements spatial multi-value distance sorting.  In other 
 words, a document can have more than one point per field, and using a 
 provided function query, it will return the distance to the closest point.  
 The data goes into binary DocValues, and as-such it's pretty friendly to 
 realtime search requirements, and it only uses 8 bytes per point.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864741#comment-13864741
 ] 

Shawn Heisey commented on SOLR-5617:


I have figured out a workaround.  I've got a config structure that heavily uses 
xinclude and symlinks.  By changing things around so that only the symlinks 
traverse upwards and xinclude only refers to local files, I no longer need to 
enable unsafe loading.
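
The shape of that change, roughly (paths and the symlink target below are
illustrative, taken from the stack trace earlier in this issue):

{noformat}
Before: the xinclude itself reaches outside instanceDir (blocked by SOLR-4882):
  <xi:include href="../../../config/common/luceneMatchVersion.xml"
              xmlns:xi="http://www.w3.org/2001/XInclude"/>

After: a symlink does the upward traversal, and the xinclude stays local:
  ln -s ../../../config/common <instanceDir>/common
  <xi:include href="common/luceneMatchVersion.xml"
              xmlns:xi="http://www.w3.org/2001/XInclude"/>
{noformat}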

I still think that it would be useful to fix this issue, but the urgency is 
gone.

 Default classloader restrictions may be too tight
 -

 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
  Labels: security
 Fix For: 5.0, 4.7


 SOLR-4882 introduced restrictions for the Solr class loader that cause 
 resources outside the instanceDir to fail to load.  This is a very good goal, 
 but what if you have common resources like included config files that are 
 outside instanceDir but are still fully inside the solr home?
 I can understand not wanting to load resources from an arbitrary path, but 
 the solr home and its children should be about as trustworthy as instanceDir.
 Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted 
 automatically.  If I need to define a system property to make this happen, 
 I'm OK with that -- as long as I don't have to turn off the safety checking 
 entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5617) Default classloader restrictions may be too tight

2014-01-07 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5617:
---

Priority: Minor  (was: Major)

 Default classloader restrictions may be too tight
 -

 Key: SOLR-5617
 URL: https://issues.apache.org/jira/browse/SOLR-5617
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.6
Reporter: Shawn Heisey
Priority: Minor
  Labels: security
 Fix For: 5.0, 4.7


 SOLR-4882 introduced restrictions for the Solr class loader that cause 
 resources outside the instanceDir to fail to load.  This is a very good goal, 
 but what if you have common resources like included config files that are 
 outside instanceDir but are still fully inside the solr home?
 I can understand not wanting to load resources from an arbitrary path, but 
 the solr home and its children should be about as trustworthy as instanceDir.
 Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted 
 automatically.  If I need to define a system property to make this happen, 
 I'm OK with that -- as long as I don't have to turn off the safety checking 
 entirely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer

2014-01-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864742#comment-13864742
 ] 

Robert Muir commented on LUCENE-5388:
-

+1, it's really silly it's this way. I guess it's the right thing to do this for 
5.0 only: I wish we had done it for 4.0, but it is what it is.

It should be a rather large and noisy change, unfortunately. I can help; let me 
know.

 Eliminate construction over readers for Tokenizer
 -

 Key: LUCENE-5388
 URL: https://issues.apache.org/jira/browse/LUCENE-5388
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Reporter: Benson Margulies

 In the modern world, Tokenizers are intended to be reusable, with input 
 supplied via #setReader. The constructors that take Reader are a vestige. 
 Worse yet, they invite people to make mistakes in handling the reader that 
 tangle them up with the state machine in Tokenizer. The sensible thing is to 
 eliminate these ctors, and force setReader usage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5244) Full Search Result Export

2014-01-07 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864689#comment-13864689
 ] 

Joel Bernstein edited comment on SOLR-5244 at 1/7/14 10:12 PM:
---

I'll do some testing of the performance of this. Unless I'm missing something, 
though, it looks like you have to go through a PagedBytes.Reader and a 
PackedInts.Reader to get the BytesRef. I think it would have similar performance 
to the in-memory BinaryDocValues I was using for my initial test.

The cache I was thinking of building would be backed by an hppc 
IntObjectOpenHashMap, with which I should be able to do 10 million+ read 
operations per second.


was (Author: joel.bernstein):
I'll do some testing of the performance of this. Unless I'm missing something, 
though, it looks like you have to go through a PagedBytes.Reader and a 
PackedInts.Reader to get the BytesRef. I think it would perform with similar 
performance to the in-memory BinaryDocValues I was using for my initial test.

The cache I was thinking of building would be backed by an hppc 
IntObjectOpenHashMap, with which I should be able to do 10 million+ read 
operations per second.

 Full Search Result Export
 -

 Key: SOLR-5244
 URL: https://issues.apache.org/jira/browse/SOLR-5244
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 5.0
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-5244.patch


 It would be great if Solr could efficiently export entire search result sets 
 without scoring or ranking documents. This would allow external systems to 
 perform rapid bulk imports from Solr. It also provides a possible platform 
 for exporting results to support distributed join scenarios within Solr.
 This ticket provides a patch that has two pluggable components:
 1) ExportQParserPlugin: a post filter that gathers a BitSet with 
 document results and does not delegate to ranking collectors. Instead it puts 
 the BitSet on the request context.
 2) BinaryExportWriter: an output writer that iterates the BitSet and writes 
 the entire result as a binary stream. A header is provided at the beginning 
 of the stream so external clients can self-configure.
 Note:
 These two components will be sufficient for a non-distributed environment. 
 For distributed export a new Request handler will need to be developed.
 After applying the patch and building the dist or example, you can register 
 the components through the following changes to solrconfig.xml
 Register export contrib libraries:
 <lib dir="../../../dist/" regex="solr-export-\d.*\.jar" />
  
 Register the export queryParser with the following line:
  
 <queryParser name="export" 
 class="org.apache.solr.export.ExportQParserPlugin"/>
  
 Register the xbin writer:
  
 <queryResponseWriter name="xbin" 
 class="org.apache.solr.export.BinaryExportWriter"/>
  
 The following query will perform the export:
 {code}
 http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
 {code}
 Initial patch supports export of four data-types:
 1) Single value trie int, long and float
 2) Binary doc values.
 The numerics are currently exported from the FieldCache and the Binary doc 
 values can be in memory or on disk.
 Since this is designed to export very large result sets efficiently, stored 
 fields are not used for the export.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



oom in documentation-lint

2014-01-07 Thread Benson Margulies
Is there a recipe to avoid this?

-documentation-lint:
 [echo] checking for broken html...
[ivy:cachepath] downloading
http://repo1.maven.org/maven2/net/sf/jtidy/jtidy/r938/jtidy-r938.jar
...
[ivy:cachepath]
..
(244kB)
[ivy:cachepath] .. (0kB)
[ivy:cachepath] [SUCCESSFUL ] net.sf.jtidy#jtidy;r938!jtidy.jar (383ms)
[jtidy] Checking for broken html (such as invalid tags)...

BUILD FAILED
/Users/benson/asf/lucene-solr/build.xml:57: The following error
occurred while executing this line:
/Users/benson/asf/lucene-solr/lucene/build.xml:208: The following
error occurred while executing this line:
/Users/benson/asf/lucene-solr/lucene/build.xml:214: The following
error occurred while executing this line:
/Users/benson/asf/lucene-solr/lucene/common-build.xml:1851:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
at java.io.BufferedWriter.write(BufferedWriter.java:230)
at java.io.PrintWriter.write(PrintWriter.java:456)
at java.io.PrintWriter.write(PrintWriter.java:473)
at java.io.PrintWriter.print(PrintWriter.java:603)
at java.io.PrintWriter.println(PrintWriter.java:739)
at org.w3c.tidy.Report.printMessage(Report.java:754)
at org.w3c.tidy.Report.errorSummary(Report.java:1572)
at org.w3c.tidy.Tidy.parse(Tidy.java:608)
at org.w3c.tidy.Tidy.parse(Tidy.java:263)
at org.w3c.tidy.ant.JTidyTask.processFile(JTidyTask.java:457)
at org.w3c.tidy.ant.JTidyTask.executeSet(JTidyTask.java:420)
at org.w3c.tidy.ant.JTidyTask.execute(JTidyTask.java:364)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)

Total time: 3 minutes 35 seconds

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



lucene-solr pull request: LUCENE-5389: more analysis advice.

2014-01-07 Thread benson-basis
GitHub user benson-basis opened a pull request:

https://github.com/apache/lucene-solr/pull/14

LUCENE-5389: more analysis advice.

Before we change the protocol for tokenizer construction,
let's get plenty of explanation of the existing one, in case
of a 4.7.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/benson-basis/lucene-solr 
lucene-5389-more-analysis-doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/14.patch


commit 1ddc14c97396183ac99fb9ee5a40bdc09b3994c5
Author: Benson Margulies ben...@basistech.com
Date:   2014-01-07T22:52:11Z

LUCENE-5389: more analysis advice.
Before we change the protocol for tokenizer construction,
let's get plenty of explanation of the existing one, in case
of a 4.7.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5389) Even more doc for construction of TokenStream components

2014-01-07 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864825#comment-13864825
 ] 

Benson Margulies commented on LUCENE-5389:
--

https://github.com/apache/lucene-solr/pull/14



 Even more doc for construction of TokenStream components
 

 Key: LUCENE-5389
 URL: https://issues.apache.org/jira/browse/LUCENE-5389
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Benson Margulies

 There are more useful things to tell would-be authors of tokenizers. Let's 
 tell them.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: oom in documentation-lint

2014-01-07 Thread Robert Muir
The jtidy-macro we use is not very efficient. It just uses the
built-in JTidy task.

I think this is a real problem; last I checked it seemed impossible to
fix without writing a custom task to integrate with jtidy.

We could either disable it, or you could try setting a larger -Xmx in
ANT_OPTS as a workaround, but I do think we need to fix or disable
this.
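
For example (the heap size here is a guess; anything comfortably above the
JVM default should do):

export ANT_OPTS=-Xmx1g
ant documentation-lint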

On Tue, Jan 7, 2014 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote:
 Is there a recipe to avoid this?

 -documentation-lint:
  [echo] checking for broken html...
 [ivy:cachepath] downloading
 http://repo1.maven.org/maven2/net/sf/jtidy/jtidy/r938/jtidy-r938.jar
 ...
 [ivy:cachepath]
 ..
 (244kB)
 [ivy:cachepath] .. (0kB)
 [ivy:cachepath] [SUCCESSFUL ] net.sf.jtidy#jtidy;r938!jtidy.jar (383ms)
 [jtidy] Checking for broken html (such as invalid tags)...

 BUILD FAILED
 /Users/benson/asf/lucene-solr/build.xml:57: The following error
 occurred while executing this line:
 /Users/benson/asf/lucene-solr/lucene/build.xml:208: The following
 error occurred while executing this line:
 /Users/benson/asf/lucene-solr/lucene/build.xml:214: The following
 error occurred while executing this line:
 /Users/benson/asf/lucene-solr/lucene/common-build.xml:1851:
 java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2271)
 at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
 at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
 at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
 at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
 at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
 at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
 at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
 at java.io.BufferedWriter.write(BufferedWriter.java:230)
 at java.io.PrintWriter.write(PrintWriter.java:456)
 at java.io.PrintWriter.write(PrintWriter.java:473)
 at java.io.PrintWriter.print(PrintWriter.java:603)
 at java.io.PrintWriter.println(PrintWriter.java:739)
 at org.w3c.tidy.Report.printMessage(Report.java:754)
 at org.w3c.tidy.Report.errorSummary(Report.java:1572)
 at org.w3c.tidy.Tidy.parse(Tidy.java:608)
 at org.w3c.tidy.Tidy.parse(Tidy.java:263)
 at org.w3c.tidy.ant.JTidyTask.processFile(JTidyTask.java:457)
 at org.w3c.tidy.ant.JTidyTask.executeSet(JTidyTask.java:420)
 at org.w3c.tidy.ant.JTidyTask.execute(JTidyTask.java:364)
 at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
 at org.apache.tools.ant.Task.perform(Task.java:348)
 at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68)
 at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)

 Total time: 3 minutes 35 seconds

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-01-07 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864855#comment-13864855
 ] 

Anshum Gupta commented on SOLR-5477:


bq. in my experience, when implementing an async callback API like this, it can 
be handy to require the client to specify the magical...

Considering that we have a 1-n relationship between calls made by the client to 
the OCP and from the OCP to cores, we can't really use a client-generated id. We 
would in any case need multiple ids to be generated at the OCP-to-core call level.

 Async execution of OverseerCollectionProcessor tasks
 

 Key: SOLR-5477
 URL: https://issues.apache.org/jira/browse/SOLR-5477
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Anshum Gupta
 Attachments: SOLR-5477-CoreAdminStatus.patch


 Typical collection admin commands are long running and it is very common to 
 have the requests time out.  It is more of a problem if the cluster is 
 very large. Add an option to run these commands asynchronously:
 add an extra param async=true for all collection commands;
 the task is written to ZK and the caller is returned a task id. 
 A separate collection admin command will be added to poll the status of the 
 task:
 command=status&id=7657668909
 If id is not passed, all running async tasks should be listed.
 A separate queue is created to store in-process tasks. After the tasks are 
 completed the queue entry is removed. OverseerCollectionProcessor will perform 
 these tasks in multiple threads.
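
Put together, the flow proposed above would look roughly like this (a sketch;
command=status and id are the proposal's parameters, not a released API, and
SPLITSHARD is just an example of a long-running command):

{noformat}
# submit a long-running command asynchronously; a task id is returned
/admin/collections?action=SPLITSHARD&collection=c1&shard=shard1&async=true

# poll the status of that task
/admin/collections?command=status&id=7657668909

# omit the id to list all running async tasks
/admin/collections?command=status
{noformat}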



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 1201 - Failure!

2014-01-07 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1201/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 9939 lines...]
   [junit4] JVM J0: stderr was not empty, see: 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20140107_235447_516.syserr
   [junit4]  JVM J0: stderr (verbatim) 
   [junit4] java(208,0x149d18000) malloc: *** error for object 0x149d06ad1: 
pointer being freed was not allocated
   [junit4] *** set a breakpoint in malloc_error_break to debug
   [junit4]  JVM J0: EOF 

[...truncated 1 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bin/java 
-XX:+UseCompressedOops -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps 
-Dtests.prefix=tests -Dtests.seed=6B057318ACC0851A -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Djdk.map.althashing.threshold=0 
-Dtests.disableHdfs=true -Dfile.encoding=ISO-8859-1 -classpath 

[jira] [Created] (SOLR-5618) Reproducible failure from TestFiltering.testRandomFiltering

2014-01-07 Thread Hoss Man (JIRA)
Hoss Man created SOLR-5618:
--

 Summary: Reproducible failure from 
TestFiltering.testRandomFiltering
 Key: SOLR-5618
 URL: https://issues.apache.org/jira/browse/SOLR-5618
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man


uwe's jenkins found this in java8...

http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText

{noformat}
   [junit4]   2 NOTE: reproduce with: ant test  -Dtestcase=TestFiltering 
-Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY 
-Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering 
   [junit4] Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 
qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange 
v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange v=val_i l=1 u=1}, fq, 
{!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true 
tag=t}-_query_:{!frange v=val_i l=1 u=1}]
   [junit4]at 
__randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
   [junit4]at 
org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
{noformat}

The seed fails consistently for me on trunk using java7, and on 4x using both 
java7 and java6 - details to follow in comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5618) Reproducible failure from TestFiltering.testRandomFiltering

2014-01-07 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864911#comment-13864911
 ] 

Hoss Man commented on SOLR-5618:


Relevant log snippet from jenkins...

{noformat}
   [junit4]   2 558586 T3202 C2360 oasc.SolrCore.execute [collection1] 
webapp=null path=null 
params={q={!frange+v%3Dval_i+l%3D0+u%3D1+cost%3D139+tag%3Dt}&fq={!frange+v%3Dval_i+l%3D0+u%3D1}&fq={!+cost%3D92}-_query_:{!frange+v%3Dval_i+l%3D1+u%3D1}&fq={!frange+v%3Dval_i+l%3D0+u%3D1+cache%3Dtrue+tag%3Dt}&fq={!+cache%3Dtrue+tag%3Dt}-_query_:{!frange+v%3Dval_i+l%3D1+u%3D1}}
 hits=0 status=0 QTime=1 
   [junit4]   2 558586 T3202 oas.SolrTestCaseJ4.assertJQ ERROR query failed 
JSON validation. error=mismatch: '1'!='0' @ response/numFound
   [junit4]   2 expected =/response/numFound==1
   [junit4]   2 response = {
   [junit4]   2  responseHeader:{
   [junit4]   2status:0,
   [junit4]   2QTime:1},
   [junit4]   2  response:{numFound:0,start:0,docs:[]
   [junit4]   2  }}
   [junit4]   2
   [junit4]   2 request = 
q={!frange+v%3Dval_i+l%3D0+u%3D1+cost%3D139+tag%3Dt}&fq={!frange+v%3Dval_i+l%3D0+u%3D1}&fq={!+cost%3D92}-_query_:{!frange+v%3Dval_i+l%3D1+u%3D1}&fq={!frange+v%3Dval_i+l%3D0+u%3D1+cache%3Dtrue+tag%3Dt}&fq={!+cache%3Dtrue+tag%3Dt}-_query_:{!frange+v%3Dval_i+l%3D1+u%3D1}
   [junit4]   2 558587 T3202 oasc.SolrException.log ERROR 
java.lang.RuntimeException: mismatch: '1'!='0' @ response/numFound
   [junit4]   2at 
org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:732)
   [junit4]   2at 
org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:679)
   [junit4]   2at 
org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:316)
...
   [junit4]   2 558588 T3202 oass.TestFiltering.testRandomFiltering ERROR 
FAILURE: iiter=11 qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 
tag=t}, fq, {!frange v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange 
v=val_i l=1 u=1}, fq, {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! 
cache=true tag=t}-_query_:{!frange v=val_i l=1 u=1}]
   [junit4]   2 558588 T3202 oas.SolrTestCaseJ4.tearDown ###Ending 
testRandomFiltering
   [junit4]   2 NOTE: reproduce with: ant test  -Dtestcase=TestFiltering 
-Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E 
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY 
-Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering 
   [junit4] Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 
qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange 
v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange v=val_i l=1 u=1}, fq, 
{!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true 
tag=t}-_query_:{!frange v=val_i l=1 u=1}]
   [junit4]at 
__randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
   [junit4]at 
org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
   [junit4]at java.lang.Thread.run(Thread.java:744)
{noformat}



 Reproducible failure from TestFiltering.testRandomFiltering
 ---

 Key: SOLR-5618
 URL: https://issues.apache.org/jira/browse/SOLR-5618
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man

 uwe's jenkins found this in java8...
 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText
 {noformat}
[junit4]   2 NOTE: reproduce with: ant test  -Dtestcase=TestFiltering 
 -Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E 
 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY 
 -Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
[junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering 
[junit4] Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 
 qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange 
 v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange v=val_i l=1 u=1}, fq, 
 {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true 
 tag=t}-_query_:{!frange v=val_i l=1 u=1}]
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
[junit4]  at 
 org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
 {noformat}
 The seed fails consistently for me on trunk using java7, and on 4x using both 
 java7 and java6 - details to follow in comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: The Old Git Discussion

2014-01-07 Thread David Smiley (@MITRE.org)
+1, Mark.

Git isn't perfect; I sympathize with the annoyances pointed out by Rob et al. 
But I think we would be better off for it -- a net win considering the 
upsides. In the end I'd love to track changes via branches (which includes 
forks people make to add changes), not by attaching patch files to an 
issue tracker. The way we do things here sucks for collaboration, and it's a 
higher bar for people to get involved than it can and should be.

~ David


Mark Miller-3 wrote
 I don’t really buy the fad argument, but as I’ve said, I’m willing to wait
 a little longer for others to catch on. I try to follow the stats and
 reports and articles on this pretty closely.
 
 As I mentioned early in the thread, by all appearances, the shift from SVN
 to Git looks much like the shift from CVS to SVN. This was not a fad
 change, nor is the next mass movement likely to be.
 
 Just like no one starts a project on CVS anymore, we are almost already to
 the point where new projects start exclusively on Git - especially open
 source.
 
 I’m happy to sit back and watch the trend continue though. The number of
 Git users on the committee and among the committers only grows every time
 the discussion comes up.
 
 If this were 2009, 2010, 2011 … who knows, perhaps I would buy the fad
 argument. But it just doesn’t jibe in 2014.
 
 - Mark





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-Old-Git-Discussion-tp4109193p4110109.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5618) Reproducible failure from TestFiltering.testRandomFiltering

2014-01-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5618:
---

Attachment: SOLR-5618.patch

This smells like a caching-related bug ... but I have no idea why/where.

The test does multiple iterations where in each iteration it builds an index of 
a random number of documents, each containing an incremented value for id and 
val_i -- the number of documents can range from 1 to 21, with the id and 
val_i fields starting at 0.  Then it generates a bunch of random requests 
consisting of random q and fq params.

This is what the failing request looks like...

{noformat}
q  = {!frange v=val_i l=0 u=1 cost=139 tag=t}
fq = {!frange v=val_i l=0 u=1}
fq = {! cost=92}-_query_:{!frange v=val_i l=1 u=1} 
fq = {!frange v=val_i l=0 u=1 cache=true tag=t}
fq = {! cache=true tag=t}-_query_:{!frange v=val_i l=1 u=1}
{noformat}

So basically: it will only ever match docs which have val_i==0 -- which, given 
how the index is built, means it should always match exactly 1 document: the 0th 
doc -- but in the failure message we can see that it doesn't match any docs.

(FWIW: adding some debugging indicates that in the iteration where this fails, 
the index only has 2 documents in it -- doc#0 and doc#1)

In the patch I'm attaching, I hacked the test to explicitly attempt the above 
query in every iteration, regardless of the number of docs in the index, 
immediately after building the index -- and that new assertion never fails.  But 
then, after it passes, the test continues on with the existing logic, generating 
a bunch of random requests and executing them -- and when it randomly generates 
the same query as above (which already succeeded in matching 1 doc against the 
current index), that query then fails to match any docs.

Which smells to me like some sort of filter-caching glitch ... right?

 Reproducible failure from TestFiltering.testRandomFiltering
 ---

 Key: SOLR-5618
 URL: https://issues.apache.org/jira/browse/SOLR-5618
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
 Attachments: SOLR-5618.patch


 uwe's jenkins found this in java8...
 http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText
 {noformat}
[junit4]   2 NOTE: reproduce with: ant test  -Dtestcase=TestFiltering 
 -Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E 
 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY 
 -Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
[junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering 
[junit4] Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 
 qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange 
 v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange v=val_i l=1 u=1}, fq, 
 {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true 
 tag=t}-_query_:{!frange v=val_i l=1 u=1}]
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
[junit4]  at 
 org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
 {noformat}
 The seed fails consistently for me on trunk using java7, and on 4x using both 
 java7 and java6 - details to follow in comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2553) Nested Field Collapsing

2014-01-07 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865012#comment-13865012
 ] 

Kranti Parisa commented on SOLR-2553:
-

I think we will also need to support the other grouping params, especially 
group.limit, so that users can restrict the results even with nested groups.
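
For instance, the kind of request this implies (illustrative; the nested
semantics for repeated group.field params are exactly what this issue
proposes, not current behavior):

{noformat}
/select?q=*:*&group=true&group.field=location&group.field=type&group.limit=5
{noformat}

where group.limit would cap the documents returned inside each innermost group.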

 Nested Field Collapsing
 ---

 Key: SOLR-2553
 URL: https://issues.apache.org/jira/browse/SOLR-2553
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Reporter: Martijn Laarman

 Currently specifying grouping on multiple fields returns multiple datasets. 
 It would be nice if Solr supported cascading / nested grouping by applying 
 the first group over the entire result set, the next over each group, and so 
 forth. 
 Even support limited to nesting grouping 2 levels deep would cover a lot 
 of use cases. 
 group.field=location&group.field=type
 -Location X
 ---Type 1
 -----documents
 ---Type 2
 -----documents
 -Location Y
 ---Type 1
 -----documents
 ---Type 2
 -----documents
 instead of 
 -Location X
 --documents
 -Location Y
 --documents
 -Type 1
 --documents
 -Type 2
 --documents



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5560) Enable LocalParams without escaping the query

2014-01-07 Thread Ryan Cutter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865053#comment-13865053
 ] 

Ryan Cutter commented on SOLR-5560:
---

I don't know; I assume a committer familiar with this area will take a look in 
the near future. I see other unassigned tickets with patches attached, so I'm 
sure there's a process.

 Enable LocalParams without escaping the query
 -

 Key: SOLR-5560
 URL: https://issues.apache.org/jira/browse/SOLR-5560
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.6
Reporter: Isaac Hebsh
 Fix For: 4.7, 4.6.1

 Attachments: SOLR-5560.patch


 This query should be legit syntax:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND {!lucene df=text}(TERM2 TERM3 TERM4 TERM5)
 Currently it isn't, because LocalParams can be specified on a single term 
 only.
 [~billnbell] thinks it is a bug.
 From the mailing list:
 {quote}
 We want to set a LocalParam on a nested query. When querying with the v inline 
 parameter, it works fine:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND {!lucene df=text v="TERM2 TERM3 \"TERM4 TERM5\""}
 the parsedquery_toString is
 +id:TERM1 +(text:term2 text:term3 text:"term4 term5")
 Querying using _query_ also works fine:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND _query_:"{!lucene df=text}TERM2 TERM3 \"TERM4 TERM5\""
 (parsedquery is exactly the same).
 Obviously, there is the option of an external parameter ({... 
 v=$nestedq}&nestedq=...).
 This is a good solution, but it is not practical when having a lot of such 
 nested queries.
 BUT, when trying to put the nested query in place, it yields a syntax error:
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1
  AND {!lucene df=text}(TERM2 TERM3 TERM4 TERM5)
 org.apache.solr.search.SyntaxError: Cannot parse '(TERM2'
 The previous options are less preferred because of the escaping that must be 
 applied to the nested query.
 {quote}
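
 For reference, a spelled-out version of the external-parameter workaround 
 mentioned above (this relies on Solr's existing parameter dereferencing; URL 
 encoding omitted for readability):
 {noformat}
 http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id
   &q=TERM1 AND {!lucene df=text v=$nestedq}
   &nestedq=TERM2 TERM3 "TERM4 TERM5"
 {noformat}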



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5610) Support cluster-wide properties with an API called CLUSTERPROP

2014-01-07 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5610:
-

Description: 
Add a collection admin API for cluster-wide property management.
The new API would create an entry in the root as 
/cluster-props.json:
{code:javascript}
{
"prop":"val"
}
{code}

The API would work as

/command=clusterprop&name=propName&value=propVal

There will be a set of well-known properties which can be set or unset with 
this command.

  was:
Add a collection admin API for cluster wide property management
the new API would create an entry in the root as 
/cluster-props.json
{code:javascipt}
{
"prop":"val"
}

The API would work as

/command=clusterprop&name=propName&value=propVal

there will be a set of well-known properties which can be set or unset with 
this command


 Support cluster-wide properties with an API called CLUSTERPROP
 --

 Key: SOLR-5610
 URL: https://issues.apache.org/jira/browse/SOLR-5610
 Project: Solr
  Issue Type: Bug
Reporter: Noble Paul

 Add a collection admin API for cluster-wide property management.
 The new API would create an entry in the root as 
 /cluster-props.json:
 {code:javascript}
 {
 "prop":"val"
 }
 {code}
 The API would work as
 /command=clusterprop&name=propName&value=propVal
 There will be a set of well-known properties which can be set or unset with 
 this command.
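
 A sketch following the proposed pattern (urlScheme is a hypothetical example 
 of a well-known property; the endpoint shape is taken from the description 
 above):
 {noformat}
 /command=clusterprop&name=urlScheme&value=https

 resulting entry in /cluster-props.json:
 {"urlScheme":"https"}
 {noformat}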



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org