[jira] [Updated] (SOLR-5438) DebugComponent throws NPE when used with grouping

2013-11-14 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-5438:


Attachment: SOLR-5438.patch

Patch updated to trunk.

 DebugComponent throws NPE when used with grouping
 -

 Key: SOLR-5438
 URL: https://issues.apache.org/jira/browse/SOLR-5438
 Project: Solr
  Issue Type: Bug
Reporter: Tomás Fernández Löbbe
Assignee: Shalin Shekhar Mangar
 Attachments: SOLR-5438.patch, SOLR-5438.patch


 To Reproduce: 
 start trunk example
 Run query:  
 http://localhost:8983/solr/select?q=test&debug=true&group=true&group.field=inStock&distrib=true&shards=localhost:8983/solr,localhost:8983/solr
 DebugComponent throws an NPE like: 
 {noformat}
 83841 [qtp1070887245-16] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.NullPointerException
 at org.apache.solr.handler.component.DebugComponent.process(DebugComponent.java:66)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:216)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
 at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
 at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Seems like some internal requests when using grouping don't populate 
 ResponseBuilder.results. 
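
 A minimal sketch of the kind of null guard this points at in
 DebugComponent.process() (illustrative only; the actual change is whatever
 SOLR-5438.patch does):
 {code:java}
// Hypothetical guard inside DebugComponent.process(ResponseBuilder rb), not
// the committed patch: the internal sub-requests issued by grouping can leave
// rb.results unset, so check it before dereferencing.
DocListAndSet results = rb.getResults();
if (results == null || results.docList == null) {
  return; // nothing to explain for this internal (grouping) sub-request
}
 {code}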






[jira] [Commented] (SOLR-5438) DebugComponent throws NPE when used with grouping

2013-11-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822252#comment-13822252
 ] 

ASF subversion and git services commented on SOLR-5438:
---

Commit 1541849 from sha...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1541849 ]

SOLR-5438: DebugComponent throws NPE when used with grouping




[jira] [Commented] (SOLR-5438) DebugComponent throws NPE when used with grouping

2013-11-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822256#comment-13822256
 ] 

ASF subversion and git services commented on SOLR-5438:
---

Commit 1541853 from sha...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1541853 ]

SOLR-5438: DebugComponent throws NPE when used with grouping




[jira] [Resolved] (SOLR-5438) DebugComponent throws NPE when used with grouping

2013-11-14 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-5438.
-

   Resolution: Fixed
Fix Version/s: 4.7
   5.0

This is fixed. Thanks Tomás!




[jira] [Commented] (SOLR-5399) Improve DebugComponent for distributed requests

2013-11-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822262#comment-13822262
 ] 

Shalin Shekhar Mangar commented on SOLR-5399:
-

Looks like this is causing a transaction log leak on Windows.

http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/3386/
http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3463/

 Improve DebugComponent for distributed requests
 ---

 Key: SOLR-5399
 URL: https://issues.apache.org/jira/browse/SOLR-5399
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.0
Reporter: Tomás Fernández Löbbe
Assignee: Ryan Ernst
 Fix For: 5.0, 4.7

 Attachments: SOLR-5399.patch, SOLR-5399.patch, SOLR-5399.patch


 I'm working on extending the DebugComponent to add some useful 
 information for tracking distributed requests. I'm adding two 
 different things. First, the request can generate a request ID that will be 
 printed in the logs for the main query and all the different internal 
 requests to the different shards. This should make it easier to find the 
 different parts of a single user request in the logs. It would also add the 
 purpose of each internal request to the logs, like: 
 RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. 
 Second, I'm adding a "track" section to the debug info with information 
 about the different phases of the distributed request (right now I'm only 
 including QTime, but it could eventually include more information), like: 
 {code:xml}
<lst name="debug">
  <lst name="track">
    <lst name="EXECUTE_QUERY">
      <str name="localhost:8985/solr">QTime: 10</str>
      <str name="localhost:8984/solr">QTime: 25</str>
    </lst>
    <lst name="GET_FIELDS">
      <str name="localhost:8985/solr">QTime: 1</str>
    </lst>
  </lst>
</lst>
 {code}
 To get this, debugQuery must be set to true, or the debug parameter must 
 include track (debug=track). This information is only added to distributed 
 requests. I would like to get feedback on this.
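
 As a usage sketch (an illustration, not taken from the patch), a request for 
 just the new track section might look like:
 {noformat}
http://localhost:8983/solr/select?q=test&distrib=true&shards=localhost:8983/solr&debug=track
 {noformat}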






[jira] [Created] (SOLR-5440) UAX29URLEmailTokenizer thread hangs on getNextToken - causes cloud to stop accepting updates

2013-11-14 Thread Chris (JIRA)
Chris created SOLR-5440:
---

 Summary: UAX29URLEmailTokenizer thread hangs on getNextToken - 
causes cloud to stop accepting updates
 Key: SOLR-5440
 URL: https://issues.apache.org/jira/browse/SOLR-5440
 Project: Solr
  Issue Type: Bug
Reporter: Chris


This is a pretty nasty bug, and causes the cluster to stop accepting updates. 
I'm not sure how to consistently reproduce it, but I have done so numerous 
times. Switching to a whitespace tokenizer improved indexing speed, and I never 
got the issue again.

When the thread hits this bug it uses 100% CPU; restarting the node which has 
the error allows indexing to continue until it is hit again. Here is the 
thread dump:



http-bio-8080-exec-45 (201)

org.apache.lucene.analysis.standard.UAX29URLEmailTokenizerImpl.getNextToken(UAX29URLEmailTokenizerImpl.java:4343)
org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer.incrementToken(UAX29URLEmailTokenizer.java:147)
org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:82)
org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453)
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1517)
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:217)
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:583)
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:719)
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:449)
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:89)
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:151)
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
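
For reference, the whitespace-tokenizer workaround the reporter describes 
would look roughly like this in schema.xml (an illustrative field type; the 
name text_ws is invented for the example):
{code:xml}
<!-- Illustrative workaround, not a fix for the hang itself: replace
     UAX29URLEmailTokenizer with the whitespace tokenizer. -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}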

Re: Build failed in Jenkins: lucene-solr-46-smoker #5

2013-11-14 Thread Simon Willnauer
yes, that is what I was using. The odd part is that I rebuilt the RC
and wiped my m2 repo for the old release, and now it doesn't run OOM
anymore. Oddness at its best. It also only happened on my Mac but not
on the Linux box. I don't know what's going on, but it seems to be
sorted out.

simon

On Wed, Nov 13, 2013 at 9:42 PM, Steve Rowe sar...@gmail.com wrote:
 Simon,

 Are you using the maven-compiler-plugin's “maxmem” configuration? See 
 http://maven.apache.org/plugins/maven-compiler-plugin/compile-mojo.html#maxmem.
 I’ve never had to use this before, but you should also be able to try it on 
 the command line using -Dmaven.compiler.maxmem=512m (or whatever size is 
 required).

 Steve
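
 [For reference, the plugin setting pointed to above would look roughly like 
 this in a POM; a sketch only, 512m is illustrative, and maxmem is only 
 honored when the compiler runs in a forked JVM:]
 {code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <!-- maxmem applies only when fork is enabled -->
    <fork>true</fork>
    <maxmem>512m</maxmem>
  </configuration>
</plugin>
 {code}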

 On Nov 13, 2013, at 3:05 PM, Simon Willnauer simon.willna...@gmail.com 
 wrote:

 hmm, this is odd. I really don't get why I can't compile ES anymore
 then. I was running the compile with default values; I'm not sure how much
 that is, but now I can't compile it anymore with 4.6.
 I am not sure what is going on here, but it seems they are pretty much broken?

 I will build an RC and we can try it then again!


 On Wed, Nov 13, 2013 at 8:33 PM, Steve Rowe sar...@gmail.com wrote:
 Simon,

 As I mentioned in the 4.6 release thread, I backported LUCENE-5217 and 
 LUCENE-5322 to branch_4x *after* you made the 4.6 release branch, so the 
 LUCENE-5322-related things I’m committing to trunk and branch_4x should not 
 affect the 4.6 release branch.

 So the 4.6 POMs should be very much like the 4.5.1 POMs.

 5GB???  How much did you need for 4.5.1?

 Steve

 On Nov 13, 2013, at 2:24 PM, Simon Willnauer simon.willna...@gmail.com 
 wrote:

 I am just reading through the thread now. I had some problems with
 Java 1.7 vs. 1.6, so the smoke test failed, but the release was still
 testable, so I integrated it into Elasticsearch. Yet with that artifact
 I built I can't even compile ES, since the Maven compiler target runs
 out of memory even if I give it 5GB, whereas with 4.5.1 it just ran fine.
 So we have a problem here that I can't really figure out. This
 Maven thing seems to be a black box. @sarowe I see you are
 fixing something related to Maven, but is this actually related?

 If yes, you need to backport it to 4.x and the release branch, please.

 simon

 On Wed, Nov 13, 2013 at 8:15 PM, Robert Muir rcm...@gmail.com wrote:
 Thanks Uwe... so JAVA_HOME seems to be working fine.

This doesn't explain why Simon had problems, but at least it confirms
it's working: maybe he had a configuration issue.

 On Wed, Nov 13, 2013 at 2:10 PM, Uwe Schindler u...@thetaphi.de wrote:
 Hi Robert,

 The Jenkins config was wrong. Your Jenkins instance runs with Java 7, so 
 by default it starts jobs also with Java 7. You only configured 
 JAVA_HOME as a sysprop for ANT, so it got passed with -DJAVA_HOME.

 The right config is to define available Java installations via the Admin 
 UI and select the right one (not Default) in the job config. After 
 that, Jenkins sets JAVA_HOME to the right one before launching ANT.

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Wednesday, November 13, 2013 7:00 PM
 To: dev@lucene.apache.org
 Subject: Re: Build failed in Jenkins: lucene-solr-46-smoker #5

 This looks like a real bug in the build?

 I've got JAVA_HOME set to a 1.6 compiler...

 On Wed, Nov 13, 2013 at 12:56 PM, Charlie Cron
 hudsonsevilt...@gmail.com wrote:
 See http://sierranevada.servebeer.com/job/lucene-solr-46-smoker/5/

 --
 [...truncated 52012 lines...]
 jar-grouping:

 check-queries-uptodate:

 jar-queries:

 check-queryparser-uptodate:

 jar-queryparser:

 check-join-uptodate:

 jar-join:

 prep-lucene-jars:

 resolve-example:

 ivy-availability-check:
[echo] Building solr-solrj...

 ivy-fail:

 ivy-configure:
 [ivy:configure] :: loading settings :: file =
 http://sierranevada.servebeer.com/job/lucene-solr-46-
 smoker/ws/lucene
 /ivy-settings.xml

 resolve:

 common.init:

 compile-lucene-core:

 init:

 -clover.disable:

 -clover.load:

 -clover.classpath:

 -clover.setup:

 clover:

 common.compile-core:

 compile-core:

 resolve-groovy:

 define-lucene-javadoc-url:

 javadocs:
[echo] Building solr-solrj...

 download-java6-javadoc-packagelist:
  [delete] Deleting: http://sierranevada.servebeer.com/job/lucene-solr-
 46-smoker/ws/solr/build/docs/solr-solrj/stylesheet.css
 [javadoc] Generating Javadoc
 [javadoc] Javadoc execution
 [javadoc] Loading source files for package 
 org.apache.solr.client.solrj...
 [javadoc] warning: [options] bootstrap class path not set in 
 conjunction
 with -source 1.6
 [javadoc] Loading source files for package
 org.apache.solr.client.solrj.beans...
 [javadoc] Loading source files for package
 org.apache.solr.client.solrj.impl...
 [javadoc] Loading source files for package
 org.apache.solr.client.solrj.request...
 [javadoc] 

[jira] [Commented] (SOLR-5440) UAX29URLEmailTokenizer thread hangs on getNextToken - causes cloud to stop accepting updates

2013-11-14 Thread Chris (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822268#comment-13822268
 ] 

Chris commented on SOLR-5440:
-

Googling, I found someone who hit the same issue with Elasticsearch: 
https://gist.github.com/jeremy/2925923


[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs

2013-11-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822273#comment-13822273
 ] 

Shai Erera commented on LUCENE-5339:


About FacetIndexWriter: it will need to take an optional TaxonomyWriter (i.e. 
if you intend to use TaxonomyFacetField). But then I wonder whether users will 
expect FacetIW.commit() to commit both the underlying IW and TW. Actually, 
this could be a very good thing to do, since we could control the order in 
which those two objects are committed, and do the two-phase commit properly, 
rather than telling users what to do. But that means we'd need to make 
IW.commit() not final. Besides the advantage of doing the commit right, I worry 
that if we don't do that, users will be confused about having to call 
TW.commit() themselves, just because now FacetIW already has a handle to their 
TW. What do you think? We could also just add a commitTaxoAndIndex() method, 
but that's less elegant.
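
A minimal sketch of the combined-commit idea (illustrative only; the field 
names taxoWriter and indexWriter are assumed, not from any patch):
{code:java}
// Sketch only: commit the TaxonomyWriter before the IndexWriter so every
// ordinal referenced by the index commit is already durable -- the ordering
// half of the two-phase commit discussed above.
public void commitTaxoAndIndex() throws IOException {
  taxoWriter.commit();  // assumed field: the optional TaxonomyWriter
  indexWriter.commit(); // assumed field: the wrapped IndexWriter
}
{code}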

 Simplify the facet module APIs
 --

 Key: LUCENE-5339
 URL: https://issues.apache.org/jira/browse/LUCENE-5339
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-5339.patch


 I'd like to explore simplifications to the facet module's APIs: I
 think the current APIs are complex, and the addition of a new feature
 (sparse faceting, LUCENE-5333) threatens to add even more classes
 (e.g., FacetRequestBuilder).  I think we can do better.
 So, I've been prototyping some drastic changes; this is very
 early/exploratory and I'm not sure where it'll wind up but I think the
 new approach shows promise.
 The big changes are:
   * Instead of *FacetRequest/Params/Result, you directly instantiate
 the classes that do facet counting (currently TaxonomyFacetCounts,
 RangeFacetCounts or SortedSetDVFacetCounts), passing in the
 SimpleFacetsCollector, and then you interact with those classes to
 pull labels + values (topN under a path, sparse, specific labels).
   * At index time, no more FacetIndexingParams/CategoryListParams;
 instead, you make a new SimpleFacetFields and pass it the field it
 should store facets + drill downs under.  If you want more than
 one CLI you create more than one instance of SimpleFacetFields.
   * I added a simple schema, where you state which dimensions are
 hierarchical or multi-valued.  From this we decide how to index
 the ordinals (no more OrdinalPolicy).
 Sparse faceting is just another method (getAllDims), on both the taxonomy 
 & ssdv facet classes.
 I haven't created a common base class / interface for all of the
 search-time facet classes, but I think this may be possible/clean, and
 perhaps useful for drill sideways.
 All the new classes are under oal.facet.simple.*.
 Lots of things that don't work yet: drill sideways, complements,
 associations, sampling, partitions, etc.  This is just a start ...
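
 [As a rough sketch of the search-time flow described above; the class names 
 are from the description, but the constructor shapes are guessed, and the 
 real ones are in the attached patch:]
 {code:java}
// Guessed usage, not the patch's exact API: collect matches, then hand the
// collector to a counting class and pull labels + values from it directly.
SimpleFacetsCollector sfc = new SimpleFacetsCollector();
searcher.search(query, sfc);
TaxonomyFacetCounts counts = new TaxonomyFacetCounts(taxoReader, sfc);
// then ask 'counts' for, e.g., the topN children under a dimension path
 {code}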






[jira] [Updated] (SOLR-5440) UAX29URLEmailTokenizer thread hangs on getNextToken - causes cloud to stop accepting updates

2013-11-14 Thread Chris (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated SOLR-5440:


  Description: 
This is a pretty nasty bug, and causes the cluster to stop accepting updates. 
I'm not sure how to consistently reproduce it, but I have done so numerous 
times. Switching to a whitespace tokenizer improved indexing speed, and I never 
got the issue again.

I'm running a 4.6 snapshot - I had issues with deadlocks with numerous versions 
of Solr, and have finally narrowed the problem down to this code, which affects 
many/all(?) versions of Solr.

When the thread hits this issue it uses 100% CPU; restarting the node which has 
the error allows indexing to continue until it is hit again. The thread dump is 
the same as the one shown in the original report above.





[jira] [Updated] (LUCENE-5329) Make DocumentDictionary and co more lenient to dirty documents

2013-11-14 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated LUCENE-5329:
-

Attachment: LUCENE-5329.patch

Patch updated:
  - Added a ctor for DocumentExpressionDictionary that can take in a 
ValueSource [wondering if the name should be more general, as it can now 
compute weights using a ValueSource directly]
  - Allow DocumentDictionary to use NumericDocValuesField for suggestion weights
  - Updated tests to reflect the new changes

NOTE: running ant documentation-lint gives me this error (any advice on fixing 
this javadoc is greatly appreciated):
 [exec] 
file:///build/docs/suggest/org/apache/lucene/search/suggest/DocumentExpressionDictionary.html
 [exec]   BROKEN LINK: 
file:///build/docs/core/org/apache/lucene/queries.function.ValueSource.html
 [exec]   BROKEN LINK: 
file:///build/docs/core/org/apache/lucene/queries.function.ValueSource.html
 [exec]   BROKEN LINK: 
file:///build/docs/core/org/apache/lucene/queries.function.ValueSource.html
 [exec]   BROKEN LINK: 
file:///build/docs/core/org/apache/lucene/queries.function.ValueSource.html

 Make DocumentDictionary and co more lenient to dirty documents
 --

 Key: LUCENE-5329
 URL: https://issues.apache.org/jira/browse/LUCENE-5329
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Areek Zillur
 Attachments: LUCENE-5329.patch, LUCENE-5329.patch


 Currently DocumentDictionary errors out whenever any document does not have 
 value for any relevant stored fields. It would be nice to make it lenient and 
 instead ignore the invalid documents.
 Another issue with the DocumentDictionary is that it only allows string 
 fields as suggestions and binary fields as payloads. When exposing these 
 dictionaries to solr (via https://issues.apache.org/jira/browse/SOLR-5378), 
 it is inconvenient for the user to ensure that a suggestion field is a string 
 field and a payload field is a binary field. It would be nice to have the 
 dictionary just work whenever a string/binary field is passed to 
 suggestion/payload field. The patch provides one solution to this problem (by 
 accepting string or binary values), though it would be great if there are any 
 other solution to this, without making the DocumentDictionary too flexible






[VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Simon Willnauer
Please vote for the first Release Candidate for Lucene/Solr 4.6.0

you can download it here:
http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686

or run the smoke tester directly with this commandline (don't forget
to set JAVA6_HOME etc.):

python3.2 -u dev-tools/scripts/smokeTestRelease.py
http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
1541686 4.6.0 /tmp/smoke_test_4_6


I integrated the RC into Elasticsearch and all tests pass:

https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de

Smoketester said: SUCCESS! [1:15:57.339272]

here is my +1


Simon




[jira] [Commented] (SOLR-2724) Deprecate defaultSearchField and defaultOperator defined in schema.xml

2013-11-14 Thread Anca Kopetz (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822304#comment-13822304
 ] 

Anca Kopetz commented on SOLR-2724:
---

Hi,

I do not know if my comment is useful, but I found some code that still uses 
the defaultOperator from schema.xml in solr-core-4.5.1

{code:title=DisMaxQParser.java|borderStyle=solid}
  public static String parseMinShouldMatch(final IndexSchema schema,
                                           final SolrParams params) {
    org.apache.solr.parser.QueryParser.Operator op =
        QueryParsing.getQueryParserDefaultOperator(schema, params.get(QueryParsing.OP));
    return params.get(DisMaxParams.MM,
        op.equals(QueryParser.Operator.AND) ? "100%" : "0%");
  }
{code}

{code:title=QueryParsing.java|borderStyle=solid}
  public static QueryParser.Operator getQueryParserDefaultOperator(final IndexSchema sch,
                                                                   final String override) {
    String val = override;
    if (null == val) val = sch.getQueryParserDefaultOperator();
    return "AND".equals(val) ? QueryParser.Operator.AND : QueryParser.Operator.OR;
  }
{code}

 Deprecate defaultSearchField and defaultOperator defined in schema.xml
 --

 Key: SOLR-2724
 URL: https://issues.apache.org/jira/browse/SOLR-2724
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, search
Reporter: David Smiley
Assignee: David Smiley
Priority: Minor
 Fix For: 3.6, 4.0-ALPHA

 Attachments: 
 SOLR-2724_deprecateDefaultSearchField_and_defaultOperator.patch, 
 SOLR-2724_deprecateDefaultSearchField_and_defaultOperator.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 I've always been surprised to see the <defaultSearchField> element and 
 <solrQueryParser defaultOperator="OR"/> defined in the schema.xml file since 
 the first time I saw them.  They just seem out of place to me since they are 
 more query parser related than schema related. But not only are they 
 misplaced, I feel they shouldn't exist. For query parsers, we already have a 
 df parameter that works just fine, and explicit field references. And the 
 default Lucene query operator should stay at OR -- if a particular query 
 wants different behavior, then use q.op or simply write the operator 
 explicitly in the query. 
 <similarity> seems like something better placed in solrconfig.xml than in the 
 schema. 
 In my opinion, the defaultSearchField and defaultOperator configuration 
 elements should be deprecated in Solr 3.x and removed in Solr 4. And 
 <similarity> should move to solrconfig.xml. I am willing to do it, provided 
 there is consensus on it of course.
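
 [For reference, the two schema.xml elements under discussion look like this; 
 the values are illustrative and the field name text is invented for the 
 example:]
 {code:xml}
<!-- Deprecated by this issue: prefer the df request parameter instead -->
<defaultSearchField>text</defaultSearchField>
<!-- Deprecated by this issue: prefer q.op per request instead -->
<solrQueryParser defaultOperator="OR"/>
 {code}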






[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_45) - Build # 3464 - Still Failing!

2013-11-14 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3464/
Java: 64bit/jdk1.7.0_45 -XX:+UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.component.DistributedDebugComponentTest

Error Message:
Unable to delete file: 
.\org.apache.solr.handler.component.DistributedDebugComponentTest\collection2\data\tlog\tlog.000

Stack Trace:
java.io.IOException: Unable to delete file: 
.\org.apache.solr.handler.component.DistributedDebugComponentTest\collection2\data\tlog\tlog.000
at __randomizedtesting.SeedInfo.seed([3BB258D4BFF7C45F]:0)
at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1919)
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331)
at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910)
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331)
at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910)
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331)
at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910)
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331)
at 
org.apache.solr.SolrJettyTestBase.cleanUpJettyHome(SolrJettyTestBase.java:189)
at 
org.apache.solr.handler.component.DistributedDebugComponentTest.afterTest(DistributedDebugComponentTest.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:700)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:744)




Build Log:
[...truncated 10822 lines...]
   [junit4] Suite: 
org.apache.solr.handler.component.DistributedDebugComponentTest
   [junit4]   2 3321829 T9083 oas.SolrTestCaseJ4.initCore initCore
   [junit4]   2 Creating dataDir: 
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build\solr-core\test\J0\.\solrtest-DistributedDebugComponentTest-1384429229063
   [junit4]   2 3321833 T9083 oas.SolrTestCaseJ4.initCore initCore end
   [junit4]   2 3321833 T9083 oas.SolrJettyTestBase.getSSLConfig Randomized 
ssl (false) and clientAuth (false)
   [junit4]   2 3321833 T9083 oejs.Server.doStart jetty-8.1.10.v20130312
   [junit4]   2 3321843 T9083 oejs.AbstractConnector.doStart Started 
SelectChannelConnector@127.0.0.1:55199
   [junit4]   2 3321843 T9083 oass.SolrDispatchFilter.init 
SolrDispatchFilter.init()
   [junit4]   2 3321844 T9083 oasc.SolrResourceLoader.locateSolrHome JNDI not 
configured 

[JENKINS] Lucene-Solr-4.x-Windows (32bit/jdk1.8.0-ea-b114) - Build # 3387 - Still Failing!

2013-11-14 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/3387/
Java: 32bit/jdk1.8.0-ea-b114 -client -XX:+UseParallelGC

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexWriterOutOfFileDescriptors.test

Error Message:
unreferenced files: before delete: [_2_Pulsing41_0.doc, _2_Pulsing41_0.pos, 
_5.cfe, _5.cfs, _5.si, _i.cfe, _i.cfs, _i.si, _q.cfe, _q.cfs, _q.si, _r.cfe, 
_r.cfs, _r.si, _s.cfe, _s.cfs, _s.si, _t.cfe, _t.cfs, _t.si, _u.fdt, _u.fdx, 
_u.fnm, _u.nvd, _u.nvm, _u.si, _u.tvd, _u.tvx, _u_Lucene41WithOrds_0.doc, 
_u_Lucene41WithOrds_0.pos, _u_Lucene41WithOrds_0.tib, 
_u_Lucene41WithOrds_0.tii, _u_Memory_0.ram, _u_MockVariableIntBlock_0.doc, 
_u_MockVariableIntBlock_0.frq, _u_MockVariableIntBlock_0.pos, 
_u_MockVariableIntBlock_0.pyl, _u_MockVariableIntBlock_0.skp, 
_u_MockVariableIntBlock_0.tib, _u_MockVariableIntBlock_0.tii, 
_u_Pulsing41_0.doc, _u_Pulsing41_0.pos, _u_Pulsing41_0.tim, _u_Pulsing41_0.tip, 
_u_SimpleText_0.dat, segments.gen, segments_o]   after delete: [_5.cfe, 
_5.cfs, _5.si, _i.cfe, _i.cfs, _i.si, _q.cfe, _q.cfs, _q.si, _r.cfe, _r.cfs, 
_r.si, _s.cfe, _s.cfs, _s.si, _t.cfe, _t.cfs, _t.si, _u.fdt, _u.fdx, _u.fnm, 
_u.nvd, _u.nvm, _u.si, _u.tvd, _u.tvx, _u_Lucene41WithOrds_0.doc, 
_u_Lucene41WithOrds_0.pos, _u_Lucene41WithOrds_0.tib, 
_u_Lucene41WithOrds_0.tii, _u_Memory_0.ram, _u_MockVariableIntBlock_0.doc, 
_u_MockVariableIntBlock_0.frq, _u_MockVariableIntBlock_0.pos, 
_u_MockVariableIntBlock_0.pyl, _u_MockVariableIntBlock_0.skp, 
_u_MockVariableIntBlock_0.tib, _u_MockVariableIntBlock_0.tii, 
_u_Pulsing41_0.doc, _u_Pulsing41_0.pos, _u_Pulsing41_0.tim, _u_Pulsing41_0.tip, 
_u_SimpleText_0.dat, segments.gen, segments_o]  These files were removed: 
[_2_Pulsing41_0.doc, _2_Pulsing41_0.pos]  These files we had previously tried 
to delete, but couldn't: [_0_MockVariableIntBlock_0.pyl, 
_2_MockVariableIntBlock_0.frq, _2.fdt, _0.nvd, _0_Pulsing41_0.doc, 
_2_SimpleText_0.dat, _0_MockVariableIntBlock_0.doc, _2_Lucene41WithOrds_0.tib, 
_1.cfs, _2.nvd, _2_MockVariableIntBlock_0.skp, _2.tvd, _0_Pulsing41_0.pos, 
_2_Lucene41WithOrds_0.pos, _2_MockVariableIntBlock_0.pos, 
_2_MockVariableIntBlock_0.tib, _0_SimpleText_0.dat, _0_Lucene41WithOrds_0.pos, 
_0_MockVariableIntBlock_0.frq, _2_MockVariableIntBlock_0.pyl, 
_0_Lucene41WithOrds_0.tib, _0_Pulsing41_0.tim, _0.fdt, 
_0_MockVariableIntBlock_0.skp, _0.tvd, _2_MockVariableIntBlock_0.doc, 
_0_MockVariableIntBlock_0.pos, _2_Lucene41WithOrds_0.doc, _2_Pulsing41_0.tim, 
_0_Lucene41WithOrds_0.doc, _0_MockVariableIntBlock_0.tib]

Stack Trace:
java.lang.AssertionError: unreferenced files: before delete:
[_2_Pulsing41_0.doc, _2_Pulsing41_0.pos, _5.cfe, _5.cfs, _5.si, _i.cfe, 
_i.cfs, _i.si, _q.cfe, _q.cfs, _q.si, _r.cfe, _r.cfs, _r.si, _s.cfe, _s.cfs, 
_s.si, _t.cfe, _t.cfs, _t.si, _u.fdt, _u.fdx, _u.fnm, _u.nvd, _u.nvm, _u.si, 
_u.tvd, _u.tvx, _u_Lucene41WithOrds_0.doc, _u_Lucene41WithOrds_0.pos, 
_u_Lucene41WithOrds_0.tib, _u_Lucene41WithOrds_0.tii, _u_Memory_0.ram, 
_u_MockVariableIntBlock_0.doc, _u_MockVariableIntBlock_0.frq, 
_u_MockVariableIntBlock_0.pos, _u_MockVariableIntBlock_0.pyl, 
_u_MockVariableIntBlock_0.skp, _u_MockVariableIntBlock_0.tib, 
_u_MockVariableIntBlock_0.tii, _u_Pulsing41_0.doc, _u_Pulsing41_0.pos, 
_u_Pulsing41_0.tim, _u_Pulsing41_0.tip, _u_SimpleText_0.dat, segments.gen, 
segments_o]
  after delete:
[_5.cfe, _5.cfs, _5.si, _i.cfe, _i.cfs, _i.si, _q.cfe, _q.cfs, _q.si, 
_r.cfe, _r.cfs, _r.si, _s.cfe, _s.cfs, _s.si, _t.cfe, _t.cfs, _t.si, _u.fdt, 
_u.fdx, _u.fnm, _u.nvd, _u.nvm, _u.si, _u.tvd, _u.tvx, 
_u_Lucene41WithOrds_0.doc, _u_Lucene41WithOrds_0.pos, 
_u_Lucene41WithOrds_0.tib, _u_Lucene41WithOrds_0.tii, _u_Memory_0.ram, 
_u_MockVariableIntBlock_0.doc, _u_MockVariableIntBlock_0.frq, 
_u_MockVariableIntBlock_0.pos, _u_MockVariableIntBlock_0.pyl, 
_u_MockVariableIntBlock_0.skp, _u_MockVariableIntBlock_0.tib, 
_u_MockVariableIntBlock_0.tii, _u_Pulsing41_0.doc, _u_Pulsing41_0.pos, 
_u_Pulsing41_0.tim, _u_Pulsing41_0.tip, _u_SimpleText_0.dat, segments.gen, 
segments_o]

These files were removed: [_2_Pulsing41_0.doc, _2_Pulsing41_0.pos]

These files we had previously tried to delete, but couldn't: 
[_0_MockVariableIntBlock_0.pyl, _2_MockVariableIntBlock_0.frq, _2.fdt, _0.nvd, 
_0_Pulsing41_0.doc, _2_SimpleText_0.dat, _0_MockVariableIntBlock_0.doc, 
_2_Lucene41WithOrds_0.tib, _1.cfs, _2.nvd, _2_MockVariableIntBlock_0.skp, 
_2.tvd, _0_Pulsing41_0.pos, _2_Lucene41WithOrds_0.pos, 
_2_MockVariableIntBlock_0.pos, _2_MockVariableIntBlock_0.tib, 
_0_SimpleText_0.dat, _0_Lucene41WithOrds_0.pos, _0_MockVariableIntBlock_0.frq, 
_2_MockVariableIntBlock_0.pyl, _0_Lucene41WithOrds_0.tib, _0_Pulsing41_0.tim, 
_0.fdt, _0_MockVariableIntBlock_0.skp, _0.tvd, _2_MockVariableIntBlock_0.doc, 
_0_MockVariableIntBlock_0.pos, _2_Lucene41WithOrds_0.doc, _2_Pulsing41_0.tim, 
_0_Lucene41WithOrds_0.doc, _0_MockVariableIntBlock_0.tib]
at 
__randomizedtesting.SeedInfo.seed([28147EEFD1229E3:8AD5783453EE441B]:0)

[jira] [Updated] (LUCENE-5339) Simplify the facet module APIs

2013-11-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5339:
---

Attachment: LUCENE-5339.patch

Thanks for the feedback everyone ... I'm attaching a new patch.

 Simplify the facet module APIs
 --

 Key: LUCENE-5339
 URL: https://issues.apache.org/jira/browse/LUCENE-5339
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-5339.patch, LUCENE-5339.patch


 I'd like to explore simplifications to the facet module's APIs: I
 think the current APIs are complex, and the addition of a new feature
 (sparse faceting, LUCENE-5333) threatens to add even more classes
 (e.g., FacetRequestBuilder).  I think we can do better.
 So, I've been prototyping some drastic changes; this is very
 early/exploratory and I'm not sure where it'll wind up but I think the
 new approach shows promise.
 The big changes are:
   * Instead of *FacetRequest/Params/Result, you directly instantiate
 the classes that do facet counting (currently TaxonomyFacetCounts,
 RangeFacetCounts or SortedSetDVFacetCounts), passing in the
 SimpleFacetsCollector, and then you interact with those classes to
 pull labels + values (topN under a path, sparse, specific labels).
   * At index time, no more FacetIndexingParams/CategoryListParams;
 instead, you make a new SimpleFacetFields and pass it the field it
 should store facets + drill downs under.  If you want more than
 one CLI you create more than one instance of SimpleFacetFields.
   * I added a simple schema, where you state which dimensions are
 hierarchical or multi-valued.  From this we decide how to index
 the ordinals (no more OrdinalPolicy).
 Sparse faceting is just another method (getAllDims), on both taxonomy
 & ssdv facet classes.
 I haven't created a common base class / interface for all of the
 search-time facet classes, but I think this may be possible/clean, and
 perhaps useful for drill sideways.
 All the new classes are under oal.facet.simple.*.
 Lots of things that don't work yet: drill sideways, complements,
 associations, sampling, partitions, etc.  This is just a start ...
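To make the proposed flow concrete, here is a minimal sketch of the
search-time side, using the class names mentioned above; the constructor and
method signatures are assumptions, since the patch is still in flux:

{code}
// Sketch only: class names from the description above, signatures assumed.
SimpleFacetsCollector fc = new SimpleFacetsCollector();
searcher.search(query, fc);

// Directly instantiate the counting class; no FacetRequest/Params/Result:
TaxonomyFacetCounts counts = new TaxonomyFacetCounts(taxoReader, fc);

// Interact with it to pull labels + values, e.g. topN under a path:
SimpleFacetResult result = counts.getTopChildren(10, "Publish Date");
{code}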



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs

2013-11-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822361#comment-13822361
 ] 

Michael McCandless commented on LUCENE-5339:


{quote}
Facet's Accumulator is similar to Lucene's Collector, the Aggregator is sort of 
a Scorer, and a FacetRequest is a sort of Query.
Actually the model after which the facets were designed was Lucene's.
The optional IndexingParams came before the IndexWriterConfig but these can be 
said to be similar as well.
{quote}

I appreciate those analogies but I think the two cases are very
different: I think faceting is (ought to be) far simpler than
searching.

bq. More low-level objects such as the CategoryListParams are not a must, and 
the user may never know about them (and btw, they are similar to Codecs).

Likewise, I don't think we need to expose codec like control /
pluggability over facet ords encoding at this point.

bq. I reviewed the patch (mostly the taxonomy related part) and I think that 
even without associations, counts only is a bit narrow.

I added ValueSource aggregation in the next patch, but not
associations; I think associations can come later (it's just another
index time and search time impl).

{quote}
Especially with large counts (say many thousands) the count doesn't say much 
because of the long tail problem.
When there's a large result set, all the categories will get high hit counts. 
And just as scoring by counting the number of query terms each document matches 
doesn't always make much sense (and I think all scoring functions do things a 
lot smarter), using counts for facets may at times yield irrelevant results.

We found out that for large result sets, an aggregation of Lucene's score 
(rather than +1), or even score^2, yields better results for the user. Also, 
arbitrary expressions which are corpus specific (with or without associations) 
change the facets' usability dramatically. That's partially why the code was 
built to allow different aggregation techniques, allowing associations, 
numeric values etc into each value for each category.
{quote}

I agree.

Do you think ValueSource faceting is sufficient for such apps?  Or do
they typically use associations?  Aren't associations only really
required in the multi-valued facet field case?
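As a toy illustration of the difference discussed above (standalone Java, not
facet-module code): plain counting adds +1 per matching doc, while score-based
aggregation weights each hit by its relevance:

{code}
// Standalone illustration: count vs. sum(score^2) for a single category.
float[] scores = {0.2f, 1.8f, 0.4f};   // scores of docs hitting the category
double count = 0, sumScoreSq = 0;
for (float s : scores) {
  count += 1;                          // classic facet count
  sumScoreSq += (double) s * s;        // score^2 favors strong matches
}
// count == 3.0, sumScoreSq ~= 3.44: a category hit by a few strong matches
// can outrank a category hit by many weak ones.
{code}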

bq. As for the new API, it may be useful if there would be a single interface 
- so all facets implementations could be switched easily, allowing users to 
experiment with the different implementations without writing a lot of code.

Yeah I think so too ... it's on the TODO list.  Especially, if the
FacetsConfig knows the facet method used by a given field, then we
could (almost) produce the right impl at search time.


 Simplify the facet module APIs
 --

 Key: LUCENE-5339
 URL: https://issues.apache.org/jira/browse/LUCENE-5339
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-5339.patch, LUCENE-5339.patch


 I'd like to explore simplifications to the facet module's APIs: I
 think the current APIs are complex, and the addition of a new feature
 (sparse faceting, LUCENE-5333) threatens to add even more classes
 (e.g., FacetRequestBuilder).  I think we can do better.
 So, I've been prototyping some drastic changes; this is very
 early/exploratory and I'm not sure where it'll wind up but I think the
 new approach shows promise.
 The big changes are:
   * Instead of *FacetRequest/Params/Result, you directly instantiate
 the classes that do facet counting (currently TaxonomyFacetCounts,
 RangeFacetCounts or SortedSetDVFacetCounts), passing in the
 SimpleFacetsCollector, and then you interact with those classes to
 pull labels + values (topN under a path, sparse, specific labels).
   * At index time, no more FacetIndexingParams/CategoryListParams;
 instead, you make a new SimpleFacetFields and pass it the field it
 should store facets + drill downs under.  If you want more than
 one CLI you create more than one instance of SimpleFacetFields.
   * I added a simple schema, where you state which dimensions are
 hierarchical or multi-valued.  From this we decide how to index
 the ordinals (no more OrdinalPolicy).
 Sparse faceting is just another method (getAllDims), on both taxonomy
 & ssdv facet classes.
 I haven't created a common base class / interface for all of the
 search-time facet classes, but I think this may be possible/clean, and
 perhaps useful for drill sideways.
 All the new classes are under oal.facet.simple.*.
 Lots of things that don't work yet: drill sideways, complements,
 associations, sampling, partitions, etc.  This is just a start ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs

2013-11-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822362#comment-13822362
 ] 

Michael McCandless commented on LUCENE-5339:


bq. Can we rename CategoryPath to FacetLabel or something more intuitive?

+1 for FacetLabel; I put a nocommit.

But, the new patch actually nearly eliminates the need to create
CategoryPath (it's still needed to create a DrillDownQuery but I
dropped a nocommit to see if we can fix that).

bq. new LongRange("less than 10", 0L, true, 10L, false) -- can we make it so 
this is less arguments?

Not sure exactly how :)

bq. What if it worked like this:

This is an awesome idea!  I did that in the new patch; now indexing is
really simple, e.g.:

{code}
doc = new Document();
doc.add(new FacetField("Author", "Frank"));
doc.add(new FacetField("Publish Date", "1999", "5", "5"));
{code}

and:

{code}
doc = new Document();
doc.add(new SortedSetDocValuesFacetField("a", "bar"));
{code}


 Simplify the facet module APIs
 --

 Key: LUCENE-5339
 URL: https://issues.apache.org/jira/browse/LUCENE-5339
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-5339.patch, LUCENE-5339.patch


 I'd like to explore simplifications to the facet module's APIs: I
 think the current APIs are complex, and the addition of a new feature
 (sparse faceting, LUCENE-5333) threatens to add even more classes
 (e.g., FacetRequestBuilder).  I think we can do better.
 So, I've been prototyping some drastic changes; this is very
 early/exploratory and I'm not sure where it'll wind up but I think the
 new approach shows promise.
 The big changes are:
   * Instead of *FacetRequest/Params/Result, you directly instantiate
 the classes that do facet counting (currently TaxonomyFacetCounts,
 RangeFacetCounts or SortedSetDVFacetCounts), passing in the
 SimpleFacetsCollector, and then you interact with those classes to
 pull labels + values (topN under a path, sparse, specific labels).
   * At index time, no more FacetIndexingParams/CategoryListParams;
 instead, you make a new SimpleFacetFields and pass it the field it
 should store facets + drill downs under.  If you want more than
 one CLI you create more than one instance of SimpleFacetFields.
   * I added a simple schema, where you state which dimensions are
 hierarchical or multi-valued.  From this we decide how to index
 the ordinals (no more OrdinalPolicy).
 Sparse faceting is just another method (getAllDims), on both taxonomy
 & ssdv facet classes.
 I haven't created a common base class / interface for all of the
 search-time facet classes, but I think this may be possible/clean, and
 perhaps useful for drill sideways.
 All the new classes are under oal.facet.simple.*.
 Lots of things that don't work yet: drill sideways, complements,
 associations, sampling, partitions, etc.  This is just a start ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs

2013-11-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822360#comment-13822360
 ] 

Michael McCandless commented on LUCENE-5339:


bq. You have a TODO file which seems to have been 'svn added' - are you aware 
of it? 

I put a nocommit.

bq. Maybe we should do this work in a branch and avoid the .simple package? 

I'll start a branch, but on the branch I'd like to keep working on
.simple for now while we bang out the API changes.  If things start to
crystallize then we can start cutting everything else over?

bq. Can FT.FieldType provide a ctor taking these arguments and then 
DEFAULT_FIELD_TYPE pass (false,false)? Or, init those two members to false.
bq. Maybe instead of setHierarchical + setMultiValued you do 
setDimType(DimType)? Then you could avoid the synchronization?

I put a nocommit.

bq. I wonder if FieldTypes won't confuse users w/ Field.FieldType, so maybe you 
should name it DimType or something? And the upper class FacetsConfig?

Good, I renamed both.

bq. Not sure, but I think that SimpleFacetFields adds the dimension's ordinal 
if the dim is both hierarchical and multi-valued? That's a step back from the 
default ALL_BUT_DIM that we have today. I think we might want to have a 
requiresDimValue or something, because I do think the dimension's value (count) 
is most often unneeded, and it's a waste to encode its ordinal?

It does, and I think that's OK?  Yes, it's one extra ord it indexes,
but that will be a minor perf hit since it's multi-valued and
hierarchical.

bq. Constants isn't documented, not sure if 'precommit' will like that, but in 
general I think the constants should have jdocs. Maybe put a nocommit?

I put nocommits.  Sadly, precommit won't fail (lucene/facet is
missing in the documentation-lint target in build.xml, because we never fixed
all its javadocs).  We should fix that separately.

bq. TaxonomyFacetCounts.getAllDims() – If a dimension is not hierarchical, I 
think SimpleFacetResult.count == 0? In that case, sorting by its count is 
useless?  I think it's also relevant for SortedSet?

I THINK these will work correctly (we sum the dim value as we visit
all children), but I was missing the corrected dim count in the
hierarchical + MV case so I added that.  Also, I put nocommits to test
this.

bq. LabelAndValue has a Number as the value rather than double. I see this 
class is used only in the final step (labeling), but what's wrong w/ the 
previous 'double' primitive? Is it to avoid casting or something?

I think it's crazy when I ask for counts that I get a double back :)
That's why I switched it to a Number.

bq. CategoryListIterator

I appreciate the usefulness of this API, but rather than adding it
into simple from the get-go, I'd like to build out the different
facet methods and understand if it's actually useful / worth the
additional abstraction.

For example, I'm not sure it would work very well with SSDV, since we
first count in seg-ord space and then convert to global-ord space only
when combining counts across segments (this gave better performance).
I mean, yes, it would work in that the abstraction would be correct,
but we'd be paying a performance penalty.

bq. E.g. that's how we were able to test fast cutting over facets to DV from 
Payload, that's a nice abstraction for letting you load the values from a 
cache, and so forth.

I think doing such future tests with the simple APIs will still be
easy; I don't think we should open up abstractions for the
encoding/decoding today.

bq. FacetsAggregator

I added another facet method, TaxonomyFacetSumValueSource for value
source aggregation, in the next patch, to explore this need...

bq. But I don't see this convenience getting away (you'll just pass a 
List<FacetsAggregator> and then pull the requested values later on). 

True but ... we need some base class / interface that all these
*Facets.java implement ... I haven't done that yet (it's a TODO).

bq. What I like less about it is that it folds in the logic coded by 
FacetsAccumulator and FacetResultsHandler

Maybe we can move some of these methods into the base class?  I'm not
sure though... since for the TaxonomyFacetSumValueSource it's float[]
values and for the *Count it's int[] counts.

At the end of the day, the code that does the facet counting, the
rollup, pulling the topK, is in fact a small amount of code; I think
replicating bits of this code for the different methods is the lesser
evil than adding so many abstractions that the module is hard to
approach by new devs/users.

bq. Has variants that return a full sub-tree (TopKInEachNodeHandler)

That handler is such an exotic use case ... and the app can just
recurse itself, calling TaxoFacetCounts.getTopChildren?

bq. FacetArrays

We avoid the need to factor this out, by simply enumerating the facet
impls directly.  E.g. we (or a user) can make a TaxoFacetAvgScore that
allocates 

[jira] [Updated] (SOLR-5408) CollapsingQParserPlugin scores incorrectly when multiple sort criteria are used

2013-11-14 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5408:
-

Summary: CollapsingQParserPlugin scores incorrectly when multiple sort 
criteria are used  (was: Collapsing Query Parser does not respect multiple Sort 
fields)

 CollapsingQParserPlugin scores incorrectly when multiple sort criteria are 
 used
 ---

 Key: SOLR-5408
 URL: https://issues.apache.org/jira/browse/SOLR-5408
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5
Reporter: Brandon Chapman
Assignee: Joel Bernstein
Priority: Critical
 Attachments: CollapsingQParserPlugin.java, 
 CollapsingQParserPlugin.java, SOLR-5027.patch, SOLR-5408.2.patch, 
 SOLR-5408.patch, SOLR-5408.patch


 When using the collapsing query parser, only the last sort field appears to 
 be used.
 http://172.18.0.10:8080/solr/product/select_eng?sort=score%20desc,name_sort_eng%20desc&qf=name_eng^3+brand^2+categories_term_eng+sku+upc+promoTag+model+related_terms_eng&pf2=name_eng^2&defType=edismax&rows=12&pf=name_eng~5^3&start=0&q=ipad&boost=sqrt(popularity)&qt=/select_eng&fq=productType:MERCHANDISE&fq=merchant:bestbuycanada&fq=(*:*+AND+-all_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(all_all_suppressed_b_ovly:false+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_all_suppressed_b_ovly:false+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_cpx_suppressed_b_ovly:false)&fq=translations:eng&fl=psid,name_eng,score&debug=true&debugQuery=true&fq={!collapse+field%3DgroupId+nullPolicy%3Dexpand}
 <result name="response" numFound="5927" start="0" maxScore="5.6674457">
 <doc>
 <str name="psid">3002010250210</str>
 <str name="name_eng">
 ZOTAC ZBOX nano XS AD13 Plus All-In-One PC (AMD E2-1800/2GB RAM/64GB SSD)
 </str>
 <float name="score">0.41423172</float>
 </doc>
 The same query without using the collapsing query parser produces the 
 expected result.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5408) CollapsingQParserPlugin scores incorrectly when multiple sort criteria are used

2013-11-14 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822374#comment-13822374
 ] 

Joel Bernstein commented on SOLR-5408:
--

Brandon,

That's good news. This fix has been committed to trunk and 4x. Thanks for 
reporting.

Joel

 CollapsingQParserPlugin scores incorrectly when multiple sort criteria are 
 used
 ---

 Key: SOLR-5408
 URL: https://issues.apache.org/jira/browse/SOLR-5408
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5
Reporter: Brandon Chapman
Assignee: Joel Bernstein
Priority: Critical
 Attachments: CollapsingQParserPlugin.java, 
 CollapsingQParserPlugin.java, SOLR-5027.patch, SOLR-5408.2.patch, 
 SOLR-5408.patch, SOLR-5408.patch


 When using the collapsing query parser, only the last sort field appears to 
 be used.
 http://172.18.0.10:8080/solr/product/select_eng?sort=score%20desc,name_sort_eng%20desc&qf=name_eng^3+brand^2+categories_term_eng+sku+upc+promoTag+model+related_terms_eng&pf2=name_eng^2&defType=edismax&rows=12&pf=name_eng~5^3&start=0&q=ipad&boost=sqrt(popularity)&qt=/select_eng&fq=productType:MERCHANDISE&fq=merchant:bestbuycanada&fq=(*:*+AND+-all_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(all_all_suppressed_b_ovly:false+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_all_suppressed_b_ovly:false+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_cpx_suppressed_b_ovly:false)&fq=translations:eng&fl=psid,name_eng,score&debug=true&debugQuery=true&fq={!collapse+field%3DgroupId+nullPolicy%3Dexpand}
 <result name="response" numFound="5927" start="0" maxScore="5.6674457">
 <doc>
 <str name="psid">3002010250210</str>
 <str name="name_eng">
 ZOTAC ZBOX nano XS AD13 Plus All-In-One PC (AMD E2-1800/2GB RAM/64GB SSD)
 </str>
 <float name="score">0.41423172</float>
 </doc>
 The same query without using the collapsing query parser produces the 
 expected result.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



4.6 ReleaseNotes

2013-11-14 Thread Simon Willnauer
I started the Lucene Release Notes for 4.6 here:

https://wiki.apache.org/lucene-java/ReleaseNote46

Feel free to add / optimize. I'd appreciate if somebody from Solr Land
could start the Solr Release notes, I am not on top of the changes
there!

Thanks,

Simon

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5052) bitset codec for off heap filters

2013-11-14 Thread Yuriy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822389#comment-13822389
 ] 

Yuriy commented on LUCENE-5052:
---

This is a very simple implementation of a codec which stores posting lists as 
bitsets. It passes the BasePostingsFormatTestCase.testDocsOnly() test. 
Also, I have found it difficult to implement the term dictionary, and I feel 
it's better to somehow combine this postings format with one of the standard 
ones.
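For context on the df > maxDoc/8 threshold in the issue description below: a
fixed bitset always occupies maxDoc/8 bytes, while a vInt-encoded posting list
costs at least about one byte per document, so beyond that df the bitset is no
bigger and answers membership in O(1). A minimal standalone sketch of the
encoding idea, using Lucene's FixedBitSet (illustrative only, not the attached
codec):

{code}
import org.apache.lucene.util.FixedBitSet;

// Illustrative only: represent one term's posting list as a bitset.
public class BitSetPostingsSketch {
  public static void main(String[] args) {
    int maxDoc = 64;
    int[] postings = {1, 3, 9, 10, 33, 60};  // doc IDs matching the term

    FixedBitSet bits = new FixedBitSet(maxDoc);
    for (int doc : postings) {
      bits.set(doc);
    }
    // bits.getBits() is a long[] a codec could write out directly; the cost
    // is maxDoc/8 bytes regardless of df, hence the break-even threshold.
    System.out.println("doc 9 matches: " + bits.get(9));    // true
    System.out.println("doc 11 matches: " + bits.get(11));  // false
  }
}
{code}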

 bitset codec for off heap filters
 -

 Key: LUCENE-5052
 URL: https://issues.apache.org/jira/browse/LUCENE-5052
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/codecs
Reporter: Mikhail Khludnev
  Labels: features
 Fix For: 5.0


 Colleagues,
 When we filter we don't care about any of the scoring factors, i.e. norms, 
 positions, tf, but it should be fast. The obvious way to handle this is to 
 decode the postings list and cache it in heap (CachingWrapperFilter, Solr's 
 DocSet). Both consuming heap and decoding are expensive. 
 Let's write a posting list as a bitset if df is greater than the segment's 
 maxDoc/8 (what about skip lists? and overall performance?). 
 Beside the codec implementation, the trickiest part to me is to design an API 
 for this. How can we let the app know that a term query doesn't need to be 
 cached in heap, but can be held as an mmapped bitset?
 WDYT?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5052) bitset codec for off heap filters

2013-11-14 Thread Yuriy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuriy updated LUCENE-5052:
--

Attachment: bitsetcodec.zip

see
https://issues.apache.org/jira/browse/LUCENE-5052?focusedCommentId=13822389&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13822389

 bitset codec for off heap filters
 -

 Key: LUCENE-5052
 URL: https://issues.apache.org/jira/browse/LUCENE-5052
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/codecs
Reporter: Mikhail Khludnev
  Labels: features
 Fix For: 5.0

 Attachments: bitsetcodec.zip


 Colleagues,
 When we filter we don't care about any of the scoring factors, i.e. norms, 
 positions, tf, but it should be fast. The obvious way to handle this is to 
 decode the postings list and cache it in heap (CachingWrapperFilter, Solr's 
 DocSet). Both consuming heap and decoding are expensive. 
 Let's write a posting list as a bitset if df is greater than the segment's 
 maxDoc/8 (what about skip lists? and overall performance?). 
 Beside the codec implementation, the trickiest part to me is to design an API 
 for this. How can we let the app know that a term query doesn't need to be 
 cached in heap, but can be held as an mmapped bitset?
 WDYT?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (LUCENE-5052) bitset codec for off heap filters

2013-11-14 Thread Yuriy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuriy updated LUCENE-5052:
--

Comment: was deleted

(was: see
https://issues.apache.org/jira/browse/LUCENE-5052?focusedCommentId=13822389&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13822389)

 bitset codec for off heap filters
 -

 Key: LUCENE-5052
 URL: https://issues.apache.org/jira/browse/LUCENE-5052
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/codecs
Reporter: Mikhail Khludnev
  Labels: features
 Fix For: 5.0

 Attachments: bitsetcodec.zip


 Colleagues,
 When we filter we don't care about any of the scoring factors, i.e. norms, 
 positions, tf, but it should be fast. The obvious way to handle this is to 
 decode the postings list and cache it in heap (CachingWrapperFilter, Solr's 
 DocSet). Both consuming heap and decoding are expensive. 
 Let's write a posting list as a bitset if df is greater than the segment's 
 maxDoc/8 (what about skip lists? and overall performance?). 
 Beside the codec implementation, the trickiest part to me is to design an API 
 for this. How can we let the app know that a term query doesn't need to be 
 cached in heap, but can be held as an mmapped bitset?
 WDYT?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: DocValue on Strings slow and OOM

2013-11-14 Thread Per Steffensen
If anyone is following this one, just an update. We are not going to 
upgrade to 4.5.1 in order to see if the String facet performance problem 
has been fixed. Instead we have made a few hacks around our data so that 
we can store the c-field (c_dstr_doc_sto) as long instead 
(c_dlng_doc_sto). So now we only need to struggle with long-facet 
performance. There is a performance issue with facets on longs though, 
but I will tell about it in another mailing thread - need your input on 
what solution you prefer.


Regards, Per Steffensen

On 11/6/13 12:15 PM, Per Steffensen wrote:

On 11/6/13 11:43 AM, Robert Muir wrote:

Before lucene 4.5 docvalues were loaded entirely into RAM.

I'm not going to waste time debugging any old code releases here, you
should upgrade to the latest release!

Ok, thanks!

I do not consider it a bug (just a performance issue), so no debugging 
needed.
It is just that we do not want to spend time upgrading to 4.5 if there 
is not a justified hope/explanation that it will probably make things 
better. But I guess there is.


One short question: Will 4.5 index things differently (compared to 
4.4) for documents with fields like I showed earlier? I'm basically 
asking if we need to reindex the 12 billion documents again after 
upgrading to 4.5, or if we ought to be able to deploy 4.5 on top of 
the already indexed documents.


Regards, Per Steffensen




[jira] [Commented] (SOLR-5428) new statistics results to StatsComponent - distinctValues and countDistinct

2013-11-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822400#comment-13822400
 ] 

Shalin Shekhar Mangar commented on SOLR-5428:
-

Thanks for the patch Elran. Collecting the 'distinctValues' is a very expensive 
operation. There should be a way to stop the collection of these two statistics.

Have you seen the LukeRequestHandler? Using the fl and maxTerms params I think 
you can get the same information.
http://wiki.apache.org/solr/LukeRequestHandler
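The expense comes mostly from materializing every unique term in memory; a
hypothetical sketch of the naive approach (not the attached patch) shows why:

{code}
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the attached patch: computing distinctValues
// means holding every unique value in memory, so the cost grows with the
// field's cardinality rather than with the size of the response.
public class DistinctValuesSketch {
  public static void main(String[] args) {
    String[] fieldValues = {"a", "b", "a", "c", "b", "a"}; // one value per doc
    Set<String> distinct = new HashSet<String>();
    for (String v : fieldValues) {
      distinct.add(v);
    }
    System.out.println("countDistinct = " + distinct.size()); // 3
    System.out.println("distinctValues = " + distinct);       // e.g. [a, b, c]
  }
}
{code}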

 new statistics results to StatsComponent - distinctValues and countDistinct
 ---

 Key: SOLR-5428
 URL: https://issues.apache.org/jira/browse/SOLR-5428
 Project: Solr
  Issue Type: New Feature
Reporter: Elran Dvir
Assignee: Shalin Shekhar Mangar
 Attachments: SOLR-5428.patch


 I thought it would be very useful to display the distinct values (and the 
 count) of a field among other statistics. Attached a patch implementing this 
 in StatsComponent.
 Added results:
 distinctValues - list of all distinct values
 countDistinct - distinct values count.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Shai Erera
Smoke tester passes for me. +1!

Shai


On Thu, Nov 14, 2013 at 11:37 AM, Simon Willnauer simon.willna...@gmail.com
 wrote:

 Please vote for the first Release Candidate for Lucene/Solr 4.6.0

 you can download it here:

 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686

 or run the smoke tester directly with this commandline (don't forget
 to set JAVA6_HOME etc.):

 python3.2 -u dev-tools/scripts/smokeTestRelease.py

 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
 1541686 4.6.0 /tmp/smoke_test_4_6


 I integrated the RC into Elasticsearch and all tests pass:


 https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de

 Smoketester said: SUCCESS! [1:15:57.339272]

 here is my +1


 Simon

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs

2013-11-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822450#comment-13822450
 ] 

Shai Erera commented on LUCENE-5339:


About the patch:

* I think FacetField should have an optional ctor taking the indexedFacetField, 
defaulting to $facets, then the ctor calls super() with the right field, and 
not dummy? And you can remove set/get?

* SimpleFacetsCollector jdocs are wrong -- there's no create?

* Do we still need SSDVFacetFields?

* I like FacetIW, but the nocommit, to handle updateDoc, addDocs etc. makes me 
think if down the road we won't be sorry about doing this change (i.e. if 
anything changes on IW APIs). The good thing about FacetFields is that it just 
adds fields to a Document, and doesn't worry about IW API at all...

* DimType == DimConfig :). Sorry if I wasn't clear somewhere in my long 
response.

{quote}
That handler is such an exotic use case ... and the app can just
recurse itself, calling TaxoFacetCounts.getTopChildren?
{quote}

Could be, maybe it will work. It definitely allows asking for different topK 
for each child (something we currently don't support).

{quote}
It does, and I think that's OK? Yes, it's one extra ord it indexes,
but that will be a minor perf hit since it's a multi-valued and
hierarchical.
{quote}

I don't know. Like if your facet has 2 levels, that's 33% more ords. I think 
the count of the root ord is most likely never needed? And if it's needed, app 
can compute it by traversing its children and their values in the facet arrays? 
Maybe as a default we just never index it, and don't add a vague 
requiresDimCount/Value/Weight boolean?

{quote}
I think replicating bits of this code for the different methods is the lesser
evil than adding so many abstractions that the module is hard to
approach by new devs/users.
{quote}

Did we ever get such feedback from users? That the module is unapproachable?
I get the opposite feedback - that we don't have many abstractions! :)

{quote}
At the end of the day, the code that does the facet counting, the
rollup, pulling the topK, is in fact a small amount of code;
{quote}

You have a nocommit "maybe we should do this lazily" in regards to when to 
rollupValues. That shows me that now every developer who extends this API 
(let's be clear - users are oblivious to this happening) will face the same 
decision (nocommit). If we discover one day that it's better to rollup lazily 
or not, other developers don't benefit from that decision. That's why I think 
some abstractions are good.

{quote}
I added ValueSource aggregation in the next patch, but not
associations; I think associations can come later (it's just another
index time and search time impl).
{quote}

I'm not sure we should do that (cut over associations later). The whole point 
about these features (associations, complements, sampling..) is that they are 
existing features. If we think they are useless / unneeded - that's one thing. 
But if we believe they are important, it's useless to make all the API changes 
without taking them into account, only to figure out later that we need 
abstraction X and Y in order to implement them.

And we make heavy use of associations, and some users asked (and use) sampling 
and I remember a question about complements. So obviously we cannot conclude 
that these are useless features. Therefore I think it's important that we try 
to tackle them now, so that we don't do a full round trip to find ourselves 
with the same API again.

Can we do FacetIndexWriter in a separate issue (if we want to do it at all)? 
It's unrelated to the search API changes you want to do here, and it might be 
easier to contain within a single issue?

About CategoryListIterator ... what if we do manage to come up tomorrow with a 
better encoding strategy for facets. Do you really think that changing all 
existing WhateverFacets makes sense!? And if developers write their own 
WhateverFacets, it means they need to change their code too? Really, you're 
mixing optimizations (inlining dgap+vint) with ease of use. I know (!!) that 
there are apps that can benefit from a different encoding scheme (e.g. 
FourOnesIntEncoder). We don't need to wait until someone comes up w/ a better 
default encoding scheme to introduce abstractions. I mean ... that just sounds 
crazy to me.

 Simplify the facet module APIs
 --

 Key: LUCENE-5339
 URL: https://issues.apache.org/jira/browse/LUCENE-5339
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-5339.patch, LUCENE-5339.patch


 I'd like to explore simplifications to the facet module's APIs: I
 think the current APIs are complex, and the addition of a new feature
 (sparse faceting, 

[jira] [Commented] (SOLR-5428) new statistics results to StatsComponent - distinctValues and countDistinct

2013-11-14 Thread Elran Dvir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822465#comment-13822465
 ] 

Elran Dvir commented on SOLR-5428:
--

Thanks, Shalin.
My use case requires 'distinctValues' alongside the other results, so I am 
afraid using LukeRequestHandler is not suitable.
In what way is it expensive? Is there a way to improve it?
What do you mean when you say "There should be a way to stop the collection"?

Thanks.

 new statistics results to StatsComponent - distinctValues and countDistinct
 ---

 Key: SOLR-5428
 URL: https://issues.apache.org/jira/browse/SOLR-5428
 Project: Solr
  Issue Type: New Feature
Reporter: Elran Dvir
Assignee: Shalin Shekhar Mangar
 Attachments: SOLR-5428.patch


 I thought it would be very useful to display the distinct values (and the 
 count) of a field among other statistics. Attached a patch implementing this 
 in StatsComponent.
 Added results:
 distinctValues - list of all distinct values
 countDistinct - distinct values count.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Tommaso Teofili
+1

 SUCCESS! [2:50:09.253204] ()

however we have the usual

 ***WARNING***: javadocs want to fail!

for Solr (which IMHO we should fix)

Regards,
Tommaso



2013/11/14 Shai Erera ser...@gmail.com

 Smoke tester passes for me. +1!

 Shai


 On Thu, Nov 14, 2013 at 11:37 AM, Simon Willnauer 
 simon.willna...@gmail.com wrote:

 Please vote for the first Release Candidate for Lucene/Solr 4.6.0

 you can download it here:

 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686

 or run the smoke tester directly with this commandline (don't forget
 to set JAVA6_HOME etc.):

 python3.2 -u dev-tools/scripts/smokeTestRelease.py

 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
 1541686 4.6.0 /tmp/smoke_test_4_6


 I integrated the RC into Elasticsearch and all tests pass:


 https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de

 Smoketester said: SUCCESS! [1:15:57.339272]

 here is my +1


 Simon

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





[jira] [Commented] (SOLR-5428) new statistics results to StatsComponent - distinctValues and countDistinct

2013-11-14 Thread Yago Riveiro (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822470#comment-13822470
 ] 

Yago Riveiro commented on SOLR-5428:


Collecting the distinctValues can be expensive, but in my case it is a 
requirement that Solr can't satisfy in an easy way. I need to do a facet query 
with limit -1 to get all unique terms that match the query.

If the StatsComponent can do the same thing, expensive or not, I vote to have 
the feature. How to use it, and the pros and cons of using it, should be a 
decision made by the user.

 new statistics results to StatsComponent - distinctValues and countDistinct
 ---

 Key: SOLR-5428
 URL: https://issues.apache.org/jira/browse/SOLR-5428
 Project: Solr
  Issue Type: New Feature
Reporter: Elran Dvir
Assignee: Shalin Shekhar Mangar
 Attachments: SOLR-5428.patch


 I thought it would be very useful to display the distinct values (and the 
 count) of a field among other statistics. Attached a patch implementing this 
 in StatsComponent.
 Added results:
 distinctValues - list of all distinct values
 countDistinct - distinct values count.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5428) new statistics results to StatsComponent - distinctValues and countDistinct

2013-11-14 Thread Elran Dvir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822477#comment-13822477
 ] 

Elran Dvir commented on SOLR-5428:
--

Another thing I thought about:
my queries have q, fq and are distributed. Does LukeRequestHandler support that?

 new statistics results to StatsComponent - distinctValues and countDistinct
 ---

 Key: SOLR-5428
 URL: https://issues.apache.org/jira/browse/SOLR-5428
 Project: Solr
  Issue Type: New Feature
Reporter: Elran Dvir
Assignee: Shalin Shekhar Mangar
 Attachments: SOLR-5428.patch


 I thought it would be very useful to display the distinct values (and the 
 count) of a field among other statistics. Attached a patch implementing this 
 in StatsComponent.
 Added results:
 distinctValues - list of all distinct values
 countDistinct - distinct values count.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2013-11-14 Thread Anca Kopetz (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822482#comment-13822482
 ] 

Anca Kopetz commented on SOLR-2649:
---

We need to apply "min should match" for edismax query strings with operators 
(AND, OR) and the mm parameter, so we developed our own custom query parser.
The code is below; maybe it is useful for somebody who has the same 
requirements.
{code:title=CustomExtendedDismaxQParser.java}
import java.util.List;

import com.google.common.base.Strings;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.DisMaxParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.ExtendedDismaxQParser;
import org.apache.solr.util.SolrPluginUtils;

public class CustomExtendedDismaxQParser extends ExtendedDismaxQParser {

   public CustomExtendedDismaxQParser(String qstr, SolrParams localParams,
 SolrParams params, SolrQueryRequest req) {
  super(qstr, localParams, params, req);
   }

   @Override
   protected Query parseOriginalQuery(ExtendedSolrQueryParser up,
 String mainUserQuery, List<Clause> clauses,
 ExtendedDismaxConfiguration config) {
  // Let edismax build the query as usual.
  Query query = super.parseOriginalQuery(up, mainUserQuery, clauses, config);
  String mmValue = this.params.get(DisMaxParams.MM);
  if (!Strings.isNullOrEmpty(mmValue)) {
 if (query instanceof BooleanQuery) {
// Re-apply the mm parameter that edismax dropped because the
// query string contained explicit operators.
SolrPluginUtils.setMinShouldMatch((BooleanQuery) query, mmValue);
 }
  }
  return query;
   }
}
{code}

{code:title=solrconfig.xml}
<queryParser name="kelkooEdismax"
class="com.kelkoo.search.solr.plugins.CustomExtendedDismaxQParserPlugin"/>
{code}
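The CustomExtendedDismaxQParserPlugin class referenced above is not shown, but 
assuming it simply returns this parser from createParser(), requests can then 
select it with defType=kelkooEdismax.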

 MM ignored in edismax queries with operators
 

 Key: SOLR-2649
 URL: https://issues.apache.org/jira/browse/SOLR-2649
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Reporter: Magnus Bergmark
Priority: Minor
 Fix For: 4.6


 Hypothetical scenario:
   1. User searches for "stocks oil gold" with MM set to 50%
   2. User adds -stockings to the query: "stocks oil gold -stockings"
   3. User gets no hits since MM was ignored and all terms were AND-ed 
 together
 The behavior seems to be intentional, although the reason why is never 
 explained:
   // For correct lucene queries, turn off mm processing if there
   // were explicit operators (except for AND).
   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
 (lines 232-234 taken from 
 tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
 This makes edismax unsuitable as a replacement for dismax; mm is one of the 
 primary features of dismax.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Martijn v Groningen
+1! The smoker test succeeded.


On 14 November 2013 15:09, Tommaso Teofili tommaso.teof...@gmail.comwrote:

 +1

  SUCCESS! [2:50:09.253204] ()

 however we have the usual

  ***WARNING***: javadocs want to fail!

 for Solr (which IMHO we should fix)

 Regards,
 Tommaso



 2013/11/14 Shai Erera ser...@gmail.com

 Smoke tester passes for me. +1!

 Shai


 On Thu, Nov 14, 2013 at 11:37 AM, Simon Willnauer 
 simon.willna...@gmail.com wrote:

 Please vote for the first Release Candidate for Lucene/Solr 4.6.0

 you can download it here:

 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686

 or run the smoke tester directly with this commandline (don't forget
 to set JAVA6_HOME etc.):

 python3.2 -u dev-tools/scripts/smokeTestRelease.py

 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
 1541686 4.6.0 /tmp/smoke_test_4_6


 I integrated the RC into Elasticsearch and all tests pass:


 https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de

 Smoketester said: SUCCESS! [1:15:57.339272]

 here is my +1


 Simon

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org






-- 
Met vriendelijke groet,

Martijn van Groningen


[jira] [Created] (SOLR-5441) Expose transaction log files number and their size via JMX

2013-11-14 Thread JIRA
Rafał Kuć created SOLR-5441:
---

 Summary: Expose transaction log files number and their size via JMX
 Key: SOLR-5441
 URL: https://issues.apache.org/jira/browse/SOLR-5441
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.5
Reporter: Rafał Kuć
Priority: Minor


It may be useful to have the number of transaction log files and their overall 
size exposed via JMX for UpdateHandler.
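
A rough sketch of the shape this could take: Solr's JMX beans report through
SolrInfoMBean.getStatistics(), so the UpdateHandler's stats could simply gain
two more entries. The entry names and the two tlog-derived variables below are
illustrative assumptions, not the eventual patch:

{code}
// Hypothetical fragment for UpdateHandler.getStatistics(); names assumed.
NamedList<Object> lst = new SimpleOrderedMap<Object>();
lst.add("transaction_logs_total_number", tlogFileCount);    // count of tlogs
lst.add("transaction_logs_total_size", tlogTotalSizeBytes); // bytes on disk
{code}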



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5441) Expose transaction log files number and their size via JMX

2013-11-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822525#comment-13822525
 ] 

Rafał Kuć commented on SOLR-5441:
-

I'll provide a patch later today.

 Expose transaction log files number and their size via JMX
 -

 Key: SOLR-5441
 URL: https://issues.apache.org/jira/browse/SOLR-5441
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.5
Reporter: Rafał Kuć
Priority: Minor

 It may be useful to have the number of transaction log files and their 
 overall size exposed via JMX for UpdateHandler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved

2013-11-14 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822528#comment-13822528
 ] 

Steve Rowe commented on LUCENE-5216:


bq. Committed to trunk. Will backport to 4x after I backport the main DV 
updates changes first.

[~shaie], looks like you included these changes in your branch_4x commit under 
LUCENE-5189, so this issue can be resolved?  FYI, when you merge multiple 
issues' commits, it's useful to include all issue numbers in the commit log 
message, so that they get auto-posted to the relevant JIRA issues.  That didn't 
happen here.

 Fix SegmentInfo.attributes when updates are involved
 

 Key: LUCENE-5216
 URL: https://issues.apache.org/jira/browse/LUCENE-5216
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
 Attachments: LUCENE-5216.patch


 Today, SegmentInfo.attributes are write-once. However, in the presence of 
 field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue, in 
 which if a Codec decides to alter the attributes when updates are applied, 
 they are silently discarded. This is rather a corner case, though one that 
 should be addressed.
 There were two solutions to address this:
 # Record SI.attributes in SegmentInfos, so they are written per-commit, 
 instead of the .si file.
 # Remove them altogether, as they don't seem to be used anywhere in Lucene 
 code today.
 If we remove them, we basically don't take away special capability from 
 Codecs, because they can still write the attributes to a separate file, or 
 even the file they record the other data in. This will work even with 
 updates, as long as Codecs respect the given segmentSuffix.
 If we keep them, I think the simplest solution is to read/write them by 
 SegmentInfos. But if we don't see a good use case, I suggest we remove them, 
 as it's just extra code to maintain. I think we can even risk a backwards 
 break and remove them completely from 4x, though if that's a problem, we can 
 deprecate too.
 If anyone sees a good usage for them, or better - already uses them, please 
 speak up, so we can make the proper decision.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.

2013-11-14 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822537#comment-13822537
 ] 

Simon Rosenthal commented on SOLR-4722:
---

Great patch!

I'd like to use the code as the basis for a component which will simply return 
term positions for each query term - no need for having highlighting enabled as 
a prerequisite, or to return term offsets - this is a text mining project where 
we'll be running queries in batch mode and storing this information externally. 

Can you think of any gotchas I might encounter?

 Highlighter which generates a list of query term position(s) for each item in 
 a list of documents, or returns null if highlighting is disabled.
 ---

 Key: SOLR-4722
 URL: https://issues.apache.org/jira/browse/SOLR-4722
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 4.3, 5.0
Reporter: Tricia Jenkins
Priority: Minor
 Attachments: SOLR-4722.patch, solr-positionshighlighter.jar


 As an alternative to returning snippets, this highlighter provides the (term) 
 position for query matches.  One usecase for this is to reconcile the term 
 position from the Solr index with 'word' coordinates provided by an OCR 
 process.  In this way we are able to 'highlight' an image, like a page from a 
 book or an article from a newspaper, in the locations that match the user's 
 query.
 This is based on the FastVectorHighlighter and requires that termVectors, 
 termOffsets and termPositions be stored.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5215) Add support for FieldInfos generation

2013-11-14 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-5215.


   Resolution: Fixed
Fix Version/s: 5.0
   4.6

Committed to 4x under LUCENE-5189.

 Add support for FieldInfos generation
 -

 Key: LUCENE-5215
 URL: https://issues.apache.org/jira/browse/LUCENE-5215
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch, 
 LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch, 
 LUCENE-5215.patch


 In LUCENE-5189 we've identified few reasons to do that:
 # If you want to update docs' values of field 'foo', where 'foo' exists in 
 the index, but not in a specific segment (sparse DV), we cannot allow that 
 and have to throw a late UOE. If we could rewrite FieldInfos (with 
 generation), this would be possible since we'd also write a new generation of 
 FIS.
 # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the 
 consumer isn't allowed to change FI.attributes because we cannot modify the 
 existing FIS. This is implicit however, and we silently ignore any modified 
 attributes. FieldInfos.gen will allow that too.
 The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and 
 add support for FIS generation in FieldInfosFormat, SegReader etc., like we 
 now do for DocValues. I'll work on a patch.
 Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that 
 have same limitation -- if a Codec modifies them, they are silently being 
 ignored, since we don't gen the .si files. I think we can easily solve that 
 by recording SI.attributes in SegmentInfos, so they are recorded per-commit. 
 But I think it should be handled in a separate issue.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved

2013-11-14 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-5216.


   Resolution: Fixed
Fix Version/s: 5.0
   4.6
 Assignee: Shai Erera
Lucene Fields: New,Patch Available  (was: New)

Committed to 4x under LUCENE-5189.

 Fix SegmentInfo.attributes when updates are involved
 

 Key: LUCENE-5216
 URL: https://issues.apache.org/jira/browse/LUCENE-5216
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5216.patch


 Today, SegmentInfo.attributes are write-once. However, in the presence of 
 field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue, in 
 which if a Codec decides to alter the attributes when updates are applied, 
 they are silently discarded. This is rather a corner case, though one that 
 should be addressed.
 There were two solutions to address this:
 # Record SI.attributes in SegmentInfos, so they are written per-commit, 
 instead of the .si file.
 # Remove them altogether, as they don't seem to be used anywhere in Lucene 
 code today.
 If we remove them, we basically don't take away special capability from 
 Codecs, because they can still write the attributes to a separate file, or 
 even the file they record the other data in. This will work even with 
 updates, as long as Codecs respect the given segmentSuffix.
 If we keep them, I think the simplest solution is to read/write them by 
 SegmentInfos. But if we don't see a good use case, I suggest we remove them, 
 as it's just extra code to maintain. I think we can even risk a backwards 
 break and remove them completely from 4x, though if that's a problem, we can 
 deprecate too.
 If anyone sees a good usage for them, or better - already uses them, please 
 speak up, so we can make the proper decision.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates

2013-11-14 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-5248.


   Resolution: Fixed
Fix Version/s: 5.0
   4.6

Committed to 4x under LUCENE-5189.

 Improve the data structure used in ReaderAndLiveDocs to hold the updates
 

 Key: LUCENE-5248
 URL: https://issues.apache.org/jira/browse/LUCENE-5248
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, 
 LUCENE-5248.patch, LUCENE-5248.patch


 Currently ReaderAndLiveDocs holds the updates in two structures:
 +Map<String, Map<Integer, Long>>+
 Holds a mapping from each field, to all docs that were updated and their 
 values. This structure is updated when applyDeletes is called, and needs to 
 satisfy several requirements:
 # Un-ordered writes: if a field f is updated by two terms, termA and termB, 
 in that order, and termA affects doc=100 and termB doc=2, then the updates 
 are applied in that order, meaning we cannot rely on updates coming in order.
 # Same document may be updated multiple times, either by same term (e.g. 
 several calls to IW.updateNDV) or by different terms. Last update wins.
 # Sequential read: when writing the updates to the Directory 
 (fieldsConsumer), we iterate on the docs in-order and for each one check if 
 it's updated and if not, pull its value from the current DV.
 # A single update may affect several million documents, therefore need to be 
 efficient w.r.t. memory consumption.
 +Map<Integer, Map<String, Long>>+
 Holds a mapping from a document, to all the fields that it was updated in and 
 the updated value for each field. This is used by IW.commitMergedDeletes to 
 apply the updates that came in while the segment was merging. The 
 requirements this structure needs to satisfy are:
 # Access in doc order: this is how commitMergedDeletes works.
 # One-pass: we visit a document once (currently) and so if we can, it's 
 better if we know all the fields in which it was updated. The updates are 
 applied to the merged ReaderAndLiveDocs (where they are stored in the first 
 structure mentioned above).
 Comments with proposals will follow next.
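
For illustration, here is a minimal sketch of the two views above, using plain
JDK maps and invented names (requirement #4 is exactly why the real structure
needs to be more memory-efficient than this):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class UpdatesBufferSketch {
  // field -> (docID -> value): filled in term order, so docIDs arrive unordered
  static final Map<String,Map<Integer,Long>> byField = new HashMap<String,Map<Integer,Long>>();
  // docID -> (field -> value): read in doc order by commitMergedDeletes
  static final Map<Integer,Map<String,Long>> byDoc = new HashMap<Integer,Map<String,Long>>();

  static void record(String field, int docID, long value) {
    Map<Integer,Long> docs = byField.get(field);
    if (docs == null) byField.put(field, docs = new HashMap<Integer,Long>());
    docs.put(docID, value); // "last update wins": put() overwrites earlier values
    Map<String,Long> fields = byDoc.get(docID);
    if (fields == null) byDoc.put(docID, fields = new HashMap<String,Long>());
    fields.put(field, value);
  }

  public static void main(String[] args) {
    record("f", 100, 1L); // termA
    record("f", 2, 2L);   // termB: updates arrive out of doc order
    record("f", 2, 3L);   // same doc updated again: last update wins
    System.out.println(byField); // {f={2=3, 100=1}}
    System.out.println(byDoc);   // {2={f=3}, 100={f=1}}
  }
}
{code}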



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5246) SegmentInfoPerCommit continues to list unneeded updatesFiles

2013-11-14 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-5246.


   Resolution: Fixed
Fix Version/s: 5.0
   4.6

Committed to 4x under LUCENE-5189.

 SegmentInfoPerCommit continues to list unneeded updatesFiles
 

 Key: LUCENE-5246
 URL: https://issues.apache.org/jira/browse/LUCENE-5246
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5246.patch, LUCENE-5246.patch


 SegmentInfoPerCommit continues to list updates files even when they are no 
 longer needed. For example, if you update the values of documents of field 
 'f', it creates a gen'd .fnm (FieldInfos) file. If you commit/reopen and 
 update the field again (maybe now a different set of documents), it creates 
 another gen'd .fnm, but continues to list both gens, even though only the 
 latest one is needed.
 To solve this, SIPC would need to know the dvGen of each FieldInfo, so that 
 it can correctly list only the updates files that are truly needed. I'll work 
 on a testcase and fix.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved

2013-11-14 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822558#comment-13822558
 ] 

Steve Rowe commented on LUCENE-5216:


bq. Oh, I didn't know we can do that! So I'd do svn ci -m "LUCENE-123 
LUCENE-234: message"? Do they need to be separated by comma or something?

I'm pretty sure there is no required punctuation - AFAIK any svn commit log 
message matching regex /PROJECT-\d+/ *anywhere in the log message* gets added 
as a comment to the corresponding JIRA issue, and multiple issue mentions 
result in comment addition to all of them.
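
Roughly the kind of scan that implies (an illustration only, not the actual
service code):

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommitLogScan {
  public static void main(String[] args) {
    // the /PROJECT-\d+/ pattern described above, for the two projects here
    Pattern issueRef = Pattern.compile("(LUCENE|SOLR)-\\d+");
    Matcher m = issueRef.matcher("LUCENE-123 LUCENE-234: fix attributes handling");
    while (m.find()) {
      // one comment per mentioned issue: LUCENE-123, then LUCENE-234
      System.out.println("would comment on " + m.group());
    }
  }
}
{code}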

 Fix SegmentInfo.attributes when updates are involved
 

 Key: LUCENE-5216
 URL: https://issues.apache.org/jira/browse/LUCENE-5216
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5216.patch


 Today, SegmentInfo.attributes are write-once. However, in the presence of 
 field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue, in 
 which if a Codec decides to alter the attributes when updates are applied, 
 they are silently discarded. This is rather a corner case, though one that 
 should be addressed.
 There were two solutions to address this:
 # Record SI.attributes in SegmentInfos, so they are written per-commit, 
 instead of the .si file.
 # Remove them altogether, as they don't seem to be used anywhere in Lucene 
 code today.
 If we remove them, we basically don't take away special capability from 
 Codecs, because they can still write the attributes to a separate file, or 
 even the file they record the other data in. This will work even with 
 updates, as long as Codecs respect the given segmentSuffix.
 If we keep them, I think the simplest solution is to read/write them by 
 SegmentInfos. But if we don't see a good use case, I suggest we remove them, 
 as it's just extra code to maintain. I think we can even risk a backwards 
 break and remove them completely from 4x, though if that's a problem, we can 
 deprecate too.
 If anyone sees a good usage for them, or better - already uses them, please 
 speak up, so we can make the proper decision.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved

2013-11-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822561#comment-13822561
 ] 

Shai Erera commented on LUCENE-5216:


Thanks Steve, will try to keep that in mind. I always thought we must format 
the messages like "LUCENE-1234: message", i.e. PROJECT-number followed by a 
colon.

 Fix SegmentInfo.attributes when updates are involved
 

 Key: LUCENE-5216
 URL: https://issues.apache.org/jira/browse/LUCENE-5216
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5216.patch


 Today, SegmentInfo.attributes are write-once. However, in the presence of 
 field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue, in 
 which if a Codec decides to alter the attributes when updates are applied, 
 they are silently discarded. This is rather a corner case, though one that 
 should be addressed.
 There were two solutions to address this:
 # Record SI.attributes in SegmentInfos, so they are written per-commit, 
 instead of the .si file.
 # Remove them altogether, as they don't seem to be used anywhere in Lucene 
 code today.
 If we remove them, we basically don't take away special capability from 
 Codecs, because they can still write the attributes to a separate file, or 
 even the file they record the other data in. This will work even with 
 updates, as long as Codecs respect the given segmentSuffix.
 If we keep them, I think the simplest solution is to read/write them by 
 SegmentInfos. But if we don't see a good use case, I suggest we remove them, 
 as it's just extra code to maintain. I think we can even risk a backwards 
 break and remove them completely from 4x, though if that's a problem, we can 
 deprecate too.
 If anyone sees a good usage for them, or better - already uses them, please 
 speak up, so we can make the proper decision.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved

2013-11-14 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822564#comment-13822564
 ] 

Steve Rowe commented on LUCENE-5216:


I think that may once have been true, maybe for early versions of Mark Miller's 
service, but it is no longer.

Here's a recent example showing no punctuation required, and multiple issues: 
https://issues.apache.org/jira/browse/LUCENE-5217?focusedCommentId=13820840&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13820840

 Fix SegmentInfo.attributes when updates are involved
 

 Key: LUCENE-5216
 URL: https://issues.apache.org/jira/browse/LUCENE-5216
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5216.patch


 Today, SegmentInfo.attributes are write-once. However, in the presence of 
 field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue, in 
 which if a Codec decides to alter the attributes when updates are applied, 
 they are silently discarded. This is rather a corner case, though one that 
 should be addressed.
 There were two solutions to address this:
 # Record SI.attributes in SegmentInfos, so they are written per-commit, 
 instead of the .si file.
 # Remove them altogether, as they don't seem to be used anywhere in Lucene 
 code today.
 If we remove them, we basically don't take away special capability from 
 Codecs, because they can still write the attributes to a separate file, or 
 even the file they record the other data in. This will work even with 
 updates, as long as Codecs respect the given segmentSuffix.
 If we keep them, I think the simplest solution is to read/write them by 
 SegmentInfos. But if we don't see a good use case, I suggest we remove them, 
 as it's just extra code to maintain. I think we can even risk a backwards 
 break and remove them completely from 4x, though if that's a problem, we can 
 deprecate too.
 If anyone sees a good usage for them, or better - already uses them, please 
 speak up, so we can make the proper decision.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved

2013-11-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822567#comment-13822567
 ] 

Shai Erera commented on LUCENE-5216:


Cool!

 Fix SegmentInfo.attributes when updates are involved
 

 Key: LUCENE-5216
 URL: https://issues.apache.org/jira/browse/LUCENE-5216
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5216.patch


 Today, SegmentInfo.attributes are write-once. However, in the presence of 
 field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue, in 
 which if a Codec decides to alter the attributes when updates are applied, 
 they are silently discarded. This is rather a corner case, though one that 
 should be addressed.
 There were two solutions to address this:
 # Record SI.attributes in SegmentInfos, so they are written per-commit, 
 instead of the .si file.
 # Remove them altogether, as they don't seem to be used anywhere in Lucene 
 code today.
 If we remove them, we basically don't take away special capability from 
 Codecs, because they can still write the attributes to a separate file, or 
 even the file they record the other data in. This will work even with 
 updates, as long as Codecs respect the given segmentSuffix.
 If we keep them, I think the simplest solution is to read/write them by 
 SegmentInfos. But if we don't see a good use case, I suggest we remove them, 
 as it's just extra code to maintain. I think we can even risk a backwards 
 break and remove them completely from 4x, though if that's a problem, we can 
 deprecate too.
 If anyone sees a good usage for them, or better - already uses them, please 
 speak up, so we can make the proper decision.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-11-14 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822572#comment-13822572
 ] 

Erick Erickson commented on SOLR-5287:
--

Stefan:

Nope, you're right on time. I'll be around this weekend too, although in a 
different time zone...

 Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
 --

 Key: SOLR-5287
 URL: https://issues.apache.org/jira/browse/SOLR-5287
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, web gui
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5287.patch, SOLR-5287.patch, SOLR-5287.patch


 A user asking a question on the Solr list got me to thinking about editing 
 the main config files from the Solr admin screen. I chatted briefly with 
 [~steffkes] about the mechanics of this on the browser side; he doesn't see a 
 problem on that end. His comment is that there's no end point that'll write 
 the file back.
 Am I missing something here or is this actually not a hard problem? I see a 
 couple of issues off the bat, neither of which seems troublesome.
 1. File permissions: I'd imagine lots of installations will get file 
 permission exceptions if Solr tries to write the file out. Well, do a 
 chmod/chown.
 2. Screwing up the system, maliciously or not: I don't think this is an 
 issue, this would be part of the admin handler after all.
 Does anyone have objections to the idea? And how does this fit into the work 
 that [~sar...@syr.edu] has been doing?
 I can imagine this extending to SolrCloud with a "push this to ZK" option or 
 something like that, perhaps not in V1 unless it's easy.
 Of course any pointers gratefully received. Especially ones that start with 
 "Don't waste your effort, it'll never work (or be accepted)"...
 Because what scares me is this seems like such an easy thing to do that would 
 be a significant ease-of-use improvement, so there _has_ to be something I'm 
 missing.
 So if we go forward with this we'll make this the umbrella JIRA, the two 
 immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
 for the UI work to use.
 I think there are only two end-points here (sketched below):
 1. list all the files in the conf (or arbitrary from solr_home/collection) 
 directory.
 2. write this text to this file.
 Possibly later we could add "clone the configs from coreX to coreY".
 BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
 take it over it won't hurt my feelings a bit
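
A minimal sketch of what might sit behind those two end-points; nothing like
this exists yet, all names are invented, and real code would have to sanitize
the file name against path traversal:

{code:java}
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class ConfFileEndpoints {
  private final File confDir;

  public ConfFileEndpoints(File confDir) { this.confDir = confDir; }

  // end-point 1: list all the files in the conf directory
  public String[] listFiles() {
    return confDir.list();
  }

  // end-point 2: write this text to this file
  public void writeFile(String name, String content) throws IOException {
    FileWriter w = new FileWriter(new File(confDir, name));
    try {
      w.write(content);
    } finally {
      w.close();
    }
  }
}
{code}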



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2724) Deprecate defaultSearchField and defaultOperator defined in schema.xml

2013-11-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822586#comment-13822586
 ] 

David Smiley commented on SOLR-2724:


Anca,
Yeah, I know.  It is deprecated but still available.

 Deprecate defaultSearchField and defaultOperator defined in schema.xml
 --

 Key: SOLR-2724
 URL: https://issues.apache.org/jira/browse/SOLR-2724
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, search
Reporter: David Smiley
Assignee: David Smiley
Priority: Minor
 Fix For: 3.6, 4.0-ALPHA

 Attachments: 
 SOLR-2724_deprecateDefaultSearchField_and_defaultOperator.patch, 
 SOLR-2724_deprecateDefaultSearchField_and_defaultOperator.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 I've always been surprised to see the <defaultSearchField> element and 
 <solrQueryParser defaultOperator="OR"/> defined in the schema.xml file since 
 the first time I saw them.  They just seem out of place to me since they are 
 more query parser related than schema related. But not only are they 
 misplaced, I feel they shouldn't exist. For query parsers, we already have a 
 df parameter that works just fine, and explicit field references. And the 
 default lucene query operator should stay at OR -- if a particular query 
 wants different behavior then use q.op or simply use OR.
 <similarity> seems like something better placed in solrconfig.xml than in the 
 schema. 
 In my opinion, defaultSearchField and defaultOperator configuration elements 
 should be deprecated in Solr 3.x and removed in Solr 4.  And similarity 
 should move to solrconfig.xml. I am willing to do it, provided there is 
 consensus on it of course.
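
For illustration, the request-level equivalents look like this (a hedged SolrJ
sketch; df and q.op are the real param names, the class is just a demo):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;

public class DefaultsOnTheRequest {
  public static void main(String[] args) {
    SolrQuery q = new SolrQuery("lucene solr");
    q.set("df", "text");   // per-request default field, instead of <defaultSearchField>
    q.set("q.op", "AND");  // per-request default operator, instead of <solrQueryParser/>
    System.out.println(q); // prints the encoded params, e.g. q=lucene+solr&df=text&q.op=AND
  }
}
{code}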



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5428) new statistics results to StatsComponent - distinctValues and countDistinct

2013-11-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822597#comment-13822597
 ] 

Shalin Shekhar Mangar commented on SOLR-5428:
-

You're both right. We can't replace this functionality with LukeRequestHandler. 
At the same time, forcing everyone to keep a set of distinct values in memory, 
when someone just needs min, max or count is bad.

 new statistics results to StatsComponent - distinctValues and countDistinct
 ---

 Key: SOLR-5428
 URL: https://issues.apache.org/jira/browse/SOLR-5428
 Project: Solr
  Issue Type: New Feature
Reporter: Elran Dvir
Assignee: Shalin Shekhar Mangar
 Attachments: SOLR-5428.patch


 I thought it would be very useful to display the distinct values (and the 
 count) of a field among other statistics. Attached a patch implementing this 
 in StatsComponent.
 Added results:
 distinctValues - list of all distinct values
 countDistinct - distinct values count.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-5441) Expose number of transaction log files and their size via JMX

2013-11-14 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-5441:
---

Assignee: Shalin Shekhar Mangar

 Expose number of transaction log files and their size via JMX
 -

 Key: SOLR-5441
 URL: https://issues.apache.org/jira/browse/SOLR-5441
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.5
Reporter: Rafał Kuć
Assignee: Shalin Shekhar Mangar
Priority: Minor

 It may be useful to have the number of transaction log files and their 
 overall size exposed via JMX for UpdateHandler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5441) Expose number of transaction log files and their size via JMX

2013-11-14 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rafał Kuć updated SOLR-5441:


Attachment: SOLR-5441.patch

Please look at the attached patch and see if this is the right approach and if 
anything should be modified.

 Expose number of transaction log files and their size via JMX
 -

 Key: SOLR-5441
 URL: https://issues.apache.org/jira/browse/SOLR-5441
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.5
Reporter: Rafał Kuć
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Attachments: SOLR-5441.patch


 It may be useful to have the number of transaction log files and their 
 overall size exposed via JMX for UpdateHandler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0-ea-b114) - Build # 3465 - Still Failing!

2013-11-14 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3465/
Java: 32bit/jdk1.8.0-ea-b114 -server -XX:+UseParallelGC

1 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.component.DistributedDebugComponentTest

Error Message:
Unable to delete file: 
.\org.apache.solr.handler.component.DistributedDebugComponentTest\collection2\data\tlog\tlog.000

Stack Trace:
java.io.IOException: Unable to delete file: 
.\org.apache.solr.handler.component.DistributedDebugComponentTest\collection2\data\tlog\tlog.000
at __randomizedtesting.SeedInfo.seed([D76932B7DFD5F27B]:0)
at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1919)
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331)
at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910)
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331)
at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910)
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331)
at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910)
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331)
at 
org.apache.solr.SolrJettyTestBase.cleanUpJettyHome(SolrJettyTestBase.java:189)
at 
org.apache.solr.handler.component.DistributedDebugComponentTest.afterTest(DistributedDebugComponentTest.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:700)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:744)




Build Log:
[...truncated 10841 lines...]
   [junit4] Suite: 
org.apache.solr.handler.component.DistributedDebugComponentTest
   [junit4]   2 1897034 T7117 oas.SolrTestCaseJ4.initCore initCore
   [junit4]   2 Creating dataDir: 
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build\solr-core\test\J0\.\solrtest-DistributedDebugComponentTest-1384448216339
   [junit4]   2 1897036 T7117 oas.SolrTestCaseJ4.initCore initCore end
   [junit4]   2 1897036 T7117 oas.SolrJettyTestBase.getSSLConfig Randomized 
ssl (true) and clientAuth (false)
   [junit4]   2 1897036 T7117 oejs.Server.doStart jetty-8.1.10.v20130312
   [junit4]   2 1897040 T7117 oejs.AbstractConnector.doStart Started 
SelectChannelConnector@127.0.0.1:58776
   [junit4]   2 1897041 T7117 oass.SolrDispatchFilter.init 
SolrDispatchFilter.init()
   [junit4]   2 1897041 T7117 oasc.SolrResourceLoader.locateSolrHome JNDI not 
configured for 

[jira] [Commented] (SOLR-5428) new statistics results to StatsComponent - distinctValues and countDistinct

2013-11-14 Thread Yago Riveiro (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822612#comment-13822612
 ] 

Yago Riveiro commented on SOLR-5428:


Ok, I forgot that the StatsComponent returns all metrics in one call.

Maybe the StatsComponent needs some tweaking to only return the metrics that we 
need and not all of them. If the analytics component could work with distributed 
searches, this patch would not be necessary.

 new statistics results to StatsComponent - distinctValues and countDistinct
 ---

 Key: SOLR-5428
 URL: https://issues.apache.org/jira/browse/SOLR-5428
 Project: Solr
  Issue Type: New Feature
Reporter: Elran Dvir
Assignee: Shalin Shekhar Mangar
 Attachments: SOLR-5428.patch


 I thought it would be very useful to display the distinct values (and the 
 count) of a field among other statistics. Attached a patch implementing this 
 in StatsComponent.
 Added results  :
 distinctValues - list of all distnict values
 countDistinct -  distnict values count.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5441) Expose number of transaction log files and their size via JMX

2013-11-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822619#comment-13822619
 ] 

Shalin Shekhar Mangar commented on SOLR-5441:
-

Thanks Rafał. We need to put the iteration in a synchronized block.
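
Roughly what that means (an illustrative sketch with invented names, not the
actual UpdateLog code): iterate the shared list only while holding its lock,
so a concurrent add/remove can't corrupt the loop:

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

public class TlogStatsSketch {
  // stand-in for the list of transaction logs kept by the update log
  private final Deque<Long> logSizes = new ArrayDeque<Long>();

  public long totalSizeBytes() {
    long total = 0;
    synchronized (logSizes) { // guard the iteration, per the review comment
      for (Long size : logSizes) {
        total += size.longValue();
      }
    }
    return total;
  }
}
{code}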

 Expose number of transaction log files and their size via JMX
 -

 Key: SOLR-5441
 URL: https://issues.apache.org/jira/browse/SOLR-5441
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.5
Reporter: Rafał Kuć
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Attachments: SOLR-5441.patch


 It may be useful to have the number of transaction log files and their 
 overall size exposed via JMX for UpdateHandler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Adrien Grand
+1 from my side as well. Release candidate looks good and smoke tester
was happy.

On Thu, Nov 14, 2013 at 4:17 PM, Martijn v Groningen
martijn.v.gronin...@gmail.com wrote:
 +1! The smoker test succeeded.


 On 14 November 2013 15:09, Tommaso Teofili tommaso.teof...@gmail.com
 wrote:

 +1

  SUCCESS! [2:50:09.253204] ()

 however we have the usual

  ***WARNING***: javadocs want to fail!

 for Solr (which IMHO we should fix)

 Regards,
 Tommaso



 2013/11/14 Shai Erera ser...@gmail.com

 Smoke tester passes for me. +1!

 Shai


 On Thu, Nov 14, 2013 at 11:37 AM, Simon Willnauer
 simon.willna...@gmail.com wrote:

 Please vote for the first Release Candidate for Lucene/Solr 4.6.0

 you can download it here:

 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686

 or run the smoke tester directly with this commandline (don't forget
 to set JAVA6_HOME etc.):

 python3.2 -u dev-tools/scripts/smokeTestRelease.py

 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
 1541686 4.6.0 /tmp/smoke_test_4_6


 I integrated the RC into Elasticsearch and all tests pass:


 https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de

 Smoketester said: SUCCESS! [1:15:57.339272]

 here is my +1


 Simon

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org






 --
 With kind regards,

 Martijn van Groningen



-- 
Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.

2013-11-14 Thread Tricia Jenkins (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822636#comment-13822636
 ] 

Tricia Jenkins commented on SOLR-4722:
--

Thanks for your interest.  This code/jar could be used as is for your purposes.

If you don't want to specify highlighting enabled in each query just move it to 
conf/solrconfig.xml:
{code:xml}
  <requestHandler name="standard" class="solr.StandardRequestHandler">
    <lst name="defaults">
      <str name="hl">true</str>
    </lst>
  </requestHandler>
{code}

This highlighter only returns the term positions.  The term offsets are stored 
because they're used by the FastVectorHighlighter.  You won't get any useful 
information from this highlighter if you disable termOffsets in your schema.xml.

I just ran this patch against trunk.  Still works!

 Highlighter which generates a list of query term position(s) for each item in 
 a list of documents, or returns null if highlighting is disabled.
 ---

 Key: SOLR-4722
 URL: https://issues.apache.org/jira/browse/SOLR-4722
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 4.3, 5.0
Reporter: Tricia Jenkins
Priority: Minor
 Attachments: SOLR-4722.patch, solr-positionshighlighter.jar


 As an alternative to returning snippets, this highlighter provides the (term) 
 position for query matches.  One usecase for this is to reconcile the term 
 position from the Solr index with 'word' coordinates provided by an OCR 
 process.  In this way we are able to 'highlight' an image, like a page from a 
 book or an article from a newspaper, in the locations that match the user's 
 query.
 This is based on the FastVectorHighlighter and requires that termVectors, 
 termOffsets and termPositions be stored.
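
At the Lucene level, that requirement corresponds to these FieldType flags (a
hedged sketch; in a Solr schema.xml you would instead set the equivalent
termVectors/termPositions/termOffsets attributes on the field):

{code:java}
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;

public class PositionsHighlighterFieldType {
  public static void main(String[] args) {
    FieldType ft = new FieldType(TextField.TYPE_STORED);
    ft.setStoreTermVectors(true);          // termVectors="true"
    ft.setStoreTermVectorPositions(true);  // termPositions="true"
    ft.setStoreTermVectorOffsets(true);    // termOffsets="true"
    ft.freeze();
    System.out.println(ft);
  }
}
{code}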



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5399) Improve DebugComponent for distributed requests

2013-11-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822644#comment-13822644
 ] 

Tomás Fernández Löbbe commented on SOLR-5399:
-

I think the problem is that the test tries to delete the Solr home before 
stopping Jetty. I'm testing a fix now.

 Improve DebugComponent for distributed requests
 ---

 Key: SOLR-5399
 URL: https://issues.apache.org/jira/browse/SOLR-5399
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.0
Reporter: Tomás Fernández Löbbe
Assignee: Ryan Ernst
 Fix For: 5.0, 4.7

 Attachments: SOLR-5399.patch, SOLR-5399.patch, SOLR-5399.patch


 I'm working on extending the DebugComponent for adding some useful 
 information to be able to track distributed requests better. I'm adding two 
 different things, first, the request can generate a request ID that will be 
 printed in the logs for the main query and all the different internal 
 requests to the different shards. This should make it easier to find the 
 different parts of a single user request in the logs. It would also add the 
 purpose of each internal request to the logs, like: 
 RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. 
 Also, I'm adding a track section to the debug info where to add information 
 about the different phases of the distributed request (right now, I'm only 
 including QTime, but could eventually include more information) like: 
 {code:xml}
 <lst name="debug">
   <lst name="track">
     <lst name="EXECUTE_QUERY">
       <str name="localhost:8985/solr">QTime: 10</str>
       <str name="localhost:8984/solr">QTime: 25</str>
     </lst>
     <lst name="GET_FIELDS">
       <str name="localhost:8985/solr">QTime: 1</str>
     </lst>
   </lst>
 </lst>
 {code}
 To get this, debugQuery must be set to true, or the debug parameter must 
 include "track" (debug=track). This information is only added to distributed 
 requests.  I would like to get feedback on this.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5441) Expose number of transaction log files and their size via JMX

2013-11-14 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rafał Kuć updated SOLR-5441:


Attachment: SOLR-5441-synchronized.patch

Added synchronized on the logs list in the UpdateLog class.

 Expose number of transaction log files and their size via JMX
 -

 Key: SOLR-5441
 URL: https://issues.apache.org/jira/browse/SOLR-5441
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.5
Reporter: Rafał Kuć
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Attachments: SOLR-5441-synchronized.patch, SOLR-5441.patch


 It may be useful to have the number of transaction log files and their 
 overall size exposed via JMX for UpdateHandler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5441) Expose number of transaction log files and their size via JMX

2013-11-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822696#comment-13822696
 ] 

ASF subversion and git services commented on SOLR-5441:
---

Commit 1541999 from sha...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1541999 ]

SOLR-5441: Expose number of transaction log files and their size via JMX

 Expose number of transaction log files and their size via JMX
 -

 Key: SOLR-5441
 URL: https://issues.apache.org/jira/browse/SOLR-5441
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.5
Reporter: Rafał Kuć
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Attachments: SOLR-5441-synchronized.patch, SOLR-5441.patch


 It may be useful to have the number of transaction log files and their 
 overall size exposed via JMX for UpdateHandler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5441) Expose number of transaction log files and their size via JMX

2013-11-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822697#comment-13822697
 ] 

ASF subversion and git services commented on SOLR-5441:
---

Commit 1542000 from sha...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1542000 ]

SOLR-5441: Expose number of transaction log files and their size via JMX

 Expose number of transaction log files and their size via JMX
 -

 Key: SOLR-5441
 URL: https://issues.apache.org/jira/browse/SOLR-5441
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.5
Reporter: Rafał Kuć
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Attachments: SOLR-5441-synchronized.patch, SOLR-5441.patch


 It may be useful to have the number of transaction log files and their 
 overall size exposed via JMX for UpdateHandler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5441) Expose number of transaction log files and their size via JMX

2013-11-14 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-5441.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.6

Thanks Rafał!

 Expose number of transaction log files and their size via JMX
 -

 Key: SOLR-5441
 URL: https://issues.apache.org/jira/browse/SOLR-5441
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.5
Reporter: Rafał Kuć
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 4.6, 5.0

 Attachments: SOLR-5441-synchronized.patch, SOLR-5441.patch


 It may be useful to have the number of transaction log files and their 
 overall size exposed via JMX for UpdateHandler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Steve Rowe
-1

Smoke tester passes.

Solr Changes look good, except that the “Upgrading from Solr 4.5.0” section 
follows “Detailed Change List”, but should be above it; and one change 
attribution didn’t get recognized because it’s missing parens: “Elran Dvir via 
Erick Erickson”.  Definitely not worth a respin in either case.

Lucene Changes look good, except that the “API Changes” section in Changes.html 
is formatted as an item in the “Bug Fixes” section, rather than its own 
section.  I’ll fix.  (The issue is that “API Changes:” in CHANGES.txt has a 
trailing colon - the section name regex should allow this.)  This is probably 
not worth a respin.

Lucene and Solr Documentation pages look good, except that the “File Formats” 
link from the Lucene Documentation page leads to the 4.5 format doc, rather 
than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215).  This is 
respin-worthy.  Updating this is not automated now - it’s hard-coded in 
lucene/site/xsl/index.xsl - the default codec doesn’t change in every release.  
I’ll try to automate extracting the default from 
o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)].

Lucene and Solr Javadocs look good.

Steve

On Nov 14, 2013, at 4:37 AM, Simon Willnauer simon.willna...@gmail.com wrote:

 Please vote for the first Release Candidate for Lucene/Solr 4.6.0
 
 you can download it here:
 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
 
 or run the smoke tester directly with this commandline (don't forget
 to set JAVA6_HOME etc.):
 
 python3.2 -u dev-tools/scripts/smokeTestRelease.py
 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
 1541686 4.6.0 /tmp/smoke_test_4_6
 
 
 I integrated the RC into Elasticsearch and all tests pass:
 
 https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de
 
 Smoketester said: SUCCESS! [1:15:57.339272]
 
 here is my +1
 
 
 Simon
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs

2013-11-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822721#comment-13822721
 ] 

Michael McCandless commented on LUCENE-5339:


Thanks for the feedback Shai.

bq. I think FacetField should have an optional ctor taking the 
indexedFacetField, defaulting to $facets, then the ctor calls super() with the 
right field, and not dummy? And you can remove set/get?

I moved the indexed field name to the DimConfig.

bq. SimpleFacetsCollector jdocs are wrong – there's no create?

I removed it and put nocommit.

bq. Do we still need SSDVFacetFields?

Woops, no; I removed it.

bq. I like FacetIW, but the nocommit, to handle updateDoc, addDocs etc. makes 
me think if down the road we won't be sorry about doing this change (i.e. if 
anything changes on IW APIs). The good thing about FacetFields is that it just 
adds fields to a Document, and doesn't worry about IW API at all...

I think that's an acceptable risk in exchange for the simpler
index-time API.

bq. DimType == DimConfig . Sorry if I wasn't clear somewhere in my long 
response.

Ahh, OK.  I'm still wondering if we should put the facetMethod onto
the DimConfig...

bq. Like if your facet has 2 levels, that's 33% more ords. I think the count of 
the root ord is most likely never needed? And if it's needed, app can compute 
it by traversing its children and their values in the facet arrays? Maybe as a 
default we just never index it, and don't add a vague 
requiresDimCount/Value/Weight boolean?

Wait, the app cannot compute this (accurately) by summing the child counts?  It 
will overcount in general, right?

{quote}
bq. I think replicating bits of this code for the different methods is the 
lesser evil than adding so many abstractions that the module is hard to 
approach by new devs/users.

Did we ever get such feedback from users? That the module is unapproachable?
{quote}

It's mostly from my own assessment, looking at the code, and my own
frustrations over time in trying to improve the facets module;
LUCENE-5333 was finally the straw (for me)... I find complex APIs
frustrating and I think it's a serious barrier to new contributors
getting involved and users consuming them.

There was a user on the ML (I don't have the link) who just wanted to
get the facet count for a specific label after faceting was done, and
the hoops s/he had to jump through (custom FacetResultHandler I
think??) to achieve that were just crazy.

bq. I get the opposite feedback - that we don't have many abstractions! 

Seriously?  What abstractions are we missing?

If this is too much change for the facets module, we could, instead,
leave the facets module as is, and break this effort out as a
different module (facets2, simplefacets, something?).  We also have
many queryparsers, many highlighters, etc., and I think that's
healthy: all options can be explored.

bq. You have a nocommit maybe we should do this lazily in regards for when to 
rollupValues. That shows me that now every developer who extends this API 
(let's be clear - users are oblivious to this happening) will face the same 
decision (nocommit). If we discover one day that it's better to rollup lazily 
or not, other developers don't benefit from that decision. That's why I think 
some abstractions are good.

It's a crazy expert thing to create another faceting impl, so I think
such developers can handle changing their rollup to be lazy if it's
beneficial/necessary for their use case.

bq. I'm not sure we should do that (cut over associations later). The whole 
point about these features (associations, complements, sampling..) is that they 
are existing features. If we think they are useless / unneeded - that's one 
thing. But if we believe they are important, it's useless to make all the API 
changes without taking them into account, only to figure out later that we need 
abstraction X and Y in order to implement them.

Well this could be a good argument for just making a new module?

The new module would have a simpler API and less thorough
functionality?

bq. Can we do FacetIndexWriter in a separate issue (if we want to do it at 
all)? It's unrelated to the search API changes you want to do here, and it 
might be easier to contain within a single issue?

I'm not sure it's so easily separated out; the DimConfig is common to
index time and search time, and we're still iterating on that (I just
moved the indexedFieldName into it).

bq. About CategoryListIterator ... what if we do manage to come up tomorrow 
with a better encoding strategy for facets. Do you really think that changing 
all existing WhateverFacets makes sense!? And if developers write their own 
WhateverFacets, it means they need to change their code too? Really, you're 
mixing optimizations (inlining dgap+vint) with ease of use. I know (!!) that 
there are apps that can benefit from a different encoding scheme (e.g. 

[jira] [Commented] (SOLR-5402) SolrCloud 4.5 bulk add errors in cloud setup

2013-11-14 Thread Greg Walters (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822728#comment-13822728
 ] 

Greg Walters commented on SOLR-5402:


I've been able to reproduce this issue using curl to add documents but using 
the post.jar provided in the Solr example or using a solrj client I'm unable to 
reproduce this issue having tried batches up to 5000 documents.

 SolrCloud 4.5 bulk add errors in cloud setup
 

 Key: SOLR-5402
 URL: https://issues.apache.org/jira/browse/SOLR-5402
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 4.5.1
Reporter: Sai Gadde
 Fix For: 4.6


 We use out of the box Solr 4.5.1, no customization done. If we merge documents 
 via SolrJ to a single server it is working perfectly fine.
 But as soon as we add another node to the cloud we are getting the following 
 while merging documents. We merge about 500 at a time using SolrJ. These 500 
 documents in total are about a few MB (1-3) in size.
 This is the error we are getting on the server (10.10.10.116 - the IP is 
 irrelevant, just for clarity) where merging is happening. 10.10.10.119 is the 
 new node here. This server gets RemoteSolrException:
 shard update error StdNode: 
 http://10.10.10.119:8980/solr/mycore/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
  Illegal to have multiple roots (start tag in epilog?).
  at [row,col {unknown-source}]: [1,12468]
   at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:425)
   at 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
   at 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
   at 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
   at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
   at java.util.concurrent.FutureTask.run(Unknown Source)
   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
   at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
   at java.util.concurrent.FutureTask.run(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
 Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)
 On the other server 10.10.10.119 we get following error
 org.apache.solr.common.SolrException: Illegal to have multiple roots (start 
 tag in epilog?).
  at [row,col {unknown-source}]: [1,12468]
   at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
   at 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
   at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
   at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
   at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
   at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
   at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
   at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
   at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
   at 
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
   at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
   at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
   at 
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
   at 
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
   at 
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple 
 roots (start tag in epilog?).
  at [row,col 

[jira] [Created] (LUCENE-5341) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl

2013-11-14 Thread Steve Rowe (JIRA)
Steve Rowe created LUCENE-5341:
--

 Summary: Generated documentation should link to the default 
codec's index format documentation, rather than being hard coded in 
lucene/site/xsl/index.xsl
 Key: LUCENE-5341
 URL: https://issues.apache.org/jira/browse/LUCENE-5341
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
 Fix For: 4.6, 5.0, 4.7


In the 4.6 RC1, the “File Formats” link from the generated Lucene Documentation 
page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec 
was introduced by LUCENE-5215). 

Updating this is not automated now - it’s hard-coded in 
{{lucene/site/xsl/index.xsl}} - the default codec doesn’t change in every 
release.

The default codec could be extracted from {{o.a.l.codecs.Codec#defaultCodec [ = 
Codec.forName(“Lucene46”)]}} and inserted into the URL to the index file format 
documentation.
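
For example, something along these lines could feed the build (a rough sketch
of the extraction via the public Codec.getDefault() accessor, not the final
build hook):

{code:java}
import org.apache.lucene.codecs.Codec;

public class PrintDefaultCodec {
  public static void main(String[] args) {
    // the default codec's name identifies the package holding the format docs
    Codec codec = Codec.getDefault(); // e.g. "Lucene46"
    System.out.println(codec.getName());
    // the docs link could then target, e.g.,
    // core/org/apache/lucene/codecs/lucene46/package-summary.html
  }
}
{code}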



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5381) Split Clusterstate and scale

2013-11-14 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809096#comment-13809096
 ] 

Noble Paul edited comment on SOLR-5381 at 11/14/13 6:42 PM:


OK,
here is the plan to split clusterstate on a per collection basis

h2. How to use this feature?
Introduce a new option while creating a collection (external=true). This will 
keep the state of the collection in a separate node. 
example :

http://localhost:8983/solr/admin/collections?action=CREATE&name=xcoll&numShards=5&replicationFactor=2&external=true

This will result in this following entry in clusterstate.json
{code:JavaScript}
{
 "xcoll" : {"ex":true}
}
{code}
there will be another ZK entry which carries the actual collection information
*  /collections
** /xcoll
*** /state.json
{code:JavaScript}
{"xcoll":{
  "shards":{"shard1":{
    "range":"8000-b332",
    "state":"active",
    "replicas":{
      "core_node1":{
        "state":"active",
        "base_url":"http://192.168.1.5:8983/solr",
        "core":"xcoll_shard1_replica1",
        "node_name":"192.168.1.5:8983_solr",
        "leader":"true"}}}},
  "router":{"name":"compositeId"}}}
{code}

The main Overseer thread is responsible for creating collections and managing 
all the events for all the collections in the clusterstate.json. 
clusterstate.json is modified only when a collection is created/deleted or when 
state updates happen to “non-external” collections.

Each external collection will have its own Overseer queue, as follows. There 
will be a separate thread for each external collection.

* /collections
** /xcoll
*** /overseer
**** /collection-queue-work
**** /queue
**** /queue-work


h2. SolrJ enhancements
SolrJ would not listen to any ZK node. When a request comes for a collection 
‘xcoll’:
* it would first check if such a collection exists
* If yes, it first looks up the details in the local cache for that collection
* If not found in cache, it fetches the node /collections/xcoll/state.json and 
caches the information
* Any query/update will be sent with an extra query param specifying the 
collection name, shard name, Role (Leader/Replica), and range (example: 
\_target_=xcoll:shard1:L:8000-b332). A node would throw an error 
(INVALID_NODE) if it does not serve the collection/shard/Role/range combo.
* If SolrJ gets an INVALID_NODE error it would invalidate the cache and fetch 
fresh state information for that collection (and cache it again), as sketched 
below.
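
A rough sketch of that client-side flow (invented names, not actual SolrJ
classes):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CollectionStateCache {
  private final Map<String,String> cache = new ConcurrentHashMap<String,String>();

  public String stateFor(String collection) {
    String state = cache.get(collection);
    if (state == null) {
      state = fetchFromZk("/collections/" + collection + "/state.json");
      cache.put(collection, state); // cache until proven stale
    }
    return state;
  }

  public void onInvalidNode(String collection) {
    cache.remove(collection); // drop the stale entry...
    stateFor(collection);     // ...and re-fetch fresh state for the collection
  }

  private String fetchFromZk(String path) {
    return "{}"; // stand-in: real code would read the ZK node at 'path'
  }
}
{code}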

h2. Changes to each Solr Node
Each node would only listen to the clusterstate.json and the states of 
collections which it is a member of. If a request comes for a collection it 
does not serve, it first checks for the \_target_ param. All collections 
present in the clusterstate.json will be deemed collections it serves.
* If the param is present and the node does not serve that 
collection/shard/Role/Range combo, an INVALID_NODE error is thrown
** If the validation succeeds, the request is served
* If the param is not present and the node is a member of the collection, the 
request is served
** If the node is not a member of the collection, it uses SolrJ to proxy the 
request to the appropriate location

Internally, the node really does not care about the state of external 
collections. If/when it is required, the information is fetched in real time 
from ZK, used, and thrown away.

h2. Changes to admin GUI
External collections are not shown graphically in the admin UI.





[jira] [Updated] (LUCENE-5341) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl

2013-11-14 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-5341:
---

Attachment: LUCENE-5341.patch

Patch automating default codec extraction for use in the URL to index file 
format documentation.

I'll commit shortly.

 Generated documentation should link to the default codec's index format 
 documentation, rather than being hard coded in lucene/site/xsl/index.xsl
 

 Key: LUCENE-5341
 URL: https://issues.apache.org/jira/browse/LUCENE-5341
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
 Fix For: 4.6, 5.0, 4.7

 Attachments: LUCENE-5341.patch


 In the 4.6 RC1, the “File Formats” link from the generated Lucene 
 Documentation page leads to the 4.5 format doc, rather than the 4.6 format 
 doc (Lucene46Codec was introduced by LUCENE-5215). 
 Updating this is not automated now - it’s hard-coded in 
 {{lucene/site/xsl/index.xsl}} - the default codec doesn’t change in every 
 release.
 The default codec could be extracted from {{o.a.l.codecs.Codec#defaultCodec [ 
 = Codec.forName(“Lucene46”)]}} and inserted into the URL to the index file 
 format documentation.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5341) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl

2013-11-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822755#comment-13822755
 ] 

ASF subversion and git services commented on LUCENE-5341:
-

Commit 1542012 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1542012 ]

LUCENE-5341: automate default codec package extraction for use in the generated 
Lucene documentation's link to the index file format documentation

 Generated documentation should link to the default codec's index format 
 documentation, rather than being hard coded in lucene/site/xsl/index.xsl
 

 Key: LUCENE-5341
 URL: https://issues.apache.org/jira/browse/LUCENE-5341
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
 Fix For: 4.6, 5.0, 4.7

 Attachments: LUCENE-5341.patch


 In the 4.6 RC1, the “File Formats” link from the generated Lucene 
 Documentation page leads to the 4.5 format doc, rather than the 4.6 format 
 doc (Lucene46Codec was introduced by LUCENE-5215). 
 Updating this is not automated now - it’s hard-coded in 
 {{lucene/site/xsl/index.xsl}} - the default codec doesn’t change in every 
 release.
 The default codec could be extracted from {{o.a.l.codecs.Codec#defaultCodec [ 
 = Codec.forName(“Lucene46”)]}} and inserted into the URL to the index file 
 format documentation.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5341) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl

2013-11-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822762#comment-13822762
 ] 

ASF subversion and git services commented on LUCENE-5341:
-

Commit 1542013 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1542013 ]

LUCENE-5341: automate default codec package extraction for use in the generated 
Lucene documentation's link to the index file format documentation (merged 
trunk r1542012)

 Generated documentation should link to the default codec's index format 
 documentation, rather than being hard coded in lucene/site/xsl/index.xsl
 

 Key: LUCENE-5341
 URL: https://issues.apache.org/jira/browse/LUCENE-5341
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
 Fix For: 4.6, 5.0, 4.7

 Attachments: LUCENE-5341.patch


 In the 4.6 RC1, the “File Formats” link from the generated Lucene 
 Documentation page leads to the 4.5 format doc, rather than the 4.6 format 
 doc (Lucene46Codec was introduced by LUCENE-5215). 
 Updating this is not automated now - it’s hard-coded in 
 {{lucene/site/xsl/index.xsl}} - the default codec doesn’t change in every 
 release.
 The default codec could be extracted from {{o.a.l.codecs.Codec#defaultCodec [ 
 = Codec.forName(“Lucene46”)]}} and inserted into the URL to the index file 
 format documentation.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: DocValue on Strings slow and OOM

2013-11-14 Thread Joel Bernstein
Per,

As you are seeing, there are different implementations for calculating
facets for numeric fields and string fields. The numeric fields I believe
are using an int-to-int or long-to-int hashmap to hold the facet counts.
This map grows as values are added to it. The String version uses an int
array the size of the number of distinct values in the field to hold the
facet counts. So if you have a very large number of distinct values in the
field, you'll have a very large array. Also the distinct values themselves
are held in memory in the fieldCache for string fields.

So, basically, as you are seeing, you'll take up a much larger memory
footprint when faceting on a high cardinality string field than on a
high cardinality numeric field.
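
A toy illustration of the two strategies (not Solr's actual classes; the
sizes are made up):

import java.util.HashMap;
import java.util.Map;

public class FacetMemorySketch {
  public static void main(String[] args) {
    // String field: one counter slot per distinct value, allocated up front.
    int[] stringCounts = new int[10_000_000]; // ~40MB before counting anything
    // Numeric field: the map grows only with the values actually seen.
    Map<Long, Integer> numericCounts = new HashMap<Long, Integer>();
    for (long v : new long[] {42L, 42L, 7L}) {
      Integer c = numericCounts.get(v);
      numericCounts.put(v, c == null ? 1 : c + 1);
    }
    System.out.println(stringCounts.length + " slots vs "
        + numericCounts.size() + " map entries");
  }
}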

There are docvalues faceting implementations that will kick in on a field
that has docvalues. You can try setting the on-disk flag, and this will let
you test memory and performance.

Joel




On Thu, Nov 14, 2013 at 8:13 AM, Per Steffensen st...@designware.dk wrote:

  If anyone is following this one, just an update. We are not going to
 upgrade to 4.5.1 in order to see if the String facet performance problem
 has been fixed. Instead we have made a few hacks around our data so that we
 can store the c-field (c_dstr_doc_sto) as long instead (c_dlng_doc_sto). So
 now we only need to struggle with long-facet performance. There is a
 performance issue with facets on longs though, but I will tell about it in
 another mailing thread - I need your input on what solution you prefer.

 Regards, Per Steffensen


 On 11/6/13 12:15 PM, Per Steffensen wrote:

 On 11/6/13 11:43 AM, Robert Muir wrote:

 Before lucene 4.5 docvalues were loaded entirely into RAM.

 I'm not going to waste time debugging any old code releases here, you
 should upgrade to the latest release!

  Ok, thanks!

 I do not consider it a bug (just a performance issue), so no debugging
 needed.
 It is just that we do not want to spend time upgrading to 4.5 if there is
 not a justified hope/explanation that it will probably make things
 better. But I guess there is.

 One short question: Will 4.5 index things differently (compared to 4.4)
 for documents with fields like I showed earlier? I'm basically asking if we
 need to reindex the 12 billion documents again after upgrading to 4.5, or if
 we ought to be able to deploy 4.5 on top of the already indexed documents.

 Regards, Per Steffensen





-- 
Joel Bernstein
Search Engineer at Heliosearch


[jira] [Commented] (LUCENE-5341) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl

2013-11-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822770#comment-13822770
 ] 

ASF subversion and git services commented on LUCENE-5341:
-

Commit 1542018 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_6'
[ https://svn.apache.org/r1542018 ]

LUCENE-5341: automate default codec package extraction for use in the generated 
Lucene documentation's link to the index file format documentation (merged 
branch_4x r1542013)

 Generated documentation should link to the default codec's index format 
 documentation, rather than being hard coded in lucene/site/xsl/index.xsl
 

 Key: LUCENE-5341
 URL: https://issues.apache.org/jira/browse/LUCENE-5341
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
 Fix For: 4.6, 5.0, 4.7

 Attachments: LUCENE-5341.patch


 In the 4.6 RC1, the “File Formats” link from the generated Lucene 
 Documentation page leads to the 4.5 format doc, rather than the 4.6 format 
 doc (Lucene46Codec was introduced by LUCENE-5215). 
 Updating this is not automated now - it’s hard-coded in 
 {{lucene/site/xsl/index.xsl}} - the default codec doesn’t change in every 
 release.
 The default codec could be extracted from {{o.a.l.codecs.Codec#defaultCodec [ 
 = Codec.forName(“Lucene46”)]}} and inserted into the URL to the index file 
 format documentation.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs

2013-11-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822778#comment-13822778
 ] 

ASF subversion and git services commented on LUCENE-5339:
-

Commit 1542025 from [~mikemccand] in branch 'dev/branches/lucene5339'
[ https://svn.apache.org/r1542025 ]

LUCENE-5339: current patch

 Simplify the facet module APIs
 --

 Key: LUCENE-5339
 URL: https://issues.apache.org/jira/browse/LUCENE-5339
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-5339.patch, LUCENE-5339.patch


 I'd like to explore simplifications to the facet module's APIs: I
 think the current APIs are complex, and the addition of a new feature
 (sparse faceting, LUCENE-5333) threatens to add even more classes
 (e.g., FacetRequestBuilder).  I think we can do better.
 So, I've been prototyping some drastic changes; this is very
 early/exploratory and I'm not sure where it'll wind up but I think the
 new approach shows promise.
 The big changes are:
   * Instead of *FacetRequest/Params/Result, you directly instantiate
 the classes that do facet counting (currently TaxonomyFacetCounts,
 RangeFacetCounts or SortedSetDVFacetCounts), passing in the
 SimpleFacetsCollector, and then you interact with those classes to
 pull labels + values (topN under a path, sparse, specific labels).
   * At index time, no more FacetIndexingParams/CategoryListParams;
 instead, you make a new SimpleFacetFields and pass it the field it
 should store facets + drill downs under.  If you want more than
 one CLI you create more than one instance of SimpleFacetFields.
   * I added a simple schema, where you state which dimensions are
 hierarchical or multi-valued.  From this we decide how to index
 the ordinals (no more OrdinalPolicy).
 Sparse faceting is just another method (getAllDims), on both taxonomy
 & ssdv facet classes.
 I haven't created a common base class / interface for all of the
 search-time facet classes, but I think this may be possible/clean, and
 perhaps useful for drill sideways.
 All the new classes are under oal.facet.simple.*.
 Lots of things that don't work yet: drill sideways, complements,
 associations, sampling, partitions, etc.  This is just a start ...
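
 A sketch of how the proposed flow might read (the constructors and method
 names here are guesses at an in-progress branch, not a settled API):
 {code}
 // index time: one field family that knows where facets + drill-downs go
 SimpleFacetFields facetFields = new SimpleFacetFields("$facets");
 // search time: collect once, then instantiate the counting class directly
 SimpleFacetsCollector sfc = new SimpleFacetsCollector();
 searcher.search(query, sfc);
 TaxonomyFacetCounts counts = new TaxonomyFacetCounts(taxoReader, sfc);
 // then interact with 'counts' to pull labels + values (topN under a path, etc.)
 {code}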



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs

2013-11-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822780#comment-13822780
 ] 

Michael McCandless commented on LUCENE-5339:


OK I created a branch at 
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene5339 and committed 
my current patch (same as last patch I think, except I moved the 
indexedFieldName from FacetField to DimConfig).

 Simplify the facet module APIs
 --

 Key: LUCENE-5339
 URL: https://issues.apache.org/jira/browse/LUCENE-5339
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-5339.patch, LUCENE-5339.patch


 I'd like to explore simplifications to the facet module's APIs: I
 think the current APIs are complex, and the addition of a new feature
 (sparse faceting, LUCENE-5333) threatens to add even more classes
 (e.g., FacetRequestBuilder).  I think we can do better.
 So, I've been prototyping some drastic changes; this is very
 early/exploratory and I'm not sure where it'll wind up but I think the
 new approach shows promise.
 The big changes are:
   * Instead of *FacetRequest/Params/Result, you directly instantiate
 the classes that do facet counting (currently TaxonomyFacetCounts,
 RangeFacetCounts or SortedSetDVFacetCounts), passing in the
 SimpleFacetsCollector, and then you interact with those classes to
 pull labels + values (topN under a path, sparse, specific labels).
   * At index time, no more FacetIndexingParams/CategoryListParams;
 instead, you make a new SimpleFacetFields and pass it the field it
 should store facets + drill downs under.  If you want more than
 one CLI you create more than one instance of SimpleFacetFields.
   * I added a simple schema, where you state which dimensions are
 hierarchical or multi-valued.  From this we decide how to index
 the ordinals (no more OrdinalPolicy).
 Sparse faceting is just another method (getAllDims), on both taxonomy
 & ssdv facet classes.
 I haven't created a common base class / interface for all of the
 search-time facet classes, but I think this may be possible/clean, and
 perhaps useful for drill sideways.
 All the new classes are under oal.facet.simple.*.
 Lots of things that don't work yet: drill sideways, complements,
 associations, sampling, partitions, etc.  This is just a start ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Steve Rowe
I’ve committed fixes, to lucene_solr_4_6 as well as to branch_4x and trunk, for 
all the problems I mentioned.  

The first revision including all these is 1542030.

Steve

On Nov 14, 2013, at 1:16 PM, Steve Rowe sar...@gmail.com wrote:

 -1
 
 Smoke tester passes.
 
  Solr Changes look good, except that the “Upgrading from Solr 4.5.0” section 
 follows “Detailed Change List”, but should be above it; and one change 
 attribution didn’t get recognized because it’s missing parens: Elran Dvir 
 via Erick Erickson.  Definitely not worth a respin in either case.
 
 Lucene Changes look good, except that the “API Changes” section in 
 Changes.html is formatted as an item in the “Bug Fixes” section, rather than 
 its own section.  I’ll fix.  (The issue is that “API Changes:” in CHANGES.txt 
 has a trailing colon - the section name regex should allow this.)  This is 
 probably not worth a respin.
 
  Lucene and Solr Documentation pages look good, except that the “File Formats” 
 link from the Lucene Documentation page leads to the 4.5 format doc, rather 
 than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215).  This 
 is respin-worthy.  Updating this is not automated now - it’s hard-coded in 
 lucene/site/xsl/index.xsl - the default codec doesn’t change in every 
 release.  I’ll try to automate extracting the default from 
 o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)].
 
 Lucene and Solr Javadocs look good.
 
 Steve
 
 On Nov 14, 2013, at 4:37 AM, Simon Willnauer simon.willna...@gmail.com 
 wrote:
 
 Please vote for the first Release Candidate for Lucene/Solr 4.6.0
 
 you can download it here:
 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
 
 or run the smoke tester directly with this commandline (don't forget
 to set JAVA6_HOME etc.):
 
 python3.2 -u dev-tools/scripts/smokeTestRelease.py
 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
 1541686 4.6.0 /tmp/smoke_test_4_6
 
 
 I integrated the RC into Elasticsearch and all tests pass:
 
 https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de
 
 Smoketester said: SUCCESS! [1:15:57.339272]
 
 here is my +1
 
 
 Simon
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs

2013-11-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822776#comment-13822776
 ] 

ASF subversion and git services commented on LUCENE-5339:
-

Commit 1542023 from [~mikemccand] in branch 'dev/branches/lucene5339'
[ https://svn.apache.org/r1542023 ]

LUCENE-5339: make branch

 Simplify the facet module APIs
 --

 Key: LUCENE-5339
 URL: https://issues.apache.org/jira/browse/LUCENE-5339
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-5339.patch, LUCENE-5339.patch


 I'd like to explore simplifications to the facet module's APIs: I
 think the current APIs are complex, and the addition of a new feature
 (sparse faceting, LUCENE-5333) threatens to add even more classes
 (e.g., FacetRequestBuilder).  I think we can do better.
 So, I've been prototyping some drastic changes; this is very
 early/exploratory and I'm not sure where it'll wind up but I think the
 new approach shows promise.
 The big changes are:
   * Instead of *FacetRequest/Params/Result, you directly instantiate
 the classes that do facet counting (currently TaxonomyFacetCounts,
 RangeFacetCounts or SortedSetDVFacetCounts), passing in the
 SimpleFacetsCollector, and then you interact with those classes to
 pull labels + values (topN under a path, sparse, specific labels).
   * At index time, no more FacetIndexingParams/CategoryListParams;
 instead, you make a new SimpleFacetFields and pass it the field it
 should store facets + drill downs under.  If you want more than
 one CLI you create more than one instance of SimpleFacetFields.
   * I added a simple schema, where you state which dimensions are
 hierarchical or multi-valued.  From this we decide how to index
 the ordinals (no more OrdinalPolicy).
 Sparse faceting is just another method (getAllDims), on both taxonomy
 & ssdv facet classes.
 I haven't created a common base class / interface for all of the
 search-time facet classes, but I think this may be possible/clean, and
 perhaps useful for drill sideways.
 All the new classes are under oal.facet.simple.*.
 Lots of things that don't work yet: drill sideways, complements,
 associations, sampling, partitions, etc.  This is just a start ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5341) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl

2013-11-14 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved LUCENE-5341.


Resolution: Fixed

Committed to trunk, branch_4x, and lucene_solr_4_6.

 Generated documentation should link to the default codec's index format 
 documentation, rather than being hard coded in lucene/site/xsl/index.xsl
 

 Key: LUCENE-5341
 URL: https://issues.apache.org/jira/browse/LUCENE-5341
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
 Fix For: 4.6, 5.0, 4.7

 Attachments: LUCENE-5341.patch


 In the 4.6 RC1, the “File Formats” link from the generated Lucene 
 Documentation page leads to the 4.5 format doc, rather than the 4.6 format 
 doc (Lucene46Codec was introduced by LUCENE-5215). 
 Updating this is not automated now - it’s hard-coded in 
 {{lucene/site/xsl/index.xsl}} - the default codec doesn’t change in every 
 release.
 The default codec could be extracted from {{o.a.l.codecs.Codec#defaultCodec [ 
 = Codec.forName(“Lucene46”)]}} and inserted into the URL to the index file 
 format documentation.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5329) Make DocumentDictionary and co more lenient to dirty documents

2013-11-14 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated LUCENE-5329:
-

Attachment: LUCENE-5329.patch

Updated patch:
  - fixed documentation-lint issue (added queries javadoc in build)
  - changed # of docs generated in tests to make it a little faster

 Make DocumentDictionary and co more lenient to dirty documents
 --

 Key: LUCENE-5329
 URL: https://issues.apache.org/jira/browse/LUCENE-5329
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Areek Zillur
 Attachments: LUCENE-5329.patch, LUCENE-5329.patch, LUCENE-5329.patch


 Currently DocumentDictionary errors out whenever any document does not have 
 a value for any relevant stored fields. It would be nice to make it lenient and 
 instead ignore the invalid documents.
 Another issue with the DocumentDictionary is that it only allows string 
 fields as suggestions and binary fields as payloads. When exposing these 
 dictionaries to solr (via https://issues.apache.org/jira/browse/SOLR-5378), 
 it is inconvenient for the user to ensure that a suggestion field is a string 
 field and a payload field is a binary field. It would be nice to have the 
 dictionary just work whenever a string/binary field is passed to 
 suggestion/payload field. The patch provides one solution to this problem (by 
 accepting string or binary values), though it would be great if there were any 
 other solutions to this, without making the DocumentDictionary too flexible.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5442) Python client cannot parse proxied response when served by Tomcat.

2013-11-14 Thread Mark Miller (JIRA)
Mark Miller created SOLR-5442:
-

 Summary: Python client cannot parse proxied response when served 
by Tomcat.
 Key: SOLR-5442
 URL: https://issues.apache.org/jira/browse/SOLR-5442
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 5.0, 4.7


Seems that propagating the transfer-encoding and connection headers from the 
proxied response to the real response can cause some http clients (python's so 
far) to see chunked encoding data as part of the supposedly non-chunked 
response content. The headers are also duplicated in the response.

The headers do not get duplicated with Jetty, and python http libs seem to have 
no problems when getting proxied via Jetty.

Testing seems to confirm that not passing on these headers fixes the Tomcat 
issue.
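
A minimal sketch of the fix idea (illustrative names, not Solr's actual proxy 
code): treat transfer-encoding and connection as hop-by-hop headers and skip 
them when copying the proxied response's headers.

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class HeaderFilterSketch {
  // Hop-by-hop headers belong to each connection; the container re-frames the
  // body itself, so forwarding these can duplicate headers and break framing.
  private static final Set<String> HOP_BY_HOP =
      new HashSet<String>(Arrays.asList("transfer-encoding", "connection"));

  static boolean shouldForward(String headerName) {
    return !HOP_BY_HOP.contains(headerName.toLowerCase(Locale.ROOT));
  }
}
{code}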



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Simon Willnauer
Thanks Steve, I won't get to this until next week. I will upload a new RC on 
Monday.

Simon 

Sent from my iPhone

 On 14 Nov 2013, at 20:20, Steve Rowe sar...@gmail.com wrote:
 
 I’ve committed fixes, to lucene_solr_4_6 as well as to branch_4x and trunk, 
 for all the problems I mentioned.  
 
 The first revision including all these is 1542030.
 
 Steve
 
 On Nov 14, 2013, at 1:16 PM, Steve Rowe sar...@gmail.com wrote:
 
 -1
 
 Smoke tester passes.
 
  Solr Changes look good, except that the “Upgrading from Solr 4.5.0” section 
 follows “Detailed Change List”, but should be above it; and one change 
 attribution didn’t get recognized because it’s missing parens: Elran Dvir 
 via Erick Erickson.  Definitely not worth a respin in either case.
 
 Lucene Changes look good, except that the “API Changes” section in 
 Changes.html is formatted as an item in the “Bug Fixes” section, rather than 
 its own section.  I’ll fix.  (The issue is that “API Changes:” in 
 CHANGES.txt has a trailing colon - the section name regex should allow 
 this.)  This is probably not worth a respin.
 
  Lucene and Solr Documentation pages look good, except that the “File 
  Formats” link from the Lucene Documentation page leads to the 4.5 format 
 doc, rather than the 4.6 format doc (Lucene46Codec was introduced by 
 LUCENE-5215).  This is respin-worthy.  Updating this is not automated now - 
 it’s hard-coded in lucene/site/xsl/index.xsl - the default codec doesn’t 
 change in every release.  I’ll try to automate extracting the default from 
 o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)].
 
 Lucene and Solr Javadocs look good.
 
 Steve
 
 On Nov 14, 2013, at 4:37 AM, Simon Willnauer simon.willna...@gmail.com 
 wrote:
 
 Please vote for the first Release Candidate for Lucene/Solr 4.6.0
 
 you can download it here:
 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
 
 or run the smoke tester directly with this commandline (don't forget
 to set JAVA6_HOME etc.):
 
 python3.2 -u dev-tools/scripts/smokeTestRelease.py
 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
 1541686 4.6.0 /tmp/smoke_test_4_6
 
 
 I integrated the RC into Elasticsearch and all tests pass:
 
 https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de
 
 Smoketester said: SUCCESS! [1:15:57.339272]
 
 here is my +1
 
 
 Simon
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs

2013-11-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822840#comment-13822840
 ] 

Robert Muir commented on LUCENE-5339:
-

{quote}
 Really, you're mixing optimizations (inlining dgap+vint) with ease of use. I 
know (!!) that there are apps that can benefit from a different encoding scheme 
(e.g. FourOnesIntEncoder). We don't need to wait until someone comes up w/ a 
better default encoding scheme to introduce abstractions. I mean .. that just 
sounds crazy to me.
{quote}

How common is this, I'm curious?

Just to lend my opinion/support to this issue: imo the pure number of classes 
in the faceting module can be overwhelming. 

Let's take the encode/decode case: it seems to me you guys iterated a lot and 
figured out vint was the best default encoding. I'm not going to argue that 
some use case could benefit from a custom encoding scheme; instead I'm going to 
ask whether it justifies a whole java package with 20 public classes.

So I think it's fine to bake in the encoding, but with the two key methods in 
those 20 classes 'protected' in the appropriate places so that an expert user 
could subclass them:

{code}
decode(BytesRef buf, IntsRef values);
encode(IntsRef values, BytesRef buf);
{code}

I'd make the argument: if someone is expert enough to do this, they don't need 
pre-provided concrete encoder/decoder classes anyway - they can write their own 
method.
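
To make that concrete, a self-contained dgap+vint encode in the shape of those 
two methods might look like this (a sketch under assumed method shapes, not an 
actual Lucene 4.x API):

{code}
// Delta-gap the (sorted) ordinals, then write each gap as a vInt
// (7 bits per byte, high bit = continuation) into the reused BytesRef.
void encode(IntsRef values, BytesRef buf) {
  buf.bytes = new byte[values.length * 5]; // vInt worst case: 5 bytes per int
  buf.offset = 0;
  int upto = 0, prev = 0;
  for (int i = 0; i < values.length; i++) {
    int cur = values.ints[values.offset + i];
    int v = cur - prev; // the gap
    prev = cur;
    while ((v & ~0x7F) != 0) {
      buf.bytes[upto++] = (byte) ((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    buf.bytes[upto++] = (byte) v;
  }
  buf.length = upto;
}
{code}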

 Simplify the facet module APIs
 --

 Key: LUCENE-5339
 URL: https://issues.apache.org/jira/browse/LUCENE-5339
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-5339.patch, LUCENE-5339.patch


 I'd like to explore simplifications to the facet module's APIs: I
 think the current APIs are complex, and the addition of a new feature
 (sparse faceting, LUCENE-5333) threatens to add even more classes
 (e.g., FacetRequestBuilder).  I think we can do better.
 So, I've been prototyping some drastic changes; this is very
 early/exploratory and I'm not sure where it'll wind up but I think the
 new approach shows promise.
 The big changes are:
   * Instead of *FacetRequest/Params/Result, you directly instantiate
 the classes that do facet counting (currently TaxonomyFacetCounts,
 RangeFacetCounts or SortedSetDVFacetCounts), passing in the
 SimpleFacetsCollector, and then you interact with those classes to
 pull labels + values (topN under a path, sparse, specific labels).
   * At index time, no more FacetIndexingParams/CategoryListParams;
 instead, you make a new SimpleFacetFields and pass it the field it
 should store facets + drill downs under.  If you want more than
 one CLI you create more than one instance of SimpleFacetFields.
   * I added a simple schema, where you state which dimensions are
 hierarchical or multi-valued.  From this we decide how to index
 the ordinals (no more OrdinalPolicy).
 Sparse faceting is just another method (getAllDims), on both taxonomy
 & ssdv facet classes.
 I haven't created a common base class / interface for all of the
 search-time facet classes, but I think this may be possible/clean, and
 perhaps useful for drill sideways.
 All the new classes are under oal.facet.simple.*.
 Lots of things that don't work yet: drill sideways, complements,
 associations, sampling, partitions, etc.  This is just a start ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Uwe Schindler
The PMC Chair is going to marry tomorrow... Simon has to come here and not do 
new RCs! :)

In any case, thanks for doing the release, Simon. I will do the next!

Uwe



Simon Willnauer simon.willna...@gmail.com schrieb:
Thanks Steve I won't get to this until next week. I will upload a new
RC on monday.

Simon 

Sent from my iPhone

 On 14 Nov 2013, at 20:20, Steve Rowe sar...@gmail.com wrote:
 
 I’ve committed fixes, to lucene_solr_4_6 as well as to branch_4x and
trunk, for all the problems I mentioned.  
 
 The first revision including all these is 1542030.
 
 Steve
 
 On Nov 14, 2013, at 1:16 PM, Steve Rowe sar...@gmail.com wrote:
 
 -1
 
 Smoke tester passes.
 
 Solr Changes look good, except that the “Upgrading from Solr 4.5.0”
section follows “Detailed Change List”, but should be above it; and
one change attribution didn’t get recognized because it’s missing
parens: Elran Dvir via Erick Erickson.  Definitely not worth a respin
in either case.
 
 Lucene Changes look good, except that the “API Changes” section in
Changes.html is formatted as an item in the “Bug Fixes” section, rather
than its own section.  I’ll fix.  (The issue is that “API Changes:” in
CHANGES.txt has a trailing colon - the section name regex should allow
this.)  This is probably not worth a respin.
 
 Lucene and Solr Documentation pages look good, except that the “File
Formats” link from the Lucene Documentation page leads to the 4.5
format doc, rather than the 4.6 format doc (Lucene46Codec was
introduced by LUCENE-5215).  This is respin-worthy.  Updating this is
not automated now - it’s hard-coded in lucene/site/xsl/index.xsl - the
default codec doesn’t change in every release.  I’ll try to automate
extracting the default from o.a.l.codecs.Codec#defaultCodec [ =
Codec.forName(“Lucene46”)].
 
 Lucene and Solr Javadocs look good.
 
 Steve
 
 On Nov 14, 2013, at 4:37 AM, Simon Willnauer
simon.willna...@gmail.com wrote:
 
 Please vote for the first Release Candidate for Lucene/Solr 4.6.0
 
 you can download it here:

http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
 
 or run the smoke tester directly with this commandline (don't
forget
 to set JAVA6_HOME etc.):
 
 python3.2 -u dev-tools/scripts/smokeTestRelease.py

http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
 1541686 4.6.0 /tmp/smoke_test_4_6
 
 
 I integrated the RC into Elasticsearch and all tests pass:
 

https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de
 
 Smoketester said: SUCCESS! [1:15:57.339272]
 
 here is my +1
 
 
 Simon
 

-
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de

Jenkins build is back to normal : lucene-solr-46-smoker #59

2013-11-14 Thread Charlie Cron
See http://sierranevada.servebeer.com/job/lucene-solr-46-smoker/59/


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Dawid Weiss
I think this calls for a *very* special release. :) Everything best
for both of you, Uwe!

Dawid

On Thu, Nov 14, 2013 at 9:11 PM, Uwe Schindler u...@thetaphi.de wrote:
 The PMC Chair is going to marry tomorrow... Simon has to come here and not
 do new RCs! :)

 In any case, thanks for doing the release, Simon. I will do the next!

 Uwe



 Simon Willnauer simon.willna...@gmail.com schrieb:

 Thanks Steve I won't get to this until next week. I will upload a new RC
 on monday.

 Simon

 Sent from my iPhone

  On 14 Nov 2013, at 20:20, Steve Rowe sar...@gmail.com wrote:

  I’ve committed fixes, to lucene_solr_4_6 as well as to branch_4x and
 trunk, for all the problems I mentioned.

  The first revision including all these is 1542030.

  Steve


  On Nov 14, 2013, at 1:16 PM, Steve Rowe sar...@gmail.com wrote:

  -1

  Smoke tester passes.

  Solr Changes look good, except that the “Upgrading from Solr 4.5.0”
 section follows “Detailed Change List”, but should be above it; and one
 change attribution didn’t get recognized because it’s missing parens: Elran
 Dvir via Erick Erickson.  Definitely not worth a respin in either case.

  Lucene Changes look good, except that the “API Changes” section in
 Changes.html is formatted as an item in the “Bug Fixes” section, rather 
 than
 its own section.  I’ll fix.  (The issue is that “API Changes:” in
 CHANGES.txt has a trailing colon - the section name regex should allow 
 this.
 )  This is probably not worth a respin.

  Lucene and Solr Documentation pages look good, except that the “File
 Formats” link from the Lucene Documentation page leads to the 4.5 format
 doc, rather than the 4.6 format doc (Lucene46Codec was introduced by
 LUCENE-5215).  This is respin-worthy.  Updating this is not automated now -
 it’s hard-coded in lucene/site/xsl/index.xsl - the default codec doesn’t
 change in every release.  I’ll try to automate extracting the default from
 o.a.l.codecs.Codec#defaultCodec [ =
 Codec.forName(“Lucene46”)].

  Lucene and Solr Javadocs look good.

  Steve


  On Nov 14, 2013, at 4:37 AM, Simon Willnauer
 simon.willna...@gmail.com wrote:

  Please vote for the first Release Candidate for Lucene/Solr 4.6.0

  you can download it here:

 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686

  or run the smoke tester directly with this commandline (don't forget
  to set JAVA6_HOME etc.):

  python3.2 -u dev-tools/scripts/smokeTestRelease.py

 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
  1541686 4.6.0 /tmp/smoke_test_4_6


  I integrated the RC into Elasticsearch and all tests pass:


 https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de

  Smoketester said: SUCCESS! [1:15:57.339272]

  here is my +1


  Simon

 

  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org



 

  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org



 

 To unsubscribe,
 e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 --
 Uwe Schindler
 H.-H.-Meier-Allee 63, 28213 Bremen
 http://www.thetaphi.de

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Steve Rowe
Congratulations Uwe!

On Nov 14, 2013, at 3:11 PM, Uwe Schindler u...@thetaphi.de wrote:

 The PMC Chair is going to marry tomorrow... Simon has to come here and not do 
 new RCs! :)
 
 In any case, thanks for doing the release, Simon. I will do the next!
 
 Uwe
 
 
 
 Simon Willnauer simon.willna...@gmail.com schrieb:
 Thanks Steve I won't get to this until next week. I will upload a new RC on 
 monday.
 
 Simon 
 
 Sent from my iPhone
 
  On 14 Nov 2013, at 20:20, Steve Rowe sar...@gmail.com wrote:
  
  I’ve committed fixes, to lucene_solr_4_6 as well as to branch_4x and trunk, 
 for all the problems I mentioned.  
  
  The first revision including all these is 1542030.
  
  Steve
  
  On Nov 14, 2013, at 1:16 PM, Steve Rowe sar...@gmail.com wrote:
  
  -1
  
  Smoke tester passes.
  
  Solr Changes look good, except that the “Upgrading from Solr 4.5.0” section 
 follows “Detailed Change List”, but should be above it; and one change 
 attribution
 didn’t get recognized because it’s missing parens: Elran Dvir via Erick 
 Erickson.  Definitely not worth a respin in either case.
 
  
  Lucene Changes look good, except that the “API Changes” section in 
 Changes.html is formatted as an item in the “Bug Fixes” section, rather than 
 its own section.  I’ll fix.  (The issue is that “API Changes:” in CHANGES.txt 
 has a trailing colon - the section name regex should allow this. )  This is 
 probably not worth a respin.
  
  Lucene and Solr Documentation pages look good, except that the “File 
 Formats” link from the Lucene Documentation page leads to the 4.5 format doc, 
 rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). 
  This is respin-worthy.  Updating this is not automated now - it’s hard-coded 
 in lucene/site/xsl/index.xsl - the default codec doesn’t change in every 
 release.  I’ll try to automate extracting the default from 
 o.a.l.codecs.Codec#defaultCodec [ =
 Codec.forName(“Lucene46”)].
 
  
  Lucene and Solr Javadocs look good.
  
  Steve
  
  On Nov 14, 2013, at 4:37 AM, Simon Willnauer simon.willna...@gmail.com 
 wrote:
  
  Please vote for the first Release Candidate for Lucene/Solr 4.6.0
  
  you can download it here:
  
 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
  
  or run the smoke tester directly with this commandline (don't forget
  to set JAVA6_HOME etc.):
  
  python3.2 -u dev-tools/scripts/smokeTestRelease.py
  
 http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686
  1541686 4.6.0 /tmp/smoke_test_4_6
  
  
  I integrated the RC into Elasticsearch and all tests pass:
  
  
 https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de
  
  Smoketester said: SUCCESS! [1:15:57.339272]
  
  here is my +1
  
  
  Simon
  
 
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
  
  
 
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, e-mail: dev-h...@lucene.apache.org
  
 
 
 To unsubscribe,
 e-mail: dev-unsubscr...@lucene.apache.org
 
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 --
 Uwe Schindler
 H.-H.-Meier-Allee 63, 28213 Bremen
 http://www.thetaphi.de


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Michael McCandless
On Thu, Nov 14, 2013 at 3:11 PM, Uwe Schindler u...@thetaphi.de wrote:
 The PMC Chair is going to marry tomorrow... Simon has to come here and not
 do new RCs! :)

Congratulations on tying the knot, Uwe!!

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5217) disable transitive dependencies in maven config

2013-11-14 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved LUCENE-5217.


   Resolution: Fixed
Fix Version/s: (was: 4.6)
   4.7

 disable transitive dependencies in maven config
 ---

 Key: LUCENE-5217
 URL: https://issues.apache.org/jira/browse/LUCENE-5217
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Steve Rowe
 Fix For: 5.0, 4.7

 Attachments: LUCENE-5217.patch, LUCENE-5217.patch, LUCENE-5217.patch, 
 LUCENE-5217.patch


 Our ivy configuration does this: each dependency is specified and so we know 
 what will happen. Unfortunately the maven setup is not configured the same 
 way.
 Instead the maven setup is configured to download the internet: and it 
 excludes certain things specifically.
 This is really hard to configure and maintain: we added a 
 'validate-maven-dependencies' that tries to fail on any extra jars, but all 
 it really does is run a license check after maven runs. It wouldn't find 
 unnecessary dependencies being dragged in if something else in lucene was 
 using them and thus they had a license file.
 Since maven supports wildcard exclusions: MNG-3832, we can disable this 
 transitive shit completely.
 We should do this, so its configuration is the exact parallel of ivy.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs

2013-11-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822855#comment-13822855
 ] 

ASF subversion and git services commented on LUCENE-5339:
-

Commit 1542062 from [~mikemccand] in branch 'dev/branches/lucene5339'
[ https://svn.apache.org/r1542062 ]

LUCENE-5339: add abstract Facets base class; fix separate test failure

 Simplify the facet module APIs
 --

 Key: LUCENE-5339
 URL: https://issues.apache.org/jira/browse/LUCENE-5339
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-5339.patch, LUCENE-5339.patch


 I'd like to explore simplifications to the facet module's APIs: I
 think the current APIs are complex, and the addition of a new feature
 (sparse faceting, LUCENE-5333) threatens to add even more classes
 (e.g., FacetRequestBuilder).  I think we can do better.
 So, I've been prototyping some drastic changes; this is very
 early/exploratory and I'm not sure where it'll wind up but I think the
 new approach shows promise.
 The big changes are:
   * Instead of *FacetRequest/Params/Result, you directly instantiate
 the classes that do facet counting (currently TaxonomyFacetCounts,
 RangeFacetCounts or SortedSetDVFacetCounts), passing in the
 SimpleFacetsCollector, and then you interact with those classes to
 pull labels + values (topN under a path, sparse, specific labels).
   * At index time, no more FacetIndexingParams/CategoryListParams;
 instead, you make a new SimpleFacetFields and pass it the field it
 should store facets + drill downs under.  If you want more than
 one CLI you create more than one instance of SimpleFacetFields.
   * I added a simple schema, where you state which dimensions are
 hierarchical or multi-valued.  From this we decide how to index
 the ordinals (no more OrdinalPolicy).
 Sparse faceting is just another method (getAllDims), on both taxonomy
 & ssdv facet classes.
 I haven't created a common base class / interface for all of the
 search-time facet classes, but I think this may be possible/clean, and
 perhaps useful for drill sideways.
 All the new classes are under oal.facet.simple.*.
 Lots of things that don't work yet: drill sideways, complements,
 associations, sampling, partitions, etc.  This is just a start ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5399) Improve DebugComponent for distributed requests

2013-11-14 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomás Fernández Löbbe updated SOLR-5399:


Attachment: SOLR-5399_windows_fix.patch

Stopping Jetty before deleting the SolrHome directory fixes the problem in 
Windows

 Improve DebugComponent for distributed requests
 ---

 Key: SOLR-5399
 URL: https://issues.apache.org/jira/browse/SOLR-5399
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.0
Reporter: Tomás Fernández Löbbe
Assignee: Ryan Ernst
 Fix For: 5.0, 4.7

 Attachments: SOLR-5399.patch, SOLR-5399.patch, SOLR-5399.patch, 
 SOLR-5399_windows_fix.patch


 I'm working on extending the DebugComponent for adding some useful 
 information to be able to track distributed requests better. I'm adding two 
 different things, first, the request can generate a request ID that will be 
 printed in the logs for the main query and all the different internal 
 requests to the different shards. This should make it easier to find the 
 different parts of a single user request in the logs. It would also add the 
 purpose of each internal request to the logs, like: 
 RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. 
 Also, I'm adding a track section to the debug info where to add information 
 about the different phases of the distributed request (right now, I'm only 
 including QTime, but could eventually include more information) like: 
 {code:xml}
 <lst name="debug">
   <lst name="track">
     <lst name="EXECUTE_QUERY">
       <str name="localhost:8985/solr">QTime: 10</str>
       <str name="localhost:8984/solr">QTime: 25</str>
     </lst>
     <lst name="GET_FIELDS">
       <str name="localhost:8985/solr">QTime: 1</str>
     </lst>
   </lst>
 </lst>
 {code}
 To get this, debugQuery must be set to true, or debug must include 
 debug=track. This information is only added to distributed requests.  I 
 would like to get feedback on this.
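
 For example (request shape assumed, not taken from the patch), a distributed 
 query like http://localhost:8983/solr/select?q=test&shards=host1/solr,host2/solr&debug=track 
 would add the track section without the full debugQuery output.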



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5441) Expose transaction log files number and their size via JMX

2013-11-14 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-5441:
---

Summary: Expose transaction log files number and their size via JMX  (was: 
Expose transaction log files number of their size via JMX)

 Expose transaction log files number and their size via JMX
 --

 Key: SOLR-5441
 URL: https://issues.apache.org/jira/browse/SOLR-5441
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.5
Reporter: Rafał Kuć
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 4.6, 5.0

 Attachments: SOLR-5441-synchronized.patch, SOLR-5441.patch


 It may be useful to have the number of transaction log files and their 
 overall size exposed via JMX for UpdateHandler.
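
 A sketch of what this could look like where the UpdateHandler builds its 
 SolrInfoMBean statistics (the update-log accessor names are illustrative, not 
 the committed patch):
 {code}
 // NamedList/SimpleOrderedMap are org.apache.solr.common.util classes;
 // JMX picks up whatever the handler reports in getStatistics().
 NamedList<Object> stats = new SimpleOrderedMap<Object>();
 stats.add("transaction_logs_total_size", ulog.getTotalLogsSize());     // illustrative
 stats.add("transaction_logs_total_number", ulog.getTotalLogsNumber()); // illustrative
 {code}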



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.

2013-11-14 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822893#comment-13822893
 ] 

Simon Rosenthal commented on SOLR-4722:
---

Just one oddity - there are references to the StoredDocument class in 
getUniqueKeys() which (as far as I can see) is only in trunk - and I'm using 
Lucene/Solr 4.5.1. I replaced that with Document, which compiles OK, but I 
haven't had a chance to try it out yet. Do you think it should work?

-Simon




On Thursday, November 14, 2013 12:21 PM, Tricia Jenkins (JIRA) 
j...@apache.org wrote:
 

    [ 
https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822636#comment-13822636
 ] 

Tricia Jenkins commented on SOLR-4722:
--

Thanks for your interest.  This code/jar could be used as is for your purposes.

If you don't want to specify highlighting enabled in each query just move it to 
conf/solrconfig.xml:
{code:xml}
  <requestHandler name="standard" class="solr.StandardRequestHandler">
    <lst name="defaults">
      <str name="hl">true</str>
    </lst>
  </requestHandler>
{code}

This highlighter only returns the term positions.  The term offsets are stored 
because they're used by the FastVectorHighlighter.  You won't get any useful 
information from this highlighter if you disable termOffsets in your schema.xml.

I just ran this patch against trunk.  Still works!




--
This message was sent by Atlassian JIRA
(v6.1#6144)


 Highlighter which generates a list of query term position(s) for each item in 
 a list of documents, or returns null if highlighting is disabled.
 ---

 Key: SOLR-4722
 URL: https://issues.apache.org/jira/browse/SOLR-4722
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 4.3, 5.0
Reporter: Tricia Jenkins
Priority: Minor
 Attachments: SOLR-4722.patch, solr-positionshighlighter.jar


 As an alternative to returning snippets, this highlighter provides the (term) 
 position for query matches.  One usecase for this is to reconcile the term 
 position from the Solr index with 'word' coordinates provided by an OCR 
 process.  In this way we are able to 'highlight' an image, like a page from a 
 book or an article from a newspaper, in the locations that match the user's 
 query.
 This is based on the FastVectorHighlighter and requires that termVectors, 
 termOffsets and termPositions be stored.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5322) Clean up / simplify Maven-related Ant targets

2013-11-14 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved LUCENE-5322.


   Resolution: Fixed
Fix Version/s: (was: 4.6)
   4.7

 Clean up / simplify Maven-related Ant targets
 -

 Key: LUCENE-5322
 URL: https://issues.apache.org/jira/browse/LUCENE-5322
 Project: Lucene - Core
  Issue Type: Task
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Fix For: 5.0, 4.7

 Attachments: LUCENE-5322.lucene-javadoc-url.fix.patch, 
 LUCENE-5322.patch, LUCENE-5322.validate-maven-artifacts.patch


 Many Maven-related Ant targets are public when they don't need to be, e.g. 
 dist-maven and filter-pom-templates, m2-deploy-lucene-parent-pom, etc.
 The arrangement of these targets could be simplified if the directories that 
 have public entry points were minimized.
 generate-maven-artifacts should be runnable from the top level and from 
 lucene/ and solr/. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #505: POMs out of sync

2013-11-14 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/505/

4 tests failed.
FAILED:  
org.apache.solr.handler.dataimport.TestContentStreamDataSource.org.apache.solr.handler.dataimport.TestContentStreamDataSource

Error Message:
1 thread leaked from SUITE scope at 
org.apache.solr.handler.dataimport.TestContentStreamDataSource: 
   1) Thread[id=253, name=commitScheduler-153-thread-1, state=WAITING, 
group=TGRP-TestContentStreamDataSource]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:189)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.solr.handler.dataimport.TestContentStreamDataSource: 
   1) Thread[id=253, name=commitScheduler-153-thread-1, state=WAITING, 
group=TGRP-TestContentStreamDataSource]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:189)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
at __randomizedtesting.SeedInfo.seed([701D3621011975D9]:0)


FAILED:  
org.apache.solr.handler.dataimport.TestContentStreamDataSource.org.apache.solr.handler.dataimport.TestContentStreamDataSource

Error Message:
There are still zombie threads that couldn't be terminated:
   1) Thread[id=253, name=commitScheduler-153-thread-1, state=WAITING, 
group=TGRP-TestContentStreamDataSource]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:189)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie 
threads that couldn't be terminated:
   1) Thread[id=253, name=commitScheduler-153-thread-1, state=WAITING, 
group=TGRP-TestContentStreamDataSource]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:189)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
at __randomizedtesting.SeedInfo.seed([701D3621011975D9]:0)


FAILED:  

Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Mark Miller

On Nov 14, 2013, at 3:11 PM, Uwe Schindler u...@thetaphi.de wrote:

 The PMC Chair is going to marry tomorrow... Simon has to come here and not do 
 new RCs! :)

+1 :)

- Mark

[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-11-14 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822917#comment-13822917
 ] 

Stefan Matheis (steffkes) commented on SOLR-5287:
-

so .. i got it, hopefully. what i'd say we do (in that separate ticket, as you 
mentioned) is:

add a new page called "Files" (or something like that) which starts with a 
typical file-tree, as we have it in the Cloud-Section already .. which 
enables you to browse directories & files and view their contents.

right now this patch only allows file-uploads (or at least i didn't manage to 
get it to accept raw text which i posted to this endpoint)? the code is using 
input streams .. no idea if that is fileupload-specific?

because if we could post the content of a file .. we could offer two choices: 

# upload a complete file you have on your disk
# change into an edit mode .. and then post the changed file from within your 
browser

which would basically mean you could modify your schema w/o the need to 
download, modify & re-upload it.

that would be like we have it already on the Data Import Page .. where you 
could send a {{dataConfig}} parameter, which then is used instead of the 
persisted configuration (related code is in the 
[DataImportHandler.java|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java?view=markup#l129])
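For readers following along, the DIH behaviour described above can be exercised from plain Java; a minimal sketch, assuming the default example core and the usual /dataimport handler path (the dataConfig body itself is a placeholder):

{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Sketch: POST a dataConfig parameter to the DataImportHandler so the
// supplied configuration is used instead of the persisted one.
// Core name and endpoint path are assumptions based on the default setup.
public class DataConfigPost {
  public static void main(String[] args) throws Exception {
    String dataConfig = "<dataConfig>...</dataConfig>"; // your in-browser edit
    String body = "command=full-import&dataConfig="
        + URLEncoder.encode(dataConfig, "UTF-8");

    URL url = new URL("http://localhost:8983/solr/collection1/dataimport");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes("UTF-8"));
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}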



 Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
 --

 Key: SOLR-5287
 URL: https://issues.apache.org/jira/browse/SOLR-5287
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, web gui
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5287.patch, SOLR-5287.patch, SOLR-5287.patch


 A user asking a question on the Solr list got me to thinking about editing 
 the main config files from the Solr admin screen. I chatted briefly with 
 [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
 problem on that end. His comment is "there's no end point that'll write the 
 file back".
 Am I missing something here or is this actually not a hard problem? I see a 
 couple of issues off the bat, neither of which seem troublesome.
 1) file permissions. I'd imagine lots of installations will get file 
 permission exceptions if Solr tries to write the file out. Well, do a 
 chmod/chown.
 2) screwing up the system, maliciously or not. I don't think this is an issue, 
 this would be part of the admin handler after all.
 Does anyone have objections to the idea? And how does this fit into the work 
 that [~sar...@syr.edu] has been doing?
 I can imagine this extending to SolrCloud with a "push this to ZK" option or 
 something like that, perhaps not in V1 unless it's easy.
 Of course any pointers gratefully received. Especially ones that start with 
 "Don't waste your effort, it'll never work (or be accepted)"...
 Because what scares me is this seems like such an easy thing to do that would 
 be a significant ease-of-use improvement, so there _has_ to be something I'm 
 missing.
 So if we go forward with this we'll make this the umbrella JIRA, the two 
 immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
 for the UI work to use.
 I think there are only two end-points here:
 1) list all the files in the conf (or arbitrary from solr_home/collection) 
 directory.
 2) write this text to this file
 Possibly later we could add "clone the configs from coreX to coreY".
 BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
 take it over it won't hurt my feelings a bit
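On the read side, the stock ShowFileRequestHandler already serves conf files, so the first endpoint partly exists; a minimal sketch, assuming the default /admin/file mapping and the example collection1 core:

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Sketch: fetch schema.xml through the existing ShowFileRequestHandler.
// Omitting the file parameter lists the conf directory instead.
public class ReadConfFile {
  public static void main(String[] args) throws Exception {
    URL url = new URL(
        "http://localhost:8983/solr/collection1/admin/file?file=schema.xml&contentType=text/xml");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), "UTF-8"))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
{code}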



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Lucene / Solr 4.6.0

2013-11-14 Thread Yonik Seeley
On Thu, Nov 14, 2013 at 3:11 PM, Uwe Schindler u...@thetaphi.de wrote:
 The PMC Chair is going to marry tomorrow...

Congrats Uwe!

-Yonik
http://heliosearch.com -- making solr shine

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5399) Improve DebugComponent for distributed requests

2013-11-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822920#comment-13822920
 ] 

Robert Muir commented on SOLR-5399:
---

Thanks Tomas: I committed this.

 Improve DebugComponent for distributed requests
 ---

 Key: SOLR-5399
 URL: https://issues.apache.org/jira/browse/SOLR-5399
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.0
Reporter: Tomás Fernández Löbbe
Assignee: Ryan Ernst
 Fix For: 5.0, 4.7

 Attachments: SOLR-5399.patch, SOLR-5399.patch, SOLR-5399.patch, 
 SOLR-5399_windows_fix.patch


 I'm working on extending the DebugComponent for adding some useful 
 information to be able to track distributed requests better. I'm adding two 
 different things. First, the request can generate a request ID that will be 
 printed in the logs for the main query and all the different internal 
 requests to the different shards. This should make it easier to find the 
 different parts of a single user request in the logs. It would also add the 
 purpose of each internal request to the logs, like: 
 RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. 
 Also, I'm adding a "track" section to the debug info with information 
 about the different phases of the distributed request (right now, I'm only 
 including QTime, but could eventually include more information), like: 
 {code:xml}
 <lst name="debug">
   <lst name="track">
     <lst name="EXECUTE_QUERY">
       <str name="localhost:8985/solr">QTime: 10</str>
       <str name="localhost:8984/solr">QTime: 25</str>
     </lst>
     <lst name="GET_FIELDS">
       <str name="localhost:8985/solr">QTime: 1</str>
     </lst>
   </lst>
 </lst>
 {code}
 To get this, debugQuery must be set to true, or debug must include 
 debug=track. This information is only added to distributed requests. I 
 would like to get feedback on this.
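As a rough illustration of consuming this from a client, here is a SolrJ sketch; the host, ports, and query are placeholders, and it assumes the patch exposes the new section under a "track" key of the standard debug map:

{code:java}
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch: request only the tracking info on a distributed query and
// print whatever comes back under the "track" key of the debug section.
public class TrackDebug {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("test");
    q.set("debug", "track");                       // or debugQuery=true
    q.set("shards", "localhost:8985/solr,localhost:8984/solr");

    QueryResponse rsp = server.query(q);
    Map<String, Object> debug = rsp.getDebugMap(); // flattened debug section
    System.out.println(debug.get("track"));
    server.shutdown();
  }
}
{code}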



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5399) Improve DebugComponent for distributed requests

2013-11-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822921#comment-13822921
 ] 

ASF subversion and git services commented on SOLR-5399:
---

Commit 1542082 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1542082 ]

SOLR-5399: fix windows test issue

 Improve DebugComponent for distributed requests
 ---

 Key: SOLR-5399
 URL: https://issues.apache.org/jira/browse/SOLR-5399
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.0
Reporter: Tomás Fernández Löbbe
Assignee: Ryan Ernst
 Fix For: 5.0, 4.7

 Attachments: SOLR-5399.patch, SOLR-5399.patch, SOLR-5399.patch, 
 SOLR-5399_windows_fix.patch


 I'm working on extending the DebugComponent for adding some useful 
 information to be able to track distributed requests better. I'm adding two 
 different things. First, the request can generate a request ID that will be 
 printed in the logs for the main query and all the different internal 
 requests to the different shards. This should make it easier to find the 
 different parts of a single user request in the logs. It would also add the 
 purpose of each internal request to the logs, like: 
 RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. 
 Also, I'm adding a "track" section to the debug info with information 
 about the different phases of the distributed request (right now, I'm only 
 including QTime, but could eventually include more information), like: 
 {code:xml}
 <lst name="debug">
   <lst name="track">
     <lst name="EXECUTE_QUERY">
       <str name="localhost:8985/solr">QTime: 10</str>
       <str name="localhost:8984/solr">QTime: 25</str>
     </lst>
     <lst name="GET_FIELDS">
       <str name="localhost:8985/solr">QTime: 1</str>
     </lst>
   </lst>
 </lst>
 {code}
 To get this, debugQuery must be set to true, or debug must include 
 debug=track. This information is only added to distributed requests. I 
 would like to get feedback on this.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5399) Improve DebugComponent for distributed requests

2013-11-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822919#comment-13822919
 ] 

ASF subversion and git services commented on SOLR-5399:
---

Commit 1542080 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1542080 ]

SOLR-5399: fix windows test issue

 Improve DebugComponent for distributed requests
 ---

 Key: SOLR-5399
 URL: https://issues.apache.org/jira/browse/SOLR-5399
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.0
Reporter: Tomás Fernández Löbbe
Assignee: Ryan Ernst
 Fix For: 5.0, 4.7

 Attachments: SOLR-5399.patch, SOLR-5399.patch, SOLR-5399.patch, 
 SOLR-5399_windows_fix.patch


 I'm working on extending the DebugComponent for adding some useful 
 information to be able to track distributed requests better. I'm adding two 
 different things. First, the request can generate a request ID that will be 
 printed in the logs for the main query and all the different internal 
 requests to the different shards. This should make it easier to find the 
 different parts of a single user request in the logs. It would also add the 
 purpose of each internal request to the logs, like: 
 RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. 
 Also, I'm adding a "track" section to the debug info with information 
 about the different phases of the distributed request (right now, I'm only 
 including QTime, but could eventually include more information), like: 
 {code:xml}
 <lst name="debug">
   <lst name="track">
     <lst name="EXECUTE_QUERY">
       <str name="localhost:8985/solr">QTime: 10</str>
       <str name="localhost:8984/solr">QTime: 25</str>
     </lst>
     <lst name="GET_FIELDS">
       <str name="localhost:8985/solr">QTime: 1</str>
     </lst>
   </lst>
 </lst>
 {code}
 To get this, debugQuery must be set to true, or debug must include 
 debug=track. This information is only added to distributed requests. I 
 would like to get feedback on this.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-11-14 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822925#comment-13822925
 ] 

Erick Erickson commented on SOLR-5287:
--

It's working for me by specifying a request parameter 'stream.body=put your 
text here', even from within some tests I'm writing. Does that work for you? I 
freely admit this is somewhat of a mystery to me.
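For anyone trying the raw-text route, Solr's remote streaming can hand arbitrary text to a handler via the stream.body parameter (it must be enabled with enableRemoteStreaming="true" in solrconfig.xml). A sketch follows; the /admin/fileedit path and the op/file parameters are hypothetical stand-ins for whatever the patch actually registers:

{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Sketch: pass raw file content to a handler as a content stream using
// the stream.body request parameter (needs enableRemoteStreaming=true).
// "/admin/fileedit" and the op/file parameters are hypothetical names.
public class StreamBodyPost {
  public static void main(String[] args) throws Exception {
    String newSchema = "<schema name=\"example\">...</schema>";
    String body = "op=write&file=schema.xml&stream.body="
        + URLEncoder.encode(newSchema, "UTF-8");

    URL url = new URL("http://localhost:8983/solr/collection1/admin/fileedit");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes("UTF-8"));
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}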

 Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
 --

 Key: SOLR-5287
 URL: https://issues.apache.org/jira/browse/SOLR-5287
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis, web gui
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5287.patch, SOLR-5287.patch, SOLR-5287.patch


 A user asking a question on the Solr list got me to thinking about editing 
 the main config files from the Solr admin screen. I chatted briefly with 
 [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
 problem on that end. His comment is "there's no end point that'll write the 
 file back".
 Am I missing something here or is this actually not a hard problem? I see a 
 couple of issues off the bat, neither of which seem troublesome.
 1) file permissions. I'd imagine lots of installations will get file 
 permission exceptions if Solr tries to write the file out. Well, do a 
 chmod/chown.
 2) screwing up the system, maliciously or not. I don't think this is an issue, 
 this would be part of the admin handler after all.
 Does anyone have objections to the idea? And how does this fit into the work 
 that [~sar...@syr.edu] has been doing?
 I can imagine this extending to SolrCloud with a "push this to ZK" option or 
 something like that, perhaps not in V1 unless it's easy.
 Of course any pointers gratefully received. Especially ones that start with 
 "Don't waste your effort, it'll never work (or be accepted)"...
 Because what scares me is this seems like such an easy thing to do that would 
 be a significant ease-of-use improvement, so there _has_ to be something I'm 
 missing.
 So if we go forward with this we'll make this the umbrella JIRA, the two 
 immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
 for the UI work to use.
 I think there are only two end-points here:
 1) list all the files in the conf (or arbitrary from solr_home/collection) 
 directory.
 2) write this text to this file
 Possibly later we could add "clone the configs from coreX to coreY".
 BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
 take it over it won't hurt my feelings a bit



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


