[jira] [Updated] (SOLR-5438) DebugComponent throws NPE when used with grouping
[ https://issues.apache.org/jira/browse/SOLR-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5438: Attachment: SOLR-5438.patch Patch updated to trunk. DebugComponent throws NPE when used with grouping - Key: SOLR-5438 URL: https://issues.apache.org/jira/browse/SOLR-5438 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Assignee: Shalin Shekhar Mangar Attachments: SOLR-5438.patch, SOLR-5438.patch To Reproduce: start trunk example Run query: http://localhost:8983/solr/select?q=test&debug=true&group=true&group.field=inStock&distrib=true&shards=localhost:8983/solr,localhost:8983/solr DebugComponent throws an NPE like: {noformat} 83841 [qtp1070887245-16] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.NullPointerException at org.apache.solr.handler.component.DebugComponent.process(DebugComponent.java:66) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:216) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:722) {noformat} Seems like some internal requests when using grouping don't
populate ResponseBuilder.results. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
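The attached patch is not reproduced in this digest; purely as a hedged illustration of the kind of guard that avoids an NPE when ResponseBuilder.results was never populated, here is a minimal stand-alone sketch (not the actual SOLR-5438 diff, and the class name is invented for the example):

{code:java}
import java.io.IOException;

import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocListAndSet;

// Sketch only: skip debug processing when an internal (e.g. grouped) request
// never filled ResponseBuilder.results, instead of dereferencing it and failing.
public class NullSafeDebugSketch extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // nothing to prepare in this sketch
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    if (!rb.isDebug()) {
      return; // debug output was not requested
    }
    DocListAndSet results = rb.getResults();
    if (results == null || results.docList == null) {
      return; // internal grouping request: nothing to explain here
    }
    // ... the real DebugComponent would build explain/timing output here ...
  }

  @Override
  public String getDescription() {
    return "null-safe debug sketch";
  }

  @Override
  public String getSource() {
    return "sketch";
  }
}
{code}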
[jira] [Commented] (SOLR-5438) DebugComponent throws NPE when used with grouping
[ https://issues.apache.org/jira/browse/SOLR-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822252#comment-13822252 ] ASF subversion and git services commented on SOLR-5438: --- Commit 1541849 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1541849 ] SOLR-5438: DebugComponent throws NPE when used with grouping -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5438) DebugComponent throws NPE when used with grouping
[ https://issues.apache.org/jira/browse/SOLR-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822256#comment-13822256 ] ASF subversion and git services commented on SOLR-5438: --- Commit 1541853 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1541853 ] SOLR-5438: DebugComponent throws NPE when used with grouping -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5438) DebugComponent throws NPE when used with grouping
[ https://issues.apache.org/jira/browse/SOLR-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-5438. - Resolution: Fixed Fix Version/s: 4.7, 5.0 This is fixed. Thanks Tomás! -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5399) Improve DebugComponent for distributed requests
[ https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822262#comment-13822262 ] Shalin Shekhar Mangar commented on SOLR-5399: - Looks like this is causing a transaction log leak on Windows. http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/3386/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3463/ Improve DebugComponent for distributed requests --- Key: SOLR-5399 URL: https://issues.apache.org/jira/browse/SOLR-5399 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Assignee: Ryan Ernst Fix For: 5.0, 4.7 Attachments: SOLR-5399.patch, SOLR-5399.patch, SOLR-5399.patch I'm working on extending the DebugComponent for adding some useful information to be able to track distributed requests better. I'm adding two different things, first, the request can generate a request ID that will be printed in the logs for the main query and all the different internal requests to the different shards. This should make it easier to find the different parts of a single user request in the logs. It would also add the purpose of each internal request to the logs, like: RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. Also, I'm adding a track section to the debug info where to add information about the different phases of the distributed request (right now, I'm only including QTime, but could eventually include more information) like:
{code:xml}
<lst name="debug">
  <lst name="track">
    <lst name="EXECUTE_QUERY">
      <str name="localhost:8985/solr">QTime: 10</str>
      <str name="localhost:8984/solr">QTime: 25</str>
    </lst>
    <lst name="GET_FIELDS">
      <str name="localhost:8985/solr">QTime: 1</str>
    </lst>
  </lst>
</lst>
{code}
To get this, debugQuery must be set to true, or debug must include debug=track. This information is only added to distributed requests. I would like to get feedback on this. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
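For anyone wanting to inspect the proposed track section from a client, a hedged SolrJ sketch follows; the debug=track parameter comes from the description above, but the response keys and casts are assumptions about how the final patch lays out the data:

{code:java}
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

// Hedged sketch: print per-shard QTime entries from the proposed "track" debug section.
public class TrackDebugExample {
  @SuppressWarnings("unchecked")
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("test");
    q.set("debug", "track"); // or debugQuery=true for the full debug output
    QueryResponse rsp = solr.query(q);
    NamedList<Object> debug = (NamedList<Object>) rsp.getResponse().get("debug");
    if (debug != null && debug.get("track") instanceof NamedList) {
      NamedList<Object> track = (NamedList<Object>) debug.get("track");
      for (Map.Entry<String, Object> phase : track) { // e.g. EXECUTE_QUERY, GET_FIELDS
        System.out.println(phase.getKey() + " -> " + phase.getValue());
      }
    }
    solr.shutdown();
  }
}
{code}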
[jira] [Created] (SOLR-5440) UAX29URLEmailTokenizer thread hangs on getNextToken - causes cloud to stop accepting updates
Chris created SOLR-5440: --- Summary: UAX29URLEmailTokenizer thread hangs on getNextToken - causes cloud to stop accepting updates Key: SOLR-5440 URL: https://issues.apache.org/jira/browse/SOLR-5440 Project: Solr Issue Type: Bug Reporter: Chris This is a pretty nasty bug, and causes the cluster to stop accepting updates. I'm not sure how to consistently reproduce it but I have done so numerous times. Switching to a whitespace tokenizer improved indexing speed, and I never got the issue again. When the thread hits this bug it uses 100% CPU, restarting the node which has the error allows indexing to continue until hit again. Here is thread dump: http-bio-8080-exec-45 (201) org.apache.lucene.analysis.standard.UAX29URLEmailTokenizerImpl.getNextToken(UAX29URLEmailTokenizerImpl.java:4343) org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer.incrementToken(UAX29URLEmailTokenizer.java:147) org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:82) org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54) org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174) org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248) org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253) org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453) org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1517) org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:217) org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:583) org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:719) org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:449) org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:89) org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:151) org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131) org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221) org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116) org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186) org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112) org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158) org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99) org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58) org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953) org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023) org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
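The workaround mentioned in the report (replacing UAX29URLEmailTokenizer with a whitespace tokenizer) is normally a schema.xml change; as a hedged Lucene 4.x illustration of the same swap in code, with an invented class name and a filter chain that is only an assumption about the reporter's setup:

{code:java}
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// Sketch of the reported workaround: tokenize on whitespace instead of using
// UAX29URLEmailTokenizer, keeping a lowercase filter as in the original chain.
public final class WhitespaceWorkaroundAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_46, reader);
    TokenStream result = new LowerCaseFilter(Version.LUCENE_46, source);
    return new TokenStreamComponents(source, result);
  }
}
{code}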
Re: Build failed in Jenkins: lucene-solr-46-smoker #5
yes that is what I was using. The odd part is that I rebuilt the RC and wiped my m2 repo for the old release and now it doesn't run OOM anymore. Oddness at its best. It also only happened on my mac but not on the linux box. I don't know what's going on but it seems to be sorted out. simon On Wed, Nov 13, 2013 at 9:42 PM, Steve Rowe sar...@gmail.com wrote: Simon, Are you using the maven-compiler-plugin's “maxmem” configuration?: http://maven.apache.org/plugins/maven-compiler-plugin/compile-mojo.html#maxmem. I’ve never had to use this before, but you should also be able to try it on the command line using -Dmaven.compiler.maxmem=512m (or whatever size is required). Steve On Nov 13, 2013, at 3:05 PM, Simon Willnauer simon.willna...@gmail.com wrote: hmm this is odd. I really don't get why I can't compile ES anymore then. I was running the compile with default values, not sure how much it is, but now I can't compile it anymore with 4.6. I am not sure what is going on here but it seems they are pretty much broken? I will build an RC and we can try it then again! On Wed, Nov 13, 2013 at 8:33 PM, Steve Rowe sar...@gmail.com wrote: Simon, As I mentioned in the 4.6 release thread, I backported LUCENE-5217 and LUCENE-5322 to branch_4x *after* you made the 4.6 release branch, so the LUCENE-5322-related things I’m committing to trunk and branch_4x should not affect the 4.6 release branch. So the 4.6 POMs should be very much like the 4.5.1 POMs. 5GB??? How much did you need for 4.5.1? Steve On Nov 13, 2013, at 2:24 PM, Simon Willnauer simon.willna...@gmail.com wrote: I am just reading through the thread though. I had some problems with java 1.7 vs. 1.6 so smoketest failed, but the release was still testable so I integrated it into elasticsearch. Yet with that artifact I built I can't even compile ES since the maven compiler target runs out of memory even if I give it 5GB, which with 4.5.1 just ran fine. So we have a problem here that I can't really figure out. This maven thing seems to be a black box though. @sarowe I see you are fixing something related to maven but is this actually related? If yes you need to backport it to 4.x and the release branch please. simon On Wed, Nov 13, 2013 at 8:15 PM, Robert Muir rcm...@gmail.com wrote: Thanks Uwe... so JAVA_HOME seems to be working fine. This doesn't explain why Simon had problems, but at least it confirms it's working: maybe he had a configuration issue. On Wed, Nov 13, 2013 at 2:10 PM, Uwe Schindler u...@thetaphi.de wrote: Hi Robert, The Jenkins config was wrong. Your Jenkins instance runs with Java 7, so by default it starts jobs also with Java 7. You only configured JAVA_HOME as a sysprop for ANT, so it got passed with -DJAVA_HOME. The right config is to define available Java installations via the Admin UI and select the right one (not Default) in the job config. After that, Jenkins sets JAVA_HOME to the right one before launching ANT. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, November 13, 2013 7:00 PM To: dev@lucene.apache.org Subject: Re: Build failed in Jenkins: lucene-solr-46-smoker #5 This looks like a real bug in the build? I've got JAVA_HOME set to a 1.6 compiler... On Wed, Nov 13, 2013 at 12:56 PM, Charlie Cron hudsonsevilt...@gmail.com wrote: See http://sierranevada.servebeer.com/job/lucene-solr-46-smoker/5/ -- [...truncated 52012 lines...]
jar-grouping: check-queries-uptodate: jar-queries: check-queryparser-uptodate: jar-queryparser: check-join-uptodate: jar-join: prep-lucene-jars: resolve-example: ivy-availability-check: [echo] Building solr-solrj... ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = http://sierranevada.servebeer.com/job/lucene-solr-46-smoker/ws/lucene/ivy-settings.xml resolve: common.init: compile-lucene-core: init: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: common.compile-core: compile-core: resolve-groovy: define-lucene-javadoc-url: javadocs: [echo] Building solr-solrj... download-java6-javadoc-packagelist: [delete] Deleting: http://sierranevada.servebeer.com/job/lucene-solr-46-smoker/ws/solr/build/docs/solr-solrj/stylesheet.css [javadoc] Generating Javadoc [javadoc] Javadoc execution [javadoc] Loading source files for package org.apache.solr.client.solrj... [javadoc] warning: [options] bootstrap class path not set in conjunction with -source 1.6 [javadoc] Loading source files for package org.apache.solr.client.solrj.beans... [javadoc] Loading source files for package org.apache.solr.client.solrj.impl... [javadoc] Loading source files for package org.apache.solr.client.solrj.request... [javadoc]
[jira] [Commented] (SOLR-5440) UAX29URLEmailTokenizer thread hangs on getNextToken - causes cloud to stop accepting updates
[ https://issues.apache.org/jira/browse/SOLR-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822268#comment-13822268 ] Chris commented on SOLR-5440: - Googling I found someone hit the same issue with elasticsearch, https://gist.github.com/jeremy/2925923
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822273#comment-13822273 ] Shai Erera commented on LUCENE-5339: About FacetIndexWriter, it will need to take an optional TaxonomyWriter (i.e. if you intend to use TaxonomyFacetField). But then I wonder whether users won't expect FacetIW.commit() to commit both the underlying IW and TW. Actually, this could be a very good thing to do, since we could control the order those two objects are committed, and do the two-phase commit properly, rather than telling users what to do. But that means we'd need to make IW.commit() not final. Besides the advantage of doing the commit right, I worry that if we don't do that, users will be confused about having to call TW.commit() themselves, just because now FacetIW already has a handle to their TW. What do you think? We could also just add a commitTaxoAndIndex() method, but that's less elegant. Simplify the facet module APIs -- Key: LUCENE-5339 URL: https://issues.apache.org/jira/browse/LUCENE-5339 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5339.patch I'd like to explore simplifications to the facet module's APIs: I think the current APIs are complex, and the addition of a new feature (sparse faceting, LUCENE-5333) threatens to add even more classes (e.g., FacetRequestBuilder). I think we can do better. So, I've been prototyping some drastic changes; this is very early/exploratory and I'm not sure where it'll wind up but I think the new approach shows promise. The big changes are: * Instead of *FacetRequest/Params/Result, you directly instantiate the classes that do facet counting (currently TaxonomyFacetCounts, RangeFacetCounts or SortedSetDVFacetCounts), passing in the SimpleFacetsCollector, and then you interact with those classes to pull labels + values (topN under a path, sparse, specific labels). * At index time, no more FacetIndexingParams/CategoryListParams; instead, you make a new SimpleFacetFields and pass it the field it should store facets + drill downs under. If you want more than one CLI you create more than one instance of SimpleFacetFields. * I added a simple schema, where you state which dimensions are hierarchical or multi-valued. From this we decide how to index the ordinals (no more OrdinalPolicy). Sparse faceting is just another method (getAllDims), on both the taxonomy and ssdv facet classes. I haven't created a common base class / interface for all of the search-time facet classes, but I think this may be possible/clean, and perhaps useful for drill sideways. All the new classes are under oal.facet.simple.*. Lots of things that don't work yet: drill sideways, complements, associations, sampling, partitions, etc. This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
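Since the question is whether a FacetIndexWriter-style commit() should commit both writers, here is a hedged sketch of the ordering being discussed; FacetIndexWriter and commitTaxoAndIndex() are hypothetical names from the thread, and the ordering shown is the conventional taxonomy-first rule rather than anything taken from the patch:

{code:java}
import java.io.IOException;

import org.apache.lucene.facet.taxonomy.TaxonomyWriter;
import org.apache.lucene.index.IndexWriter;

// Hedged sketch of the two-writer commit order under discussion: commit the taxonomy
// first so every ordinal referenced by committed documents is durable, then the index.
public final class FacetCommitHelper {
  private FacetCommitHelper() {}

  public static void commitTaxoAndIndex(TaxonomyWriter taxoWriter, IndexWriter indexWriter)
      throws IOException {
    taxoWriter.commit();  // taxonomy first: index documents reference its ordinals
    indexWriter.commit(); // then the search index
  }
}
{code}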
[jira] [Updated] (SOLR-5440) UAX29URLEmailTokenizer thread hangs on getNextToken - causes cloud to stop accepting updates
[ https://issues.apache.org/jira/browse/SOLR-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris updated SOLR-5440: Description: This is a pretty nasty bug, and causes the cluster to stop accepting updates. I'm not sure how to consistently reproduce it but I have done so numerous times. Switching to a whitespace tokenizer improved indexing speed, and I never got the issue again. I'm running a 4.6 Snapshot - I had issues with deadlocks with numerous versions of Solr, and have finally narrowed down the problem to this code, which affects many/all(?) versions of Solr. When the thread hits this issue it uses 100% CPU, restarting the node which has the error allows indexing to continue until hit again. Here is thread dump: http-bio-8080-exec-45 (201) org.apache.lucene.analysis.standard.UAX29URLEmailTokenizerImpl.getNextToken(UAX29URLEmailTokenizerImpl.java:4343) org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer.incrementToken(UAX29URLEmailTokenizer.java:147) org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:82) org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54) org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174) org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248) org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253) org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453) org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1517) org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:217) org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:583) org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:719) org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:449) org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:89) org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:151) org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131) org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221) org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116) org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186) org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112) org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158) org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99) org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58) org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953) org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023) org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
[jira] [Updated] (LUCENE-5329) Make DocumentDictionary and co more lenient to dirty documents
[ https://issues.apache.org/jira/browse/LUCENE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur updated LUCENE-5329: - Attachment: LUCENE-5329.patch Patch Updated: - Added ctor for DocumentExpressionDictionary (can take in ValueSource) [wondering if the name should be more general, as it can now compute weights using ValueSource directly] - Allow DocumentDictionary to use NumericDocValuesField for suggestion weights - Updated tests to reflect new changes NOTE: using ant documentation-lint gives me this error (any advice on fixing this javadoc is greatly appreciated): [exec] file:///build/docs/suggest/org/apache/lucene/search/suggest/DocumentExpressionDictionary.html [exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/queries.function.ValueSource.html [exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/queries.function.ValueSource.html [exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/queries.function.ValueSource.html [exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/queries.function.ValueSource.html Make DocumentDictionary and co more lenient to dirty documents -- Key: LUCENE-5329 URL: https://issues.apache.org/jira/browse/LUCENE-5329 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Areek Zillur Attachments: LUCENE-5329.patch, LUCENE-5329.patch Currently DocumentDictionary errors out whenever any document does not have a value for any of the relevant stored fields. It would be nice to make it lenient and instead ignore the invalid documents. Another issue with the DocumentDictionary is that it only allows string fields as suggestions and binary fields as payloads. When exposing these dictionaries to solr (via https://issues.apache.org/jira/browse/SOLR-5378), it is inconvenient for the user to ensure that a suggestion field is a string field and a payload field is a binary field. It would be nice to have the dictionary just work whenever a string/binary field is passed to the suggestion/payload field. The patch provides one solution to this problem (by accepting string or binary values), though it would be great if there is any other solution to this that does not make the DocumentDictionary too flexible. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
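To make the leniency goal concrete, here is a hedged sketch of the skip-instead-of-throw behaviour the issue asks for, iterating stored fields directly rather than through DocumentDictionary itself; the field names are placeholders and deleted-document checks are omitted for brevity:

{code:java}
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexableField;

// Hedged sketch: skip "dirty" documents that lack the suggestion or weight field
// instead of failing, which is the lenient behaviour the issue is after.
public final class LenientIterationSketch {
  public static void dump(IndexReader reader) throws IOException {
    for (int docId = 0; docId < reader.maxDoc(); docId++) {
      Document doc = reader.document(docId);
      IndexableField term = doc.getField("suggest");  // placeholder field name
      IndexableField weight = doc.getField("weight"); // placeholder field name
      if (term == null || weight == null || weight.numericValue() == null) {
        continue; // dirty document: ignore rather than throw
      }
      System.out.println(term.stringValue() + " -> " + weight.numericValue());
    }
  }
}
{code}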
[VOTE] Lucene / Solr 4.6.0
Please vote for the first Release Candidate for Lucene/Solr 4.6.0 you can download it here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 or run the smoke tester directly with this commandline (don't forget to set JAVA6_HOME etc.): python3.2 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 1541686 4.6.0 /tmp/smoke_test_4_6 I integrated the RC into Elasticsearch and all tests pass: https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de Smoketester said: SUCCESS! [1:15:57.339272] here is my +1 Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2724) Deprecate defaultSearchField and defaultOperator defined in schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822304#comment-13822304 ] Anca Kopetz commented on SOLR-2724: --- Hi, I do not know if my comment is useful, but I found some code that still uses the defaultOperator from schema.xml in solr-core-4.5.1
{code:title=DisMaxQParser.java|borderStyle=solid}
public static String parseMinShouldMatch(final IndexSchema schema, final SolrParams params) {
  org.apache.solr.parser.QueryParser.Operator op =
      QueryParsing.getQueryParserDefaultOperator(schema, params.get(QueryParsing.OP));
  return params.get(DisMaxParams.MM, op.equals(QueryParser.Operator.AND) ? "100%" : "0%");
}
{code}
{code:title=QueryParsing.java|borderStyle=solid}
public static QueryParser.Operator getQueryParserDefaultOperator(final IndexSchema sch, final String override) {
  String val = override;
  if (null == val) val = sch.getQueryParserDefaultOperator();
  return "AND".equals(val) ? QueryParser.Operator.AND : QueryParser.Operator.OR;
}
{code}
Deprecate defaultSearchField and defaultOperator defined in schema.xml -- Key: SOLR-2724 URL: https://issues.apache.org/jira/browse/SOLR-2724 Project: Solr Issue Type: Improvement Components: Schema and Analysis, search Reporter: David Smiley Assignee: David Smiley Priority: Minor Fix For: 3.6, 4.0-ALPHA Attachments: SOLR-2724_deprecateDefaultSearchField_and_defaultOperator.patch, SOLR-2724_deprecateDefaultSearchField_and_defaultOperator.patch Original Estimate: 2h Remaining Estimate: 2h I've always been surprised to see the <defaultSearchField> element and <solrQueryParser defaultOperator="OR"/> defined in the schema.xml file since the first time I saw them. They just seem out of place to me since they are more query parser related than schema related. But not only are they misplaced, I feel they shouldn't exist. For query parsers, we already have a df parameter that works just fine, and explicit field references. And the default lucene query operator should stay at OR -- if a particular query wants different behavior then use q.op or simply use OR. <similarity> seems like something better placed in solrconfig.xml than in the schema. In my opinion, defaultSearchField and defaultOperator configuration elements should be deprecated in Solr 3.x and removed in Solr 4. And similarity should move to solrconfig.xml. I am willing to do it, provided there is consensus on it of course. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
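As a small illustration of what the issue advocates instead of the schema defaults (request-time df and q.op parameters), a hedged SolrJ sketch follows; the core URL and field name are placeholders:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Hedged sketch: pass the default field and operator per request rather than relying
// on <defaultSearchField>/<solrQueryParser defaultOperator=.../> in schema.xml.
public class RequestTimeDefaultsExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr"); // placeholder
    SolrQuery q = new SolrQuery("ipod belkin");
    q.set("df", "text");   // replaces <defaultSearchField>
    q.set("q.op", "AND");  // replaces <solrQueryParser defaultOperator="AND"/>
    System.out.println(solr.query(q).getResults().getNumFound());
    solr.shutdown();
  }
}
{code}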
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_45) - Build # 3464 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3464/ Java: 64bit/jdk1.7.0_45 -XX:+UseCompressedOops -XX:+UseG1GC 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.handler.component.DistributedDebugComponentTest Error Message: Unable to delete file: .\org.apache.solr.handler.component.DistributedDebugComponentTest\collection2\data\tlog\tlog.000 Stack Trace: java.io.IOException: Unable to delete file: .\org.apache.solr.handler.component.DistributedDebugComponentTest\collection2\data\tlog\tlog.000 at __randomizedtesting.SeedInfo.seed([3BB258D4BFF7C45F]:0) at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1919) at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399) at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331) at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910) at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399) at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331) at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910) at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399) at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331) at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910) at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399) at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331) at org.apache.solr.SolrJettyTestBase.cleanUpJettyHome(SolrJettyTestBase.java:189) at org.apache.solr.handler.component.DistributedDebugComponentTest.afterTest(DistributedDebugComponentTest.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:700) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:744) Build Log: [...truncated 10822 lines...] [junit4] Suite: org.apache.solr.handler.component.DistributedDebugComponentTest [junit4] 2 3321829 T9083 oas.SolrTestCaseJ4.initCore initCore [junit4] 2 Creating dataDir: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build\solr-core\test\J0\.\solrtest-DistributedDebugComponentTest-1384429229063 [junit4] 2 3321833 T9083 oas.SolrTestCaseJ4.initCore initCore end [junit4] 2 3321833 T9083 oas.SolrJettyTestBase.getSSLConfig Randomized ssl (false) and clientAuth (false) [junit4] 2 3321833 T9083 oejs.Server.doStart jetty-8.1.10.v20130312 [junit4] 2 3321843 T9083 oejs.AbstractConnector.doStart Started SelectChannelConnector@127.0.0.1:55199 [junit4] 2 3321843 T9083 oass.SolrDispatchFilter.init SolrDispatchFilter.init() [junit4] 2 3321844 T9083 oasc.SolrResourceLoader.locateSolrHome JNDI not configured
[JENKINS] Lucene-Solr-4.x-Windows (32bit/jdk1.8.0-ea-b114) - Build # 3387 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/3387/ Java: 32bit/jdk1.8.0-ea-b114 -client -XX:+UseParallelGC 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterOutOfFileDescriptors.test Error Message: unreferenced files: before delete: [_2_Pulsing41_0.doc, _2_Pulsing41_0.pos, _5.cfe, _5.cfs, _5.si, _i.cfe, _i.cfs, _i.si, _q.cfe, _q.cfs, _q.si, _r.cfe, _r.cfs, _r.si, _s.cfe, _s.cfs, _s.si, _t.cfe, _t.cfs, _t.si, _u.fdt, _u.fdx, _u.fnm, _u.nvd, _u.nvm, _u.si, _u.tvd, _u.tvx, _u_Lucene41WithOrds_0.doc, _u_Lucene41WithOrds_0.pos, _u_Lucene41WithOrds_0.tib, _u_Lucene41WithOrds_0.tii, _u_Memory_0.ram, _u_MockVariableIntBlock_0.doc, _u_MockVariableIntBlock_0.frq, _u_MockVariableIntBlock_0.pos, _u_MockVariableIntBlock_0.pyl, _u_MockVariableIntBlock_0.skp, _u_MockVariableIntBlock_0.tib, _u_MockVariableIntBlock_0.tii, _u_Pulsing41_0.doc, _u_Pulsing41_0.pos, _u_Pulsing41_0.tim, _u_Pulsing41_0.tip, _u_SimpleText_0.dat, segments.gen, segments_o] after delete: [_5.cfe, _5.cfs, _5.si, _i.cfe, _i.cfs, _i.si, _q.cfe, _q.cfs, _q.si, _r.cfe, _r.cfs, _r.si, _s.cfe, _s.cfs, _s.si, _t.cfe, _t.cfs, _t.si, _u.fdt, _u.fdx, _u.fnm, _u.nvd, _u.nvm, _u.si, _u.tvd, _u.tvx, _u_Lucene41WithOrds_0.doc, _u_Lucene41WithOrds_0.pos, _u_Lucene41WithOrds_0.tib, _u_Lucene41WithOrds_0.tii, _u_Memory_0.ram, _u_MockVariableIntBlock_0.doc, _u_MockVariableIntBlock_0.frq, _u_MockVariableIntBlock_0.pos, _u_MockVariableIntBlock_0.pyl, _u_MockVariableIntBlock_0.skp, _u_MockVariableIntBlock_0.tib, _u_MockVariableIntBlock_0.tii, _u_Pulsing41_0.doc, _u_Pulsing41_0.pos, _u_Pulsing41_0.tim, _u_Pulsing41_0.tip, _u_SimpleText_0.dat, segments.gen, segments_o] These files were removed: [_2_Pulsing41_0.doc, _2_Pulsing41_0.pos] These files we had previously tried to delete, but couldn't: [_0_MockVariableIntBlock_0.pyl, _2_MockVariableIntBlock_0.frq, _2.fdt, _0.nvd, _0_Pulsing41_0.doc, _2_SimpleText_0.dat, _0_MockVariableIntBlock_0.doc, _2_Lucene41WithOrds_0.tib, _1.cfs, _2.nvd, _2_MockVariableIntBlock_0.skp, _2.tvd, _0_Pulsing41_0.pos, _2_Lucene41WithOrds_0.pos, _2_MockVariableIntBlock_0.pos, _2_MockVariableIntBlock_0.tib, _0_SimpleText_0.dat, _0_Lucene41WithOrds_0.pos, _0_MockVariableIntBlock_0.frq, _2_MockVariableIntBlock_0.pyl, _0_Lucene41WithOrds_0.tib, _0_Pulsing41_0.tim, _0.fdt, _0_MockVariableIntBlock_0.skp, _0.tvd, _2_MockVariableIntBlock_0.doc, _0_MockVariableIntBlock_0.pos, _2_Lucene41WithOrds_0.doc, _2_Pulsing41_0.tim, _0_Lucene41WithOrds_0.doc, _0_MockVariableIntBlock_0.tib] Stack Trace: java.lang.AssertionError: unreferenced files: before delete: [_2_Pulsing41_0.doc, _2_Pulsing41_0.pos, _5.cfe, _5.cfs, _5.si, _i.cfe, _i.cfs, _i.si, _q.cfe, _q.cfs, _q.si, _r.cfe, _r.cfs, _r.si, _s.cfe, _s.cfs, _s.si, _t.cfe, _t.cfs, _t.si, _u.fdt, _u.fdx, _u.fnm, _u.nvd, _u.nvm, _u.si, _u.tvd, _u.tvx, _u_Lucene41WithOrds_0.doc, _u_Lucene41WithOrds_0.pos, _u_Lucene41WithOrds_0.tib, _u_Lucene41WithOrds_0.tii, _u_Memory_0.ram, _u_MockVariableIntBlock_0.doc, _u_MockVariableIntBlock_0.frq, _u_MockVariableIntBlock_0.pos, _u_MockVariableIntBlock_0.pyl, _u_MockVariableIntBlock_0.skp, _u_MockVariableIntBlock_0.tib, _u_MockVariableIntBlock_0.tii, _u_Pulsing41_0.doc, _u_Pulsing41_0.pos, _u_Pulsing41_0.tim, _u_Pulsing41_0.tip, _u_SimpleText_0.dat, segments.gen, segments_o] after delete: [_5.cfe, _5.cfs, _5.si, _i.cfe, _i.cfs, _i.si, _q.cfe, _q.cfs, _q.si, _r.cfe, _r.cfs, _r.si, _s.cfe, _s.cfs, _s.si, _t.cfe, _t.cfs, _t.si, _u.fdt, _u.fdx, _u.fnm, _u.nvd, _u.nvm, _u.si, _u.tvd, _u.tvx, _u_Lucene41WithOrds_0.doc, 
_u_Lucene41WithOrds_0.pos, _u_Lucene41WithOrds_0.tib, _u_Lucene41WithOrds_0.tii, _u_Memory_0.ram, _u_MockVariableIntBlock_0.doc, _u_MockVariableIntBlock_0.frq, _u_MockVariableIntBlock_0.pos, _u_MockVariableIntBlock_0.pyl, _u_MockVariableIntBlock_0.skp, _u_MockVariableIntBlock_0.tib, _u_MockVariableIntBlock_0.tii, _u_Pulsing41_0.doc, _u_Pulsing41_0.pos, _u_Pulsing41_0.tim, _u_Pulsing41_0.tip, _u_SimpleText_0.dat, segments.gen, segments_o] These files were removed: [_2_Pulsing41_0.doc, _2_Pulsing41_0.pos] These files we had previously tried to delete, but couldn't: [_0_MockVariableIntBlock_0.pyl, _2_MockVariableIntBlock_0.frq, _2.fdt, _0.nvd, _0_Pulsing41_0.doc, _2_SimpleText_0.dat, _0_MockVariableIntBlock_0.doc, _2_Lucene41WithOrds_0.tib, _1.cfs, _2.nvd, _2_MockVariableIntBlock_0.skp, _2.tvd, _0_Pulsing41_0.pos, _2_Lucene41WithOrds_0.pos, _2_MockVariableIntBlock_0.pos, _2_MockVariableIntBlock_0.tib, _0_SimpleText_0.dat, _0_Lucene41WithOrds_0.pos, _0_MockVariableIntBlock_0.frq, _2_MockVariableIntBlock_0.pyl, _0_Lucene41WithOrds_0.tib, _0_Pulsing41_0.tim, _0.fdt, _0_MockVariableIntBlock_0.skp, _0.tvd, _2_MockVariableIntBlock_0.doc, _0_MockVariableIntBlock_0.pos, _2_Lucene41WithOrds_0.doc, _2_Pulsing41_0.tim, _0_Lucene41WithOrds_0.doc, _0_MockVariableIntBlock_0.tib] at __randomizedtesting.SeedInfo.seed([28147EEFD1229E3:8AD5783453EE441B]:0)
[jira] [Updated] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5339: --- Attachment: LUCENE-5339.patch Thanks for the feedback everyone ... I'm attaching a new patch. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822361#comment-13822361 ] Michael McCandless commented on LUCENE-5339: {quote} Facet's Accumulator is similar to Lucene's Collector, the Aggregator is sort of a Scorer, and a FacetRequest is a sort of Query. Actually the model after which the facets were designed was Lucene's. The optional IndexingParams came before the IndexWriterConfig but these can be said to be similar as well. {quote} I appreciate those analogies but I think the two cases are very different: I think faceting is (or ought to be) far simpler than searching. bq. More low-level objects such as the CategoryListParams are not a must, and the user may never know about them (and btw, they are similar to Codecs). Likewise, I don't think we need to expose codec-like control / pluggability over facet ords encoding at this point. bq. I reviewed the patch (mostly the taxonomy related part) and I think that even without associations, counts only is a bit narrow. I added ValueSource aggregation in the next patch, but not associations; I think associations can come later (it's just another index-time and search-time impl). {quote} Especially with large counts (say many thousands) the count doesn't say much because of the long-tail problem. When there's a large result set, all the categories will get high hit counts. And just as scoring by counting the number of query terms each document matches doesn't always make much sense (and I think all scoring functions do things a lot smarter), using counts for facets may at times yield irrelevant results. We found out that for large result sets, an aggregation of Lucene's score (rather than +1), or even score^2, yields better results for the user. Also, arbitrary expressions which are corpus specific (with or without associations) change the facets' usability dramatically. That's partially why the code was built to allow different aggregation techniques, allowing associations, numeric values etc. into each value for each category. {quote} I agree. Do you think ValueSource faceting is sufficient for such apps? Or do they typically use associations? Aren't associations only really required in the multi-valued facet field case? bq. As for the new API, it may be useful if there would be a single interface - so all facets implementations could be switched easily, allowing users to experiment with the different implementations without writing a lot of code. Yeah, I think so too ... it's on the TODO list. Especially, if the FacetsConfig knows the facet method used by a given field, then we could (almost) produce the right impl at search time.
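To make the score-aggregation idea discussed above concrete, here is a tiny self-contained sketch (hypothetical names only, not the facet module's API) of summing a per-hit weight per category ordinal instead of incrementing a count by +1:
{code}
import java.util.Arrays;

public class ScoreAggregationSketch {

  /** ordinalsPerDoc[d] holds the category ordinals of matching document d; scores[d] holds its score. */
  static float[] aggregate(int[][] ordinalsPerDoc, float[] scores, int numOrdinals) {
    float[] weights = new float[numOrdinals];
    for (int doc = 0; doc < ordinalsPerDoc.length; doc++) {
      for (int ord : ordinalsPerDoc[doc]) {
        weights[ord] += scores[doc]; // sum the hit's score instead of incrementing a count by +1
      }
    }
    return weights;
  }

  public static void main(String[] args) {
    int[][] ords = { {0, 2}, {2}, {1, 2} };
    float[] scores = { 0.9f, 0.4f, 0.7f };
    System.out.println(Arrays.toString(aggregate(ords, scores, 3))); // [0.9, 0.7, 2.0]
  }
}
{code}
An association- or expression-based aggregator would follow the same shape, just replacing the per-document weight that is summed.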
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822362#comment-13822362 ] Michael McCandless commented on LUCENE-5339: bq. Can we rename CategoryPath to FacetLabel or something more intuitive? +1 for FacetLabel; I put a nocommit. But, the new patch actually nearly eliminates the need to create CategoryPath (it's still needed to create a DrillDownQuery but I dropped a nocommit to see if we can fix that). bq. new LongRange("less than 10", 0L, true, 10L, false) -- can we make it so this takes fewer arguments? Not sure exactly how :) bq. What if it worked like this: This is an awesome idea! I did that in the new patch; now indexing is really simple, e.g.:
{code}
doc = new Document();
doc.add(new FacetField("Author", "Frank"));
doc.add(new FacetField("Publish Date", "1999", "5", "5"));
{code}
and:
{code}
doc = new Document();
doc.add(new SortedSetDocValuesFacetField("a", "bar"));
{code}
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822360#comment-13822360 ] Michael McCandless commented on LUCENE-5339: bq. You have a TODO file which seems to have been 'svn added' - are you aware of it? I put a nocommit. bq. Maybe we should do this work in a branch and avoid the .simple package? I'll start a branch, but on the branch I'd like to keep working on .simple for now while we bang out the API changes. If things start to crystallize then we can start cutting everything else over? bq. Can FT.FieldType provide a ctor taking these arguments and then DEFAULT_FIELD_TYPE pass (false,false)? Or, init those two members to false. bq. Maybe instead of setHierarchical + setMultiValued you do setDimType(DimType)? Then you could avoid the synchronization? I put a nocommit. bq. I wonder if FieldTypes won't confuse users w/ Field.FieldType, so maybe you should name it DimType or something? And the upper class FacetsConfig? Good, I renamed both. bq. Not sure, but I think that SimpleFacetFields adds the dimension's ordinal if the dim is both hierarchical and multi-valued? That's a step back from the default ALL_BUT_DIM that we have today. I think we might want to have a requiresDimValue or something, because I do think the dimension's value (count) is most often unneeded, and it's a waste to encode its ordinal? It does, and I think that's OK? Yes, it's one extra ord it indexes, but that will be a minor perf hit since it's a multi-valued and hierarchical. bq. Constants isn't documented, not sure if 'precommit' will like that, but in general I think the constants should have jdocs. Maybe put a nocommit? I put nocommits. Sadly, precommit won't fail (lucene/facet is missing in document-lint target in build.xml, because we never fixed all its javadocs). We should separately fix that. bq. TaxonomyFacetCounts.getAllDims() – If a dimension is not hierarchical, I think SimpleFacetResult.count == 0? In that case, sorting by its count is useless? I think it's also relevant for SortedSet? I THINK these will work correctly (we sum the dim value as we visit all children), but I was missing the corrected dim count in the hierarchical + MV case so I added that. Also, I put nocommits to test this. bq. LabelAndValue has a Number as the value rather than double. I see this class is used only in the final step (labeling), but what's wrong w/ the previous 'double' primitive? Is it to avoid casting or something? I think it's crazy when I ask for counts that I get a double back :) That's why I switched it to a Number. bq. CategoryListIterator I appreciate the usefulness of this API, but rather that adding in into simple from the get-go, I'd like to build out the different facet methods and understand if it's actually useful / worth the additional abstraction. For example, I'm not sure it would work very well with SSDV, since we first count in seg-ord space an then convert to global-ord space only when combining counts across segments (this gave better performance). I mean, yes, it would work in that the abstraction would be correct, but we'd be paying a performance penalty. bq. E.g. that's how we were able to test fast cutting over facets to DV from Payload, that's a nice abstraction for letting you load the values from a cache, and so forth. I think doing such future tests with the simple APIs will still be easy; I don't think we should open up abstractions for the encoding/decoding today. bq. 
FacetsAggregator I added another facet method, TaxonomyFacetSumValueSource for value source aggregation, in the next patch, to explore this need... bq. But I don't see this convenience getting away (you'll just pass a List<FacetsAggregator> and then pull the requested values later on). True but ... we need some base class / interface that all these *Facets.java implement ... I haven't done that yet (it's a TODO). bq. What I like less about it is that it folds in the logic coded by FacetsAccumulator and FacetResultsHandler Maybe we can move some of these methods into the base class? I'm not sure though... since for the TaxonomyFacetSumValueSource it's float[] values and for the *Count it's int[] counts. At the end of the day, the code that does the facet counting, the rollup, pulling the topK, is in fact a small amount of code; I think replicating bits of this code for the different methods is the lesser evil than adding so many abstractions that the module is hard to approach by new devs/users. bq. Has variants that return a full sub-tree (TopKInEachNodeHandler) That handler is such an exotic use case ... and the app can just recurse itself, calling TaxoFacetCounts.getTopChildren? bq. FacetArrays We avoid the need to factor this out, by simply enumerating the facet impls directly. E.g. we (or a user) can make a TaxoFacetAvgScore that allocates
[jira] [Updated] (SOLR-5408) CollapsingQParserPlugin scores incorrectly when multiple sort criteria are used
[ https://issues.apache.org/jira/browse/SOLR-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5408: - Summary: CollapsingQParserPlugin scores incorrectly when multiple sort criteria are used (was: Collapsing Query Parser does not respect multiple Sort fields) CollapsingQParserPlugin scores incorrectly when multiple sort criteria are used --- Key: SOLR-5408 URL: https://issues.apache.org/jira/browse/SOLR-5408 Project: Solr Issue Type: Bug Affects Versions: 4.5 Reporter: Brandon Chapman Assignee: Joel Bernstein Priority: Critical Attachments: CollapsingQParserPlugin.java, CollapsingQParserPlugin.java, SOLR-5027.patch, SOLR-5408.2.patch, SOLR-5408.patch, SOLR-5408.patch When using the collapsing query parser, only the last sort field appears to be used. http://172.18.0.10:8080/solr/product/select_eng?sort=score%20desc,name_sort_eng%20descqf=name_eng^3+brand^2+categories_term_eng+sku+upc+promoTag+model+related_terms_engpf2=name_eng^2defType=edismaxrows=12pf=name_eng~5^3start=0q=ipadboost=sqrt(popularity)qt=/select_engfq=productType:MERCHANDISEfq=merchant:bestbuycanadafq=(*:*+AND+-all_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(all_all_suppressed_b_ovly:false+AND+-rbc_all_suppressed_b_ovly:[*+TO+*]+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_all_suppressed_b_ovly:false+AND+-rbc_cpx_suppressed_b_ovly:[*+TO+*])+OR+(rbc_cpx_suppressed_b_ovly:false)fq=translations:engfl=psid,name_eng,scoredebug=truedebugQuery=truefq={!collapse+field%3DgroupId+nullPolicy%3Dexpand} result name=response numFound=5927 start=0 maxScore=5.6674457 doc str name=psid3002010250210/str str name=name_eng ZOTAC ZBOX nano XS AD13 Plus All-In-One PC (AMD E2-1800/2GB RAM/64GB SSD) /str float name=score0.41423172/float /doc The same query without using the collapsing query parser produces the expected result. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5408) CollapsingQParserPlugin scores incorrectly when multiple sort criteria are used
[ https://issues.apache.org/jira/browse/SOLR-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822374#comment-13822374 ] Joel Bernstein commented on SOLR-5408: -- Brandon, That's good news. This fix has been committed to trunk and 4x. Thanks for reporting. Joel
4.6 ReleaseNotes
I started the Lucene Release Notes for 4.6 here: https://wiki.apache.org/lucene-java/ReleaseNote46 Feel free to add / optimize. I'd appreciate if somebody from Solr Land could start the Solr Release notes, I am not on top of the changes there! Thanks, Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5052) bitset codec for off heap filters
[ https://issues.apache.org/jira/browse/LUCENE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822389#comment-13822389 ] Yuriy commented on LUCENE-5052: --- This is very simple implementation of codec which stores PostingLists as BitSets. This implementation passes BasePostingsFormatTestCase.testDocsOnly() test. Also I have found difficult to implement term dictionary and feel like it's better to somehow combine this posting format with any of standard ones. bitset codec for off heap filters - Key: LUCENE-5052 URL: https://issues.apache.org/jira/browse/LUCENE-5052 Project: Lucene - Core Issue Type: New Feature Components: core/codecs Reporter: Mikhail Khludnev Labels: features Fix For: 5.0 Colleagues, When we filter we don’t care any of scoring factors i.e. norms, positions, tf, but it should be fast. The obvious way to handle this is to decode postings list and cache it in heap (CachingWrappingFilter, Solr’s DocSet). Both of consuming a heap and decoding as well are expensive. Let’s write a posting list as a bitset, if df is greater than segment's maxdocs/8 (what about skiplists? and overall performance?). Beside of the codec implementation, the trickiest part to me is to design API for this. How we can let the app know that a term query don’t need to be cached in heap, but can be held as an mmaped bitset? WDYT? -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
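As a rough illustration of the proposal (not of the codec API or of the attached patch), the df-based heuristic and the resulting filter iteration could look like this toy sketch using java.util.BitSet:
{code}
import java.util.BitSet;

public class BitsetPostingsSketch {

  /** Heuristic from the issue description: one bit per document is cheaper once df exceeds maxDoc/8. */
  static boolean worthBitset(int docFreq, int maxDoc) {
    return docFreq > maxDoc / 8;
  }

  public static void main(String[] args) {
    int maxDoc = 32;
    int[] postings = {0, 3, 4, 7, 9, 15, 16, 21, 30}; // doc ids matching the term
    if (worthBitset(postings.length, maxDoc)) {
      BitSet bits = new BitSet(maxDoc);
      for (int doc : postings) {
        bits.set(doc);
      }
      // Iterate the "filter" the way a DocIdSetIterator would: next set bit after the current doc.
      for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
        System.out.println("match doc=" + doc);
      }
    }
  }
}
{code}
The open questions from the description (skip lists, overall performance, and how a term query learns it can be served from an mmapped bitset instead of a cached heap structure) are exactly what the sketch does not answer.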
[jira] [Updated] (LUCENE-5052) bitset codec for off heap filters
[ https://issues.apache.org/jira/browse/LUCENE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuriy updated LUCENE-5052: -- Attachment: bitsetcodec.zip see https://issues.apache.org/jira/browse/LUCENE-5052?focusedCommentId=13822389page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13822389
[jira] [Issue Comment Deleted] (LUCENE-5052) bitset codec for off heap filters
[ https://issues.apache.org/jira/browse/LUCENE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuriy updated LUCENE-5052: -- Comment: was deleted (was: see https://issues.apache.org/jira/browse/LUCENE-5052?focusedCommentId=13822389page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13822389)
Re: DocValue on Strings slow and OOM
If anyone is following this one, just an update. We are not going to upgrade to 4.5.1 in order to see if the String facet performance problem has been fixed. Instead we have made a few hacks around our data so that we can store the c-field (c_dstr_doc_sto) as a long instead (c_dlng_doc_sto). So now we only need to struggle with long-facet performance. There is a performance issue with facets on longs though, but I will tell about it in another mailing-thread - I need your input on what solution you prefer. Regards, Per Steffensen On 11/6/13 12:15 PM, Per Steffensen wrote: On 11/6/13 11:43 AM, Robert Muir wrote: Before lucene 4.5 docvalues were loaded entirely into RAM. I'm not going to waste time debugging any old code releases here, you should upgrade to the latest release! Ok, thanks! I do not consider it a bug (just a performance issue), so no debugging needed. It is just that we do not want to spend time upgrading to 4.5 if there is not a justified hope/explanation that it will probably make things better. But I guess there is. One short question: Will 4.5 index things differently (compared to 4.4) for documents with fields like I showed earlier? I'm basically asking if we need to reindex the 12 billion documents again after upgrading to 4.5, or if we ought to be able to deploy 4.5 on top of the already indexed documents. Regards, Per Steffensen
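For readers following along, the kind of change Per describes looks roughly like the sketch below; the field names are the ones used in this thread, and how the application maps its string values onto longs is left out:
{code}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.util.BytesRef;

public class DocValuesFieldSketch {

  /** Before: the c-field stored as a String sorted doc-values field. */
  static Document before(String value) {
    Document doc = new Document();
    doc.add(new SortedDocValuesField("c_dstr_doc_sto", new BytesRef(value)));
    return doc;
  }

  /** After the workaround: the same information stored as a long numeric doc-values field. */
  static Document after(long value) {
    Document doc = new Document();
    doc.add(new NumericDocValuesField("c_dlng_doc_sto", value));
    return doc;
  }
}
{code}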
[jira] [Commented] (SOLR-5428) new statistics results to StatsComponent - distinctValues and countDistinct
[ https://issues.apache.org/jira/browse/SOLR-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822400#comment-13822400 ] Shalin Shekhar Mangar commented on SOLR-5428: - Thanks for the patch Elran. Collecting the 'distinctValues' is a very expensive operation. There should be a way to stop the collection of these two statistics. Have you seen the LukeRequestHandler? Using the fl and maxTerms params I think you can get the same information. http://wiki.apache.org/solr/LukeRequestHandler new statistics results to StatsComponent - distinctValues and countDistinct --- Key: SOLR-5428 URL: https://issues.apache.org/jira/browse/SOLR-5428 Project: Solr Issue Type: New Feature Reporter: Elran Dvir Assignee: Shalin Shekhar Mangar Attachments: SOLR-5428.patch I thought it would be very useful to display the distinct values (and the count) of a field among other statistics. Attached a patch implementing this in StatsComponent. Added results : distinctValues - list of all distinct values countDistinct - distinct values count. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Lucene / Solr 4.6.0
Smoke tester passes for me. +1! Shai On Thu, Nov 14, 2013 at 11:37 AM, Simon Willnauer simon.willna...@gmail.com wrote: Please vote for the first Release Candidate for Lucene/Solr 4.6.0 you can download it here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 or run the smoke tester directly with this commandline (don't forget to set JAVA6_HOME etc.): python3.2 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 1541686 4.6.0 /tmp/smoke_test_4_6 I integrated the RC into Elasticsearch and all tests pass: https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de Smoketester said: SUCCESS! [1:15:57.339272] here is my +1 Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822450#comment-13822450 ] Shai Erera commented on LUCENE-5339: About the patch: * I think FacetField should have an optional ctor taking the indexedFacetField, defaulting to $facets, then the ctor calls super() with the right field, and not dummy? And you can remove set/get? * SimpleFacetsCollector jdocs are wrong -- there's no create? * Do we still need SSDVFacetFields? * I like FacetIW, but the nocommit, to handle updateDoc, addDocs etc. makes me think if down the road we won't be sorry about doing this change (i.e. if anything changes on IW APIs). The good thing about FacetFields is that it just adds fields to a Document, and doesn't worry about IW API at all... * DimType == DimConfig :). Sorry if I wasn't clear somewhere in my long response. {quote} That handler is such an exotic use case ... and the app can just recurse itself, calling TaxoFacetCounts.getTopChildren? {quote} Could be, maybe it will work. It definitely allows asking for a different topK for each child (something we currently don't support). {quote} It does, and I think that's OK? Yes, it's one extra ord it indexes, but that will be a minor perf hit since it's a multi-valued and hierarchical. {quote} I don't know. Like if your facet has 2 levels, that's 33% more ords. I think the count of the root ord is most likely never needed? And if it's needed, the app can compute it by traversing its children and their values in the facet arrays? Maybe as a default we just never index it, and don't add a vague requiresDimCount/Value/Weight boolean? {quote} I think replicating bits of this code for the different methods is the lesser evil than adding so many abstractions that the module is hard to approach by new devs/users. {quote} Did we ever get such feedback from users? That the module is unapproachable? I get the opposite feedback - that we don't have many abstractions! :) {quote} At the end of the day, the code that does the facet counting, the rollup, pulling the topK, is in fact a small amount of code; {quote} You have a nocommit (maybe we should do this lazily) in regards to when to rollupValues. That shows me that now every developer who extends this API (let's be clear - users are oblivious to this happening) will face the same decision (nocommit). If we discover one day that it's better to rollup lazily or not, other developers don't benefit from that decision. That's why I think some abstractions are good. {quote} I added ValueSource aggregation in the next patch, but not associations; I think associations can come later (it's just another index time and search time impl). {quote} I'm not sure we should do that (cut over associations later). The whole point about these features (associations, complements, sampling..) is that they are existing features. If we think they are useless / unneeded - that's one thing. But if we believe they are important, it's useless to make all the API changes without taking them into account, only to figure out later that we need abstraction X and Y in order to implement them. And we make heavy use of associations, and some users asked (and use) sampling and I remember a question about complements. So obviously we cannot conclude that these are useless features. Therefore I think it's important that we try to tackle them now, so that we don't do a full round trip to find ourselves with the same API again.
Can we do FacetIndexWriter in a separate issue (if we want to do it at all)? It's unrelated to the search API changes you want to do here, and it might be easier to contain within a single issue? About CategoryListIterator ... what if we do manage to come up tomorrow with a better encoding strategy for facets? Do you really think that changing all existing WhateverFacets makes sense!? And if developers write their own WhateverFacets, it means they need to change their code too? Really, you're mixing optimizations (inlining dgap+vint) with ease of use. I know (!!) that there are apps that can benefit from a different encoding scheme (e.g. FourOnesIntEncoder). We don't need to wait until someone comes up w/ a better default encoding scheme to introduce abstractions. I mean .. that just sounds crazy to me.
[jira] [Commented] (SOLR-5428) new statistics results to StatsComponent - distinctValues and countDistinct
[ https://issues.apache.org/jira/browse/SOLR-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822465#comment-13822465 ] Elran Dvir commented on SOLR-5428: -- Thanks, Shalin. My use case requires 'distinctValues' alongside the other results, so I am afraid using LukeRequestHandler is not suitable. In what way is it expensive? Is there a way to improve it? What do you mean when you say "There should be a way to stop the collection"? Thanks.
Re: [VOTE] Lucene / Solr 4.6.0
+1 SUCCESS! [2:50:09.253204] () however we have the usual ***WARNING***: javadocs want to fail! for Solr (which IMHO we should fix) Regards, Tommaso 2013/11/14 Shai Erera ser...@gmail.com Smoke tester passes for me. +1! Shai On Thu, Nov 14, 2013 at 11:37 AM, Simon Willnauer simon.willna...@gmail.com wrote: Please vote for the first Release Candidate for Lucene/Solr 4.6.0 you can download it here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 or run the smoke tester directly with this commandline (don't forget to set JAVA6_HOME etc.): python3.2 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 1541686 4.6.0 /tmp/smoke_test_4_6 I integrated the RC into Elasticsearch and all tests pass: https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de Smoketester said: SUCCESS! [1:15:57.339272] here is my +1 Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5428) new statistics results to StatsComponent - distinctValues and countDistinct
[ https://issues.apache.org/jira/browse/SOLR-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822470#comment-13822470 ] Yago Riveiro commented on SOLR-5428: Collecting the distinctValues can be expensive, but in my case it is a requirement that Solr can't satisfy in an easy way. I need to do a facet query with limit -1 to get all unique terms that match the query. If the StatsComponent can do the same thing, expensive or not, I vote to have the feature. How to use it, and the pros and cons of using it, must be a decision made by the user.
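For reference, the facet-based workaround Yago describes amounts to a request along these lines (the field name is a placeholder; facet.limit=-1 returns all terms matching the query):
{noformat}
q=*:*&rows=0&facet=true&facet.field=myfield&facet.limit=-1&facet.mincount=1
{noformat}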
[jira] [Commented] (SOLR-5428) new statistics results to StatsComponent - distinctValues and countDistinct
[ https://issues.apache.org/jira/browse/SOLR-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822477#comment-13822477 ] Elran Dvir commented on SOLR-5428: -- Another thing I thought about: my queries have q, fq and are distributed. Does LukeRequestHandler support that?
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822482#comment-13822482 ] Anca Kopetz commented on SOLR-2649: --- We need to apply min-should-match to edismax query strings with operators (AND, OR) and an mm parameter. Therefore we developed our own custom query parser. The code is below; maybe it is useful for somebody who has the same requirements.
{code:title=CustomExtendedDismaxQParser.java}
import java.util.List;

import com.google.common.base.Strings;

import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.DisMaxParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.ExtendedDismaxQParser;
import org.apache.solr.util.SolrPluginUtils;

public class CustomExtendedDismaxQParser extends ExtendedDismaxQParser {

  public CustomExtendedDismaxQParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
    super(qstr, localParams, params, req);
  }

  @Override
  protected Query parseOriginalQuery(ExtendedSolrQueryParser up, String mainUserQuery,
      List<Clause> clauses, ExtendedDismaxConfiguration config) {
    Query query = super.parseOriginalQuery(up, mainUserQuery, clauses, config);
    String mmValue = this.params.get(DisMaxParams.MM);
    if (!Strings.isNullOrEmpty(mmValue)) {
      if (query instanceof BooleanQuery) {
        SolrPluginUtils.setMinShouldMatch((BooleanQuery) query, mmValue);
      }
    }
    return query;
  }
}
{code}
{code:title=solrconfig.xml}
<queryParser name="kelkooEdismax" class="com.kelkoo.search.solr.plugins.CustomExtendedDismaxQParserPlugin"/>
{code}
MM ignored in edismax queries with operators Key: SOLR-2649 URL: https://issues.apache.org/jira/browse/SOLR-2649 Project: Solr Issue Type: Bug Components: query parsers Reporter: Magnus Bergmark Priority: Minor Fix For: 4.6 Hypothetical scenario: 1. User searches for "stocks oil gold" with MM set to 50% 2. User adds -stockings to the query: "stocks oil gold -stockings" 3. User gets no hits since MM was ignored and all terms were AND-ed together The behavior seems to be intentional, although the reason why is never explained: // For correct lucene queries, turn off mm processing if there // were explicit operators (except for AND). boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; (lines 232-234 taken from tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) This makes edismax unsuitable as a replacement for dismax; mm is one of the primary features of dismax. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
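With the plugin registered under the name above, a request selects it via defType and passes mm explicitly; the parameter values below are only illustrative, not part of the patch:
{noformat}
q=stocks oil gold -stockings&defType=kelkooEdismax&mm=50%&qf=name_eng
{noformat}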
Re: [VOTE] Lucene / Solr 4.6.0
+1! The smoker test succeeded. On 14 November 2013 15:09, Tommaso Teofili tommaso.teof...@gmail.comwrote: +1 SUCCESS! [2:50:09.253204] () however we have the usual ***WARNING***: javadocs want to fail! for Solr (which IMHO we should fix) Regards, Tommaso 2013/11/14 Shai Erera ser...@gmail.com Smoke tester passes for me. +1! Shai On Thu, Nov 14, 2013 at 11:37 AM, Simon Willnauer simon.willna...@gmail.com wrote: Please vote for the first Release Candidate for Lucene/Solr 4.6.0 you can download it here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 or run the smoke tester directly with this commandline (don't forget to set JAVA6_HOME etc.): python3.2 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 1541686 4.6.0 /tmp/smoke_test_4_6 I integrated the RC into Elasticsearch and all tests pass: https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de Smoketester said: SUCCESS! [1:15:57.339272] here is my +1 Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Met vriendelijke groet, Martijn van Groningen
[jira] [Created] (SOLR-5441) Expose transaction log files number and their size via JMX
Rafał Kuć created SOLR-5441: --- Summary: Expose transaction log files number and their size via JMX Key: SOLR-5441 URL: https://issues.apache.org/jira/browse/SOLR-5441 Project: Solr Issue Type: Improvement Affects Versions: 4.5 Reporter: Rafał Kuć Priority: Minor It may be useful to have the number of transaction log files and their overall size exposed via JMX for UpdateHandler. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5441) Expose transaction log files number and their size via JMX
[ https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822525#comment-13822525 ] Rafał Kuć commented on SOLR-5441: - I'll provide patch later today.
[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved
[ https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822528#comment-13822528 ] Steve Rowe commented on LUCENE-5216: bq. Committed to trunk. Will backport to 4x after I backport the main DV updates changes first. [~shaie], looks like you included these changes in your branch_4x commit under LUCENE-5189, so this issue can be resolved? FYI, when you merge multiple issues' commits, it's useful to include all issue numbers in the commit log message, so that they get auto-posted to the relevant JIRA issues. That didn't happen here. Fix SegmentInfo.attributes when updates are involved Key: LUCENE-5216 URL: https://issues.apache.org/jira/browse/LUCENE-5216 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Attachments: LUCENE-5216.patch Today, SegmentInfo.attributes are write-once. However, in the presence of field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue, in which if a Codec decides to alter the attributes when updates are applied, they are silently discarded. This is rather a corner case, though one that should be addressed. There were two solutions to address this: # Record SI.attributes in SegmentInfos, so they are written per-commit, instead of the .si file. # Remove them altogether, as they don't seem to be used anywhere in Lucene code today. If we remove them, we basically don't take away special capability from Codecs, because they can still write the attributes to a separate file, or even the file they record the other data in. This will work even with updates, as long as Codecs respect the given segmentSuffix. If we keep them, I think the simplest solution is to read/write them by SegmentInfos. But if we don't see a good use case, I suggest we remove them, as it's just extra code to maintain. I think we can even risk a backwards break and remove them completely from 4x, though if that's a problem, we can deprecate too. If anyone sees a good usage for them, or better - already uses them, please speak up, so we can make the proper decision. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.
[ https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822537#comment-13822537 ] Simon Rosenthal commented on SOLR-4722: --- Great patch ! I'd like to use the code as the basis for a component which will simply return term positions for each query term - no need for having highlighting enabled as a prerequisite, or to return term offsets - this is a text mining project where we'll be running queries in batch mode and storing this information externally. Can you think of any gotchas I might encounter ? Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled. --- Key: SOLR-4722 URL: https://issues.apache.org/jira/browse/SOLR-4722 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 4.3, 5.0 Reporter: Tricia Jenkins Priority: Minor Attachments: SOLR-4722.patch, solr-positionshighlighter.jar As an alternative to returning snippets, this highlighter provides the (term) position for query matches. One usecase for this is to reconcile the term position from the Solr index with 'word' coordinates provided by an OCR process. In this way we are able to 'highlight' an image, like a page from a book or an article from a newspaper, in the locations that match the user's query. This is based on the FastVectorHighlighter and requires that termVectors, termOffsets and termPositions be stored. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5215. Resolution: Fixed Fix Version/s: 5.0 4.6 Committed to 4x under LUCENE-5189. Add support for FieldInfos generation - Key: LUCENE-5215 URL: https://issues.apache.org/jira/browse/LUCENE-5215 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.6, 5.0 Attachments: LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch, LUCENE-5215.patch In LUCENE-5189 we've identified few reasons to do that: # If you want to update docs' values of field 'foo', where 'foo' exists in the index, but not in a specific segment (sparse DV), we cannot allow that and have to throw a late UOE. If we could rewrite FieldInfos (with generation), this would be possible since we'd also write a new generation of FIS. # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer isn't allowed to change FI.attributes because we cannot modify the existing FIS. This is implicit however, and we silently ignore any modified attributes. FieldInfos.gen will allow that too. The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and add support for FIS generation in FieldInfosFormat, SegReader etc., like we now do for DocValues. I'll work on a patch. Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that have same limitation -- if a Codec modifies them, they are silently being ignored, since we don't gen the .si files. I think we can easily solve that by recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I think it should be handled in a separate issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved
[ https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5216. Resolution: Fixed Fix Version/s: 5.0 4.6 Assignee: Shai Erera Lucene Fields: New,Patch Available (was: New) Committed to 4x under LUCENE-5189.
[jira] [Resolved] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates
[ https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5248. Resolution: Fixed Fix Version/s: 5.0 4.6 Committed to 4x under LUCENE-5189. Improve the data structure used in ReaderAndLiveDocs to hold the updates Key: LUCENE-5248 URL: https://issues.apache.org/jira/browse/LUCENE-5248 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.6, 5.0 Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch Currently ReaderAndLiveDocs holds the updates in two structures: +Map<String,Map<Integer,Long>>+ Holds a mapping from each field, to all docs that were updated and their values. This structure is updated when applyDeletes is called, and needs to satisfy several requirements: # Un-ordered writes: if a field f is updated by two terms, termA and termB, in that order, and termA affects doc=100 and termB doc=2, then the updates are applied in that order, meaning we cannot rely on updates coming in order. # Same document may be updated multiple times, either by same term (e.g. several calls to IW.updateNDV) or by different terms. Last update wins. # Sequential read: when writing the updates to the Directory (fieldsConsumer), we iterate on the docs in-order and for each one check if it's updated and if not, pull its value from the current DV. # A single update may affect several million documents, therefore need to be efficient w.r.t. memory consumption. +Map<Integer,Map<String,Long>>+ Holds a mapping from a document, to all the fields that it was updated in and the updated value for each field. This is used by IW.commitMergedDeletes to apply the updates that came in while the segment was merging. The requirements this structure needs to satisfy are: # Access in doc order: this is how commitMergedDeletes works. # One-pass: we visit a document once (currently) and so if we can, it's better if we know all the fields in which it was updated. The updates are applied to the merged ReaderAndLiveDocs (where they are stored in the first structure mentioned above). Comments with proposals will follow next. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
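As a toy model of the first structure described in LUCENE-5248 above (field -> doc -> value), and not of the actual Lucene implementation, the following sketch shows un-ordered writes, last-update-wins semantics, and the sequential in-order read used when writing the new doc-values generation:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class NumericUpdatesSketch {

  // field -> (doc -> value); illustration only, not Lucene's internal data structure.
  private final Map<String, TreeMap<Integer, Long>> updates = new HashMap<String, TreeMap<Integer, Long>>();

  void update(String field, int doc, long value) {
    TreeMap<Integer, Long> docValues = updates.get(field);
    if (docValues == null) {
      docValues = new TreeMap<Integer, Long>();
      updates.put(field, docValues);
    }
    docValues.put(doc, value); // un-ordered writes; a doc updated twice keeps the last value
  }

  void writeField(String field, int maxDoc) {
    TreeMap<Integer, Long> docValues = updates.get(field);
    for (int doc = 0; doc < maxDoc; doc++) { // sequential, in-order read over all docs
      Long updated = docValues == null ? null : docValues.get(doc);
      long value = updated != null ? updated.longValue() : 0L; // 0L stands in for the current DV value
      System.out.println("doc=" + doc + " value=" + value);
    }
  }

  public static void main(String[] args) {
    NumericUpdatesSketch s = new NumericUpdatesSketch();
    s.update("price", 4, 5L);
    s.update("price", 1, 7L);
    s.update("price", 4, 9L); // overwrites the earlier value for doc 4
    s.writeField("price", 5);
  }
}
{code}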
[jira] [Resolved] (LUCENE-5246) SegmentInfoPerCommit continues to list unneeded updatesFiles
[ https://issues.apache.org/jira/browse/LUCENE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5246. Resolution: Fixed Fix Version/s: 5.0 4.6 Committed to 4x under LUCENE-5189. SegmentInfoPerCommit continues to list unneeded updatesFiles Key: LUCENE-5246 URL: https://issues.apache.org/jira/browse/LUCENE-5246 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.6, 5.0 Attachments: LUCENE-5246.patch, LUCENE-5246.patch SegmentInfoPerCommit continues to list updates files even if they are unneeded anymore. For example, if you update the values of documents of field 'f', it creates a gen'd .fnm (FieldInfos) file. If you commit/reopen and update the field again (maybe now a different set of documents), it creates another gen'd .fnm, but continues to list both gens, even though only the latest one is needed. To solve this, SIPC would need to know then dvGen of each FieldInfo, so that it can correctly list only the updates files that are truly needed. I'll work on a testcase and fix. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved
[ https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822558#comment-13822558 ] Steve Rowe commented on LUCENE-5216: bq. Oh, I didn't know we can do that! So I'd do svn ci -m "LUCENE-123 LUCENE-234: message"? Do they need to be separated by comma or something? I'm pretty sure there is no required punctuation - AFAIK any svn commit log message matching regex /PROJECT-\d+/ *anywhere in the log message* gets added as a comment to the corresponding JIRA issue, and multiple issue mentions result in comment addition to all of them.
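For example, a commit like the following (the message text here is made up) would be auto-posted as a comment to both issues:
{noformat}
svn ci -m "LUCENE-5189 LUCENE-5216: record SegmentInfo attributes per commit"
{noformat}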
[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved
[ https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822561#comment-13822561 ] Shai Erera commented on LUCENE-5216: Thanks Steve, will try to keep that in mind. I always thought we must format the messages like Lucene-1234: message, i.e. PROJECT-number even followed by colon.
[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved
[ https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822564#comment-13822564 ] Steve Rowe commented on LUCENE-5216: I think that may once have been true, maybe for early versions of Mark Miller's service?, but is no longer. Here's a recent example showing no punctuation required, and multiple issues: https://issues.apache.org/jira/browse/LUCENE-5217?focusedCommentId=13820840page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13820840 Fix SegmentInfo.attributes when updates are involved Key: LUCENE-5216 URL: https://issues.apache.org/jira/browse/LUCENE-5216 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.6, 5.0 Attachments: LUCENE-5216.patch Today, SegmentInfo.attributes are write-once. However, in the presence of field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue, in which if a Codec decides to alter the attributes when updates are applied, they are silently discarded. This is rather a corner case, though one that should be addressed. There were two solutions to address this: # Record SI.attributes in SegmentInfos, so they are written per-commit, instead of the .si file. # Remove them altogether, as they don't seem to be used anywhere in Lucene code today. If we remove them, we basically don't take away special capability from Codecs, because they can still write the attributes to a separate file, or even the file they record the other data in. This will work even with updates, as long as Codecs respect the given segmentSuffix. If we keep them, I think the simplest solution is to read/write them by SegmentInfos. But if we don't see a good use case, I suggest we remove them, as it's just extra code to maintain. I think we can even risk a backwards break and remove them completely from 4x, though if that's a problem, we can deprecate too. If anyone sees a good usage for them, or better - already uses them, please speak up, so we can make the proper decision. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5216) Fix SegmentInfo.attributes when updates are involved
[ https://issues.apache.org/jira/browse/LUCENE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822567#comment-13822567 ] Shai Erera commented on LUCENE-5216: Cool! Fix SegmentInfo.attributes when updates are involved Key: LUCENE-5216 URL: https://issues.apache.org/jira/browse/LUCENE-5216 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.6, 5.0 Attachments: LUCENE-5216.patch Today, SegmentInfo.attributes are write-once. However, in the presence of field updates (see LUCENE-5189 and LUCENE-5215) this creates an issue, in which if a Codec decides to alter the attributes when updates are applied, they are silently discarded. This is rather a corner case, though one that should be addressed. There were two solutions to address this: # Record SI.attributes in SegmentInfos, so they are written per-commit, instead of the .si file. # Remove them altogether, as they don't seem to be used anywhere in Lucene code today. If we remove them, we basically don't take away special capability from Codecs, because they can still write the attributes to a separate file, or even the file they record the other data in. This will work even with updates, as long as Codecs respect the given segmentSuffix. If we keep them, I think the simplest solution is to read/write them by SegmentInfos. But if we don't see a good use case, I suggest we remove them, as it's just extra code to maintain. I think we can even risk a backwards break and remove them completely from 4x, though if that's a problem, we can deprecate too. If anyone sees a good usage for them, or better - already uses them, please speak up, so we can make the proper decision. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
[ https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822572#comment-13822572 ] Erick Erickson commented on SOLR-5287: -- Stefan: Nope, you're right on time. I'll be around this weekend too, although in a different time zone... Allow at least solrconfig.xml and schema.xml to be edited via the admin screen -- Key: SOLR-5287 URL: https://issues.apache.org/jira/browse/SOLR-5287 Project: Solr Issue Type: Improvement Components: Schema and Analysis, web gui Affects Versions: 4.5, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5287.patch, SOLR-5287.patch, SOLR-5287.patch A user asking a question on the Solr list got me to thinking about editing the main config files from the Solr admin screen. I chatted briefly with [~steffkes] about the mechanics of this on the browser side, he doesn't see a problem on that end. His comment is there's no end point that'll write the file back. Am I missing something here or is this actually not a hard problem? I see a couple of issues off the bat, neither of which seem troublesome. 1 file permissions. I'd imagine lots of installations will get file permission exceptions if Solr tries to write the file out. Well, do a chmod/chown. 2 screwing up the system maliciously or not. I don't think this is an issue, this would be part of the admin handler after all. Does anyone have objections to the idea? And how does this fit into the work that [~sar...@syr.edu] has been doing? I can imagine this extending to SolrCloud with a push this to ZK option or something like that, perhaps not in V1 unless it's easy. Of course any pointers gratefully received. Especially ones that start with Don't waste your effort, it'll never work (or be accepted)... Because what scares me is this seems like such an easy thing to do that would be a significant ease-of-use improvement, so there _has_ to be something I'm missing. So if we go forward with this we'll make this the umbrella JIRA, the two immediate sub-JIRAs that spring to mind will be the UI work and the endpoints for the UI work to use. I think there are only two end-points here 1 list all the files in the conf (or arbitrary from solr_home/collection) directory. 2 write this text to this file Possibly later we could add clone the configs from coreX to coreY. BTW, I've assigned this to myself so I don't lose it, but if anyone wants to take it over it won't hurt my feelings a bit -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
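A rough sketch of the two endpoints proposed above, kept outside of any Solr request-handler wiring (class, method and path names are illustrative, not an actual patch):
{code:java}
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the two proposed endpoints:
// (1) list the files in a core's conf directory, (2) write text back to one of them.
// The conf-directory layout (solr_home/<collection>/conf) follows the issue description.
public class ConfFileEndpointsSketch {

  // Endpoint 1: file name -> size in bytes for everything directly under conf/.
  public static Map<String, Long> listConfFiles(File confDir) {
    Map<String, Long> files = new LinkedHashMap<String, Long>();
    File[] entries = confDir.listFiles();
    if (entries != null) {
      for (File f : entries) {
        if (f.isFile()) {
          files.put(f.getName(), f.length());
        }
      }
    }
    return files;
  }

  // Endpoint 2: overwrite a named conf file with the posted text.
  // A real handler would also need the permission and safety checks discussed above.
  public static void writeConfFile(File confDir, String fileName, String text) throws IOException {
    Writer out = new FileWriter(new File(confDir, fileName));
    try {
      out.write(text);
    } finally {
      out.close();
    }
  }
}
{code}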
[jira] [Commented] (SOLR-2724) Deprecate defaultSearchField and defaultOperator defined in schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822586#comment-13822586 ] David Smiley commented on SOLR-2724: Anca, Yeah, I know. It is deprecated but still available. Deprecate defaultSearchField and defaultOperator defined in schema.xml -- Key: SOLR-2724 URL: https://issues.apache.org/jira/browse/SOLR-2724 Project: Solr Issue Type: Improvement Components: Schema and Analysis, search Reporter: David Smiley Assignee: David Smiley Priority: Minor Fix For: 3.6, 4.0-ALPHA Attachments: SOLR-2724_deprecateDefaultSearchField_and_defaultOperator.patch, SOLR-2724_deprecateDefaultSearchField_and_defaultOperator.patch Original Estimate: 2h Remaining Estimate: 2h I've always been surprised to see the defaultSearchField element and solrQueryParser defaultOperator=OR/ defined in the schema.xml file since the first time I saw them. They just seem out of place to me since they are more query parser related than schema related. But not only are they misplaced, I feel they shouldn't exist. For query parsers, we already have a df parameter that works just fine, and explicit field references. And the default lucene query operator should stay at OR -- if a particular query wants different behavior then use q.op or simply use OR. similarity Seems like something better placed in solrconfig.xml than in the schema. In my opinion, defaultSearchField and defaultOperator configuration elements should be deprecated in Solr 3.x and removed in Solr 4. And similarity should move to solrconfig.xml. I am willing to do it, provided there is consensus on it of course. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5428) new statistics results to StatsComponent - distinctValues and countDistinct
[ https://issues.apache.org/jira/browse/SOLR-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822597#comment-13822597 ] Shalin Shekhar Mangar commented on SOLR-5428: - You're both right. We can't replace this functionality with LukeRequestHandler. At the same time, forcing everyone to keep a set of distinct values in memory, when someone just needs min, max or count, is bad. new statistics results to StatsComponent - distinctValues and countDistinct --- Key: SOLR-5428 URL: https://issues.apache.org/jira/browse/SOLR-5428 Project: Solr Issue Type: New Feature Reporter: Elran Dvir Assignee: Shalin Shekhar Mangar Attachments: SOLR-5428.patch I thought it would be very useful to display the distinct values (and the count) of a field among other statistics. Attached a patch implementing this in StatsComponent. Added results: distinctValues - list of all distinct values countDistinct - distinct values count. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
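To make the trade-off concrete, a hypothetical accumulator sketch (names and structure are illustrative, not the attached patch): the distinct-value set is only allocated when distinct stats were actually requested, so min/max/count-only requests pay nothing extra.
{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical per-field stats accumulator illustrating the memory trade-off discussed above.
public class FieldStatsAccumulator {
  private final boolean trackDistinct;
  private long count;
  private double min = Double.POSITIVE_INFINITY;
  private double max = Double.NEGATIVE_INFINITY;
  private Set<Object> distinctValues; // allocated only when distinct stats were requested

  public FieldStatsAccumulator(boolean trackDistinct) {
    this.trackDistinct = trackDistinct;
    if (trackDistinct) {
      distinctValues = new HashSet<Object>();
    }
  }

  public void accumulate(double value) {
    count++;
    min = Math.min(min, value);
    max = Math.max(max, value);
    if (trackDistinct) {
      distinctValues.add(value);
    }
  }

  public long getCount() { return count; }
  public double getMin() { return min; }
  public double getMax() { return max; }

  public long getCountDistinct() {
    return trackDistinct ? distinctValues.size() : -1; // -1: distinct values were not tracked
  }
}
{code}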
[jira] [Assigned] (SOLR-5441) Expose transaction log files number of their size via JMX
[ https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-5441: --- Assignee: Shalin Shekhar Mangar Expose transaction log files number of their size via JMX - Key: SOLR-5441 URL: https://issues.apache.org/jira/browse/SOLR-5441 Project: Solr Issue Type: Improvement Affects Versions: 4.5 Reporter: Rafał Kuć Assignee: Shalin Shekhar Mangar Priority: Minor It may be useful to have the number of transaction log files and their overall size exposed via JMX for UpdateHandler. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5441) Expose transaction log files number of their size via JMX
[ https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafał Kuć updated SOLR-5441: Attachment: SOLR-5441.patch Please look at the attached patch and see if this is the right approach and if anything should be modified. Expose transaction log files number of their size via JMX - Key: SOLR-5441 URL: https://issues.apache.org/jira/browse/SOLR-5441 Project: Solr Issue Type: Improvement Affects Versions: 4.5 Reporter: Rafał Kuć Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-5441.patch It may be useful to have the number of transaction log files and their overall size exposed via JMX for UpdateHandler. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0-ea-b114) - Build # 3465 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3465/ Java: 32bit/jdk1.8.0-ea-b114 -server -XX:+UseParallelGC 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.handler.component.DistributedDebugComponentTest Error Message: Unable to delete file: .\org.apache.solr.handler.component.DistributedDebugComponentTest\collection2\data\tlog\tlog.000 Stack Trace: java.io.IOException: Unable to delete file: .\org.apache.solr.handler.component.DistributedDebugComponentTest\collection2\data\tlog\tlog.000 at __randomizedtesting.SeedInfo.seed([D76932B7DFD5F27B]:0) at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1919) at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399) at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331) at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910) at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399) at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331) at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910) at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399) at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331) at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:1910) at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1399) at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1331) at org.apache.solr.SolrJettyTestBase.cleanUpJettyHome(SolrJettyTestBase.java:189) at org.apache.solr.handler.component.DistributedDebugComponentTest.afterTest(DistributedDebugComponentTest.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:700) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:744) Build Log: [...truncated 10841 lines...] [junit4] Suite: org.apache.solr.handler.component.DistributedDebugComponentTest [junit4] 2 1897034 T7117 oas.SolrTestCaseJ4.initCore initCore [junit4] 2 Creating dataDir: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build\solr-core\test\J0\.\solrtest-DistributedDebugComponentTest-1384448216339 [junit4] 2 1897036 T7117 oas.SolrTestCaseJ4.initCore initCore end [junit4] 2 1897036 T7117 oas.SolrJettyTestBase.getSSLConfig Randomized ssl (true) and clientAuth (false) [junit4] 2 1897036 T7117 oejs.Server.doStart jetty-8.1.10.v20130312 [junit4] 2 1897040 T7117 oejs.AbstractConnector.doStart Started SelectChannelConnector@127.0.0.1:58776 [junit4] 2 1897041 T7117 oass.SolrDispatchFilter.init SolrDispatchFilter.init() [junit4] 2 1897041 T7117 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for
[jira] [Commented] (SOLR-5428) new statistics results to StatsComponent - distinctValues and countDistinct
[ https://issues.apache.org/jira/browse/SOLR-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822612#comment-13822612 ] Yago Riveiro commented on SOLR-5428: Ok, I forgot that the StatsComponent returns all metrics in one call. Maybe the StatsComponent needs some tweaking to return only the metrics that we need and not all of them. If the analytics component could work with distributed searches, this patch would not be necessary. new statistics results to StatsComponent - distinctValues and countDistinct --- Key: SOLR-5428 URL: https://issues.apache.org/jira/browse/SOLR-5428 Project: Solr Issue Type: New Feature Reporter: Elran Dvir Assignee: Shalin Shekhar Mangar Attachments: SOLR-5428.patch I thought it would be very useful to display the distinct values (and the count) of a field among other statistics. Attached a patch implementing this in StatsComponent. Added results: distinctValues - list of all distinct values countDistinct - distinct values count. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5441) Expose transaction log files number of their size via JMX
[ https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822619#comment-13822619 ] Shalin Shekhar Mangar commented on SOLR-5441: - Thanks Rafał. We need to put the iteration in a synchronized block. Expose transaction log files number of their size via JMX - Key: SOLR-5441 URL: https://issues.apache.org/jira/browse/SOLR-5441 Project: Solr Issue Type: Improvement Affects Versions: 4.5 Reporter: Rafał Kuć Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-5441.patch It may be useful to have the number of transaction log files and their overall size exposed via JMX for UpdateHandler. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
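A rough sketch of the synchronized iteration being requested; the logs collection and its element type stand in for UpdateLog internals and are assumptions, not the patch's actual code:
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch only: 'logs' stands in for UpdateLog's list of open transaction logs.
// The element type and its size field are assumed for illustration.
public class TlogStatsSketch {
  static class TlogEntry {
    final long sizeInBytes;
    TlogEntry(long sizeInBytes) { this.sizeInBytes = sizeInBytes; }
  }

  private final Deque<TlogEntry> logs = new ArrayDeque<TlogEntry>();

  // Iteration happens inside a synchronized block so the JMX read does not
  // race with concurrent additions/removals as transaction logs roll over.
  public long[] getTlogCountAndTotalSize() {
    long count = 0;
    long totalSize = 0;
    synchronized (logs) {
      for (TlogEntry log : logs) {
        count++;
        totalSize += log.sizeInBytes;
      }
    }
    return new long[] { count, totalSize };
  }
}
{code}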
Re: [VOTE] Lucene / Solr 4.6.0
+1 from my side as well. Release candidate looks good and smoke tester was happy. On Thu, Nov 14, 2013 at 4:17 PM, Martijn v Groningen martijn.v.gronin...@gmail.com wrote: +1! The smoker test succeeded. On 14 November 2013 15:09, Tommaso Teofili tommaso.teof...@gmail.com wrote: +1 SUCCESS! [2:50:09.253204] () however we have the usual ***WARNING***: javadocs want to fail! for Solr (which IMHO we should fix) Regards, Tommaso 2013/11/14 Shai Erera ser...@gmail.com Smoke tester passes for me. +1! Shai On Thu, Nov 14, 2013 at 11:37 AM, Simon Willnauer simon.willna...@gmail.com wrote: Please vote for the first Release Candidate for Lucene/Solr 4.6.0 you can download it here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 or run the smoke tester directly with this commandline (don't forget to set JAVA6_HOME etc.): python3.2 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 1541686 4.6.0 /tmp/smoke_test_4_6 I integrated the RC into Elasticsearch and all tests pass: https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de Smoketester said: SUCCESS! [1:15:57.339272] here is my +1 Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Met vriendelijke groet, Martijn van Groningen -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.
[ https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822636#comment-13822636 ] Tricia Jenkins commented on SOLR-4722: -- Thanks for your interest. This code/jar could be used as is for your purposes. If you don't want to specify highlighting enabled in each query just move it to conf/solrconfig.xml: {code:xml} requestHandler name=standard class=solr.StandardRequestHandler lst name=defaults str name=hltrue/str /lst /requestHandler {code} This highlighter only returns the term positions. The term offsets are stored because they're used by the FastVectorHighlighter. You won't get any useful information from this highlighter if you disable termOffsets in your schema.xml. I just ran this patch against trunk. Still works! Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled. --- Key: SOLR-4722 URL: https://issues.apache.org/jira/browse/SOLR-4722 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 4.3, 5.0 Reporter: Tricia Jenkins Priority: Minor Attachments: SOLR-4722.patch, solr-positionshighlighter.jar As an alternative to returning snippets, this highlighter provides the (term) position for query matches. One usecase for this is to reconcile the term position from the Solr index with 'word' coordinates provided by an OCR process. In this way we are able to 'highlight' an image, like a page from a book or an article from a newspaper, in the locations that match the user's query. This is based on the FastVectorHighlighter and requires that termVectors, termOffsets and termPositions be stored. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
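Correspondingly, a schema.xml field along these lines (field and type names are placeholders) provides the term vectors, positions and offsets this highlighter relies on:
{code:xml}
<!-- Placeholder field/type names; the essential part is enabling term vectors
     with positions and offsets, which the positions highlighter requires. -->
<field name="ocr_text" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
{code}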
[jira] [Commented] (SOLR-5399) Improve DebugComponent for distributed requests
[ https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822644#comment-13822644 ] Tomás Fernández Löbbe commented on SOLR-5399: - I think the problem is that the test tries to delete the solr home before stopping Jetty. I'm testing a fix now Improve DebugComponent for distributed requests --- Key: SOLR-5399 URL: https://issues.apache.org/jira/browse/SOLR-5399 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Assignee: Ryan Ernst Fix For: 5.0, 4.7 Attachments: SOLR-5399.patch, SOLR-5399.patch, SOLR-5399.patch I'm working on extending the DebugComponent for adding some useful information to be able to track distributed requests better. I'm adding two different things, first, the request can generate a request ID that will be printed in the logs for the main query and all the different internal requests to the different shards. This should make it easier to find the different parts of a single user request in the logs. It would also add the purpose of each internal request to the logs, like: RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. Also, I'm adding a track section to the debug info where to add information about the different phases of the distributed request (right now, I'm only including QTime, but could eventually include more information) like: {code:xml} lst name=debug lst name=track lst name=EXECUTE_QUERY str name=localhost:8985/solrQTime: 10/str str name=localhost:8984/solrQTime: 25/str /lst lst name=GET_FIELDS str name=localhost:8985/solrQTime: 1/str /lst /lst /lst {code} To get this, debugQuery must be set to true, or debug must include debug=track. This information is only added to distributed requests. I would like to get feedback on this. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
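A sketch of that ordering fix, using hypothetical fixture and helper names rather than the test's actual ones: stop the embedded Jetty servers (releasing the tlog file handles that the Windows run above could not delete) before removing the per-test solr home directories.
{code:java}
import java.io.File;
import org.junit.AfterClass;

// Hypothetical after-class cleanup illustrating the ordering fix described above.
public class AfterTestOrderingSketch {
  static EmbeddedJetty jettyOne, jettyTwo;   // stand-ins for the test's two Jetty instances
  static File solrHomeOne, solrHomeTwo;      // stand-ins for the per-test solr homes

  @AfterClass
  public static void afterTest() throws Exception {
    jettyOne.stop();                 // 1) shut the servers down first so tlog files are released
    jettyTwo.stop();
    deleteRecursively(solrHomeOne);  // 2) only now delete the solr home directories
    deleteRecursively(solrHomeTwo);
  }

  static void deleteRecursively(File dir) {
    // stand-in; the stack trace above shows the real test uses Commons IO FileUtils.deleteDirectory
  }

  // Minimal stand-in for the test's Jetty wrapper.
  static class EmbeddedJetty {
    void stop() {}
  }
}
{code}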
[jira] [Updated] (SOLR-5441) Expose transaction log files number of their size via JMX
[ https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafał Kuć updated SOLR-5441: Attachment: SOLR-5441-synchronized.patch Added synchronized on the logs list in the UpdateLog class. Expose transaction log files number of their size via JMX - Key: SOLR-5441 URL: https://issues.apache.org/jira/browse/SOLR-5441 Project: Solr Issue Type: Improvement Affects Versions: 4.5 Reporter: Rafał Kuć Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-5441-synchronized.patch, SOLR-5441.patch It may be useful to have the number of transaction log files and their overall size exposed via JMX for UpdateHandler. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5441) Expose transaction log files number of their size via JMX
[ https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822696#comment-13822696 ] ASF subversion and git services commented on SOLR-5441: --- Commit 1541999 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1541999 ] SOLR-5441: Expose number of transaction log files and their size via JMX Expose transaction log files number of their size via JMX - Key: SOLR-5441 URL: https://issues.apache.org/jira/browse/SOLR-5441 Project: Solr Issue Type: Improvement Affects Versions: 4.5 Reporter: Rafał Kuć Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-5441-synchronized.patch, SOLR-5441.patch It may be useful to have the number of transaction log files and their overall size exposed via JMX for UpdateHandler. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5441) Expose transaction log files number of their size via JMX
[ https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822697#comment-13822697 ] ASF subversion and git services commented on SOLR-5441: --- Commit 1542000 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1542000 ] SOLR-5441: Expose number of transaction log files and their size via JMX Expose transaction log files number of their size via JMX - Key: SOLR-5441 URL: https://issues.apache.org/jira/browse/SOLR-5441 Project: Solr Issue Type: Improvement Affects Versions: 4.5 Reporter: Rafał Kuć Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-5441-synchronized.patch, SOLR-5441.patch It may be useful to have the number of transaction log files and their overall size exposed via JMX for UpdateHandler. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5441) Expose transaction log files number of their size via JMX
[ https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-5441. - Resolution: Fixed Fix Version/s: 5.0 4.6 Thanks Rafał! Expose transaction log files number of their size via JMX - Key: SOLR-5441 URL: https://issues.apache.org/jira/browse/SOLR-5441 Project: Solr Issue Type: Improvement Affects Versions: 4.5 Reporter: Rafał Kuć Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 4.6, 5.0 Attachments: SOLR-5441-synchronized.patch, SOLR-5441.patch It may be useful to have the number of transaction log files and their overall size exposed via JMX for UpdateHandler. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Lucene / Solr 4.6.0
-1 Smoke tester passes. Solr Changes look good, except that the “Upgrading from Solr 4.5.0” section” follows “Detailed Change List”, but should be above it; and one change attribution didn’t get recognized because it’s missing parens: Elran Dvir via Erick Erickson. Definitely not worth a respin in either case. Lucene Changes look good, except that the “API Changes” section in Changes.html is formatted as an item in the “Bug Fixes” section, rather than its own section. I’ll fix. (The issue is that “API Changes:” in CHANGES.txt has a trailing colon - the section name regex should allow this.) This is probably not worth a respin. Lucene and Solr Documentation pages look good, except that the File Formats” link from the Lucene Documentation page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). This is respin-worthy. Updating this is not automated now - it’s hard-coded in lucene/site/xsl/index.xsl - the default codec doesn’t change in every release. I’ll try to automate extracting the default from o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)]. Lucene and Solr Javadocs look good. Steve On Nov 14, 2013, at 4:37 AM, Simon Willnauer simon.willna...@gmail.com wrote: Please vote for the first Release Candidate for Lucene/Solr 4.6.0 you can download it here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 or run the smoke tester directly with this commandline (don't forget to set JAVA6_HOME etc.): python3.2 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 1541686 4.6.0 /tmp/smoke_test_4_6 I integrated the RC into Elasticsearch and all tests pass: https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de Smoketester said: SUCCESS! [1:15:57.339272] here is my +1 Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822721#comment-13822721 ] Michael McCandless commented on LUCENE-5339: Thanks for the feedback Shai. bq. I think FacetField should an optional ctor taking the indexedFacetField, defaulting to $facets, then the ctor calls super() with the right field, and not dummy? And you can remove set/get? I moved the indexed field name to the DimConfig. bq. SimpleFacetsCollector jdocs are wrong – there's no create? I removed it and put nocommit. bq. Do we still need SSDVFacetFields? Woops, no; I removed it. bq. I like FacetIW, but the nocommit, to handle updateDoc, addDocs etc. makes me think if down the road we won't be sorry about doing this change (i.e. if anything changes on IW APIs). The good thing about FacetFields is that it just adds fields to a Document, and doesn't worry about IW API at all... I think that's an acceptable risk in exchange for the simpler index-time API. bq. DimType == DimConfig . Sorry if I wasn't clear somewhere in my long response. Ahh, OK. I'm still wondering if we should put the facetMethod onto the DimConfig... bq. Like if your facet has 2 levels, that's 33% more ords. I think the count of the root ord is most likely never needed? And if it's needed, app can compute it by traversing its children and their values in the facet arrays? Maybe as a default we just never index it, and don't add a vague requiresDimCount/Value/Weight boolean? Wait, the app cannot compute this (accurately) by summing the child counts? It will overcount in general, right? {quote} bq. I think replicating bits of this code for the different methods is the lesser evil than adding so many abstractions that the module is hard to approach by new devs/users. Did we ever get such feedback from users? That the module is unapproachable? {quote} It's mostly from my own assessment, looking at the code, and my own frustrations over time in trying to improve the facets module; LUCENE-5333 was finally the straw (for me)... I find complex APIs frustrating and I think it's a serious barrier to new contributors getting involved and users consuming them. There was a user on the ML (I don't have the link) who just wanted to get the facet count for a specific label after faceting was done, and the hoops s/he had to jump through (custom FacetResultHandler I think??) to achieve that was just crazy. bq. I get the opposite feedback - that we don't have many abstractions! Seriously? What abstractions are we missing? If this is too much change for the facets module, we could, instead, leave the facets module as is, and break this effort out as a different module (facets2, simplefacets, something?). We also have many queryparsers, many highlighters, etc., and I think that's healthy: all options can be explored. bq. You have a nocommit maybe we should do this lazily in regards for when to rollupValues. That shows me that now every developer who extends this API (let's be clear - users are oblivious to this happening) will facet the same decision (nocommit). If we discover one day that it's better to rollup lazily or not, other developers don't benefit from that decision. That's why I think some abstractions are good. It's a crazy expert thing to create another faceting impl, so I think such developers can handle changing their rollup to be lazy if it's beneficial/necessary for their use case. bq. I'm not sure we should do that (cut over associations later). 
The whole point about these features (associations, complements, sampling..) is that they are existing features. If we think they are useless / unneeded - that's one thing. But if we believe they are important, it's useless to make all the API changes without taking them into account, only to figure out later that we need abstraction X and Y in order to implement them. Well this could be a good argument for just making a new module? The new module would have a simpler API and less thorough functionality? bq. Can we do FacetIndexWriter in a separate issue (if we want to do it at all)? It's unrelated to the search API changes you want to do here, and it might be easier to contain within a single issue? I'm not sure it's so easily separated out; the DimConfig is common to index time and search time, and we're still iterating on that (I just moved the indexedFieldName into it). bq. About CategoryListIterator ... what if we do manage to come up tomorrow with a better encoding strategy for facets. Do you really think that changing all existing WhateverFacets makes sense!? And if developers write their own WhateverFacets, it means they need to change their code too? Really, you're mixing optimizations (inlining dgap+vint) with ease of use. I know (!!) that there are apps that can benefit from a different encoding scheme (e.g.
[jira] [Commented] (SOLR-5402) SolrCloud 4.5 bulk add errors in cloud setup
[ https://issues.apache.org/jira/browse/SOLR-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822728#comment-13822728 ] Greg Walters commented on SOLR-5402: I've been able to reproduce this issue using curl to add documents but using the post.jar provided in the Solr example or using a solrj client I'm unable to reproduce this issue having tried batches up to 5000 documents. SolrCloud 4.5 bulk add errors in cloud setup Key: SOLR-5402 URL: https://issues.apache.org/jira/browse/SOLR-5402 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5, 4.5.1 Reporter: Sai Gadde Fix For: 4.6 We use out of the box Solr 4.5.1 no customization done. If we merge documents via SolrJ to a single server it is perfectly working fine. But as soon as we add another node to the cloud we are getting following while merging documents. We merge about 500 at a time using SolrJ. These 500 documents in total are about few MB (1-3) in size. This is the error we are getting on the server (10.10.10.116 - IP is irrelavent just for clarity)where merging is happening. 10.10.10.119 is the new node here. This server gets RemoteSolrException shard update error StdNode: http://10.10.10.119:8980/solr/mycore/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Illegal to have multiple roots (start tag in epilog?). at [row,col {unknown-source}]: [1,12468] at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:425) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401) at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1) at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) On the other server 10.10.10.119 we get following error org.apache.solr.common.SolrException: Illegal to have multiple roots (start tag in epilog?). 
at [row,col {unknown-source}]: [1,12468] at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?). at [row,col
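The curl reproduction mentioned above presumably looks something like this (URL, core name and payload file are illustrative; the exact command is not given in the report):
{noformat}
curl 'http://localhost:8983/solr/mycore/update?commit=true' -H 'Content-Type: text/xml' --data-binary @batch-of-500-docs.xml
{noformat}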
[jira] [Created] (LUCENE-5341) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl
Steve Rowe created LUCENE-5341: -- Summary: Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl Key: LUCENE-5341 URL: https://issues.apache.org/jira/browse/LUCENE-5341 Project: Lucene - Core Issue Type: Bug Components: general/build Reporter: Steve Rowe Assignee: Steve Rowe Fix For: 4.6, 5.0, 4.7 In the 4.6 RC1, The File Formats” link from the generated Lucene Documentation page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). Updating this is not automated now - it’s hard-coded in {{lucene/site/xsl/index.xsl}} - the default codec doesn’t change in every release. The default codec could be extracted from {{o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)]}} and inserted into the URL to the index file format documentation. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
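A minimal sketch of the extraction step; Codec.getDefault() is the existing Lucene API, while how the build captures the printed value and substitutes it into index.xsl is left as an assumption:
{code:java}
import org.apache.lucene.codecs.Codec;

// Prints the default codec's name (e.g. "Lucene46") so a build step could
// substitute it into the documentation link instead of hard-coding it.
public class PrintDefaultCodec {
  public static void main(String[] args) {
    System.out.println(Codec.getDefault().getName());
  }
}
{code}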
[jira] [Comment Edited] (SOLR-5381) Split Clusterstate and scale
[ https://issues.apache.org/jira/browse/SOLR-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13809096#comment-13809096 ] Noble Paul edited comment on SOLR-5381 at 11/14/13 6:42 PM: OK , here is the plan to split clusterstate on a per collection basis h2. How to use this feature? Introduce a new option while creating a collection (external=true) . This will keep the state of the collection in a separate node. example : http://localhost:8983/solr/admin/collections?action=CREATEname=xcollnumShards=5replicationFactor=2external=true This will result in this following entry in clusterstate.json {code:JavaScript} { “xcoll” : {“ex”:true} } {code} there will be another ZK entry which carries the actual collection information * /collections ** /xcoll *** /state.json {code:JavaScript} {xcoll:{ shards:{shard1:{ range:”8000-b332”l, state:active, replicas:{ core_node1:{ state:active, base_url:http://192.168.1.5:8983/solr;, core:xcoll_shard1_replica1, node_name:192.168.1.5:8983_solr, leader:true, router:{name:compositeId}}} {code} The main Overseer thread is responsible for creating collections and managing all the events for all the collections in the clusterstate.json . clusterstate.json is modified only when a collection is created/deleted or when state updates happen to “non-external” collections Each external collection to have its own Overseer queue as follows. There will be a separate thread for each external collection. * /collections ** /xcoll *** /overseer /collection-queue-work /queue /queue-work h2. SolrJ enhancements SolrJ would not listen to any ZK node. When a request comes for a collection ‘xcoll’ * it would first check if such a collection exists * If yes it first looks up the details in the local cache for that collection * If not found in cache , it fetches the node /collections/xcoll/state.json and caches the information * Any query/update will be sent with extra query param specifying the collection name , shard name, Role (Leader/Replica), and range (example \_target_=xcoll:shard1:L:8000-b332) . A node would throw an error (INVALID_NODE) if it does not the serve the collection/shard/Role/range combo. * If a SolrJ gets INVALID_NODE error it would invalidate the cache and fetch fresh state information for that collection (and caches it again). h2. Changes to each Solr Node Each node would only listen to the clusterstate.json and the states of collections which it is a member of. If a request comes for a collection it does not serve, it first checks for the \_target_ param. All collections present in the clusterstate.json will be deemed as collections it serves * If the param is present and the node does not serve that collection/shard/Role/Range combo, an INVALID_NODE error is thrown ** If the validation succeeds it is served * If the param is not present and the node is a member of the collection, the request is served by ** If the node is not a member of the collection, it uses SolrJ to proxy the request to appropriate location Internally , the node really does not care about the state of external collections. If/when it is required, the information is fetched real time from ZK and used and thrown away. h2. Changes to admin GUI External collections are not shown graphically in the admin UI . was (Author: noble.paul): OK , here is the plan to split clusterstate on a per collection basis h2. How to use this feature? Introduce a new option while creating a collection (external=true) . 
This will keep the state of the collection in a separate node. example : http://localhost:8983/solr/admin/collections?action=CREATEname=xcollnumShards=5replicationFactor=2external=true This will result in this following entry in clusterstate.json {code:JavaScript} { “xcoll” : {“ex”:true} } {code} there will be another ZK entry which carries the actual collection information * /collections ** /xcoll *** /state.json {code:JavaScript} {xcoll:{ shards:{shard1:{ range:”8000-b332”l, state:active, replicas:{ core_node1:{ state:active, base_url:http://192.168.1.5:8983/solr;, core:xcoll_shard1_replica1, node_name:192.168.1.5:8983_solr, leader:true, router:{name:compositeId}}} {code} The main Overseer thread is responsible for creating collections and managing all the events for all the collections in the clusterstate.json . clusterstate.json is modified only when a collection is created/deleted or when state updates happen to “non-external” collections Each external collection to have its own Overseer queue as follows. There will be a separate thread for each external
[jira] [Updated] (LUCENE-5341) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl
[ https://issues.apache.org/jira/browse/LUCENE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated LUCENE-5341: --- Attachment: LUCENE-5341.patch Patch automating default codec extraction for use in the URL to index file format documentation. I'll commit shortly. Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl Key: LUCENE-5341 URL: https://issues.apache.org/jira/browse/LUCENE-5341 Project: Lucene - Core Issue Type: Bug Components: general/build Reporter: Steve Rowe Assignee: Steve Rowe Fix For: 4.6, 5.0, 4.7 Attachments: LUCENE-5341.patch In the 4.6 RC1, The File Formats” link from the generated Lucene Documentation page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). Updating this is not automated now - it’s hard-coded in {{lucene/site/xsl/index.xsl}} - the default codec doesn’t change in every release. The default codec could be extracted from {{o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)]}} and inserted into the URL to the index file format documentation. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5341) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl
[ https://issues.apache.org/jira/browse/LUCENE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822755#comment-13822755 ] ASF subversion and git services commented on LUCENE-5341: - Commit 1542012 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1542012 ] LUCENE-5341: automate default codec package extraction for use in the generated Lucene documentation's link to the index file format documentation Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl Key: LUCENE-5341 URL: https://issues.apache.org/jira/browse/LUCENE-5341 Project: Lucene - Core Issue Type: Bug Components: general/build Reporter: Steve Rowe Assignee: Steve Rowe Fix For: 4.6, 5.0, 4.7 Attachments: LUCENE-5341.patch In the 4.6 RC1, The File Formats” link from the generated Lucene Documentation page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). Updating this is not automated now - it’s hard-coded in {{lucene/site/xsl/index.xsl}} - the default codec doesn’t change in every release. The default codec could be extracted from {{o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)]}} and inserted into the URL to the index file format documentation. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5341) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl
[ https://issues.apache.org/jira/browse/LUCENE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822762#comment-13822762 ] ASF subversion and git services commented on LUCENE-5341: - Commit 1542013 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1542013 ] LUCENE-5341: automate default codec package extraction for use in the generated Lucene documentation's link to the index file format documentation (merged trunk r1542012) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl Key: LUCENE-5341 URL: https://issues.apache.org/jira/browse/LUCENE-5341 Project: Lucene - Core Issue Type: Bug Components: general/build Reporter: Steve Rowe Assignee: Steve Rowe Fix For: 4.6, 5.0, 4.7 Attachments: LUCENE-5341.patch In the 4.6 RC1, The File Formats” link from the generated Lucene Documentation page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). Updating this is not automated now - it’s hard-coded in {{lucene/site/xsl/index.xsl}} - the default codec doesn’t change in every release. The default codec could be extracted from {{o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)]}} and inserted into the URL to the index file format documentation. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: DocValue on Strings slow and OOM
Per, As you are seeing there are different implementations for calculating facets for numeric fields and string fields. The numeric fields I believe are using an int-to-int or long-to-int hashmap to hold the facet counts. This map grows as values are added to it. The String version uses an int array the size of the number of distinct values in the field to hold the facet counts. So if you have a very large number of distinct values in the field, you'll have a very large array. Also the distinct values themselves are held in memory in the fieldCache for string fields. So, basically as you are seeing you'll take up a much larger memory footprint when when faceting on a high cardinality string field, then on a high cardinality numeric field. There are docvalues faceting implementations that will kick-in on a field that has docvalues. You can try setting the on disk flag and this will test memory and performance. Joel Joel On Thu, Nov 14, 2013 at 8:13 AM, Per Steffensen st...@designware.dk wrote: If anyone if following this one, just an update. We are not going to upgrade to 4.5.1 in order to see if the String facet performance problem has been fixed. Instead we have made a few hacks around our data so that we can store the c-field (c_dstr_doc_sto) as long instead (c_dlng_doc_sto). So now we only need to struggle with long-facet performance. There is a performance issue with facets on longs though, but I will tell about in another mailing-thread - need your input on what solution you prefer. Regards, Per Steffensen On 11/6/13 12:15 PM, Per Steffensen wrote: On 11/6/13 11:43 AM, Robert Muir wrote: Before lucene 4.5 docvalues were loaded entirely into RAM. I'm not going to waste time debugging any old code releases here, you should upgrade to the latest release! Ok, thanks! I do not consider it a bug (just a performance issue), so no debugging needed. It is just that we do not want to spend time upgrading to 4.5 if there is not a justified hope/explanation that it will probably make things better. But I guess there is. One short question: Will 4.5 index things differently (compared to 4.4) for documents with fields like I showed earlier? Im basically asking if we need to reindex the 12billion documents again after upgrading to 4.5, or if we ought to be able to deploy 4.5 on top of the already indexed documents. Regards, Per Steffensen -- Joel Bernstein Search Engineer at Heliosearch
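For the on-disk option Joel mentions, the long field from the thread would be declared with docValues enabled along these lines (sketch only; the docValuesFormat name and its availability depend on the exact 4.x version in use):
{code:xml}
<!-- Sketch: enabling docValues on the long field from the thread.
     docValuesFormat="Disk" is the 4.x on-disk option being referred to;
     adjust to whatever format your Solr/Lucene version actually ships. -->
<fieldType name="long_dv_disk" class="solr.TrieLongField" precisionStep="0"
           docValuesFormat="Disk"/>
<field name="c_dlng_doc_sto" type="long_dv_disk" indexed="true" stored="true"
       docValues="true"/>
{code}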
[jira] [Commented] (LUCENE-5341) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl
[ https://issues.apache.org/jira/browse/LUCENE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822770#comment-13822770 ] ASF subversion and git services commented on LUCENE-5341: - Commit 1542018 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_6' [ https://svn.apache.org/r1542018 ] LUCENE-5341: automate default codec package extraction for use in the generated Lucene documentation's link to the index file format documentation (merged branch_4x r1542013) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl Key: LUCENE-5341 URL: https://issues.apache.org/jira/browse/LUCENE-5341 Project: Lucene - Core Issue Type: Bug Components: general/build Reporter: Steve Rowe Assignee: Steve Rowe Fix For: 4.6, 5.0, 4.7 Attachments: LUCENE-5341.patch In the 4.6 RC1, The File Formats” link from the generated Lucene Documentation page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). Updating this is not automated now - it’s hard-coded in {{lucene/site/xsl/index.xsl}} - the default codec doesn’t change in every release. The default codec could be extracted from {{o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)]}} and inserted into the URL to the index file format documentation. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822778#comment-13822778 ] ASF subversion and git services commented on LUCENE-5339: - Commit 1542025 from [~mikemccand] in branch 'dev/branches/lucene5339' [ https://svn.apache.org/r1542025 ] LUCENE-5339: current patch Simplify the facet module APIs -- Key: LUCENE-5339 URL: https://issues.apache.org/jira/browse/LUCENE-5339 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5339.patch, LUCENE-5339.patch I'd like to explore simplifications to the facet module's APIs: I think the current APIs are complex, and the addition of a new feature (sparse faceting, LUCENE-5333) threatens to add even more classes (e.g., FacetRequestBuilder). I think we can do better. So, I've been prototyping some drastic changes; this is very early/exploratory and I'm not sure where it'll wind up but I think the new approach shows promise. The big changes are: * Instead of *FacetRequest/Params/Result, you directly instantiate the classes that do facet counting (currently TaxonomyFacetCounts, RangeFacetCounts or SortedSetDVFacetCounts), passing in the SimpleFacetsCollector, and then you interact with those classes to pull labels + values (topN under a path, sparse, specific labels). * At index time, no more FacetIndexingParams/CategoryListParams; instead, you make a new SimpleFacetFields and pass it the field it should store facets + drill downs under. If you want more than one CLI you create more than one instance of SimpleFacetFields. * I added a simple schema, where you state which dimensions are hierarchical or multi-valued. From this we decide how to index the ordinals (no more OrdinalPolicy). Sparse faceting is just another method (getAllDims), on both taxonomy ssdv facet classes. I haven't created a common base class / interface for all of the search-time facet classes, but I think this may be possible/clean, and perhaps useful for drill sideways. All the new classes are under oal.facet.simple.*. Lots of things that don't work yet: drill sideways, complements, associations, sampling, partitions, etc. This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822780#comment-13822780 ] Michael McCandless commented on LUCENE-5339: OK I created a branch at https://svn.apache.org/repos/asf/lucene/dev/branches/lucene5339 and committed my current patch (same as last patch I think, except I moved the indexedFieldName from FacetField to DimConfig). Simplify the facet module APIs -- Key: LUCENE-5339 URL: https://issues.apache.org/jira/browse/LUCENE-5339 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5339.patch, LUCENE-5339.patch I'd like to explore simplifications to the facet module's APIs: I think the current APIs are complex, and the addition of a new feature (sparse faceting, LUCENE-5333) threatens to add even more classes (e.g., FacetRequestBuilder). I think we can do better. So, I've been prototyping some drastic changes; this is very early/exploratory and I'm not sure where it'll wind up but I think the new approach shows promise. The big changes are: * Instead of *FacetRequest/Params/Result, you directly instantiate the classes that do facet counting (currently TaxonomyFacetCounts, RangeFacetCounts or SortedSetDVFacetCounts), passing in the SimpleFacetsCollector, and then you interact with those classes to pull labels + values (topN under a path, sparse, specific labels). * At index time, no more FacetIndexingParams/CategoryListParams; instead, you make a new SimpleFacetFields and pass it the field it should store facets + drill downs under. If you want more than one CLI you create more than one instance of SimpleFacetFields. * I added a simple schema, where you state which dimensions are hierarchical or multi-valued. From this we decide how to index the ordinals (no more OrdinalPolicy). Sparse faceting is just another method (getAllDims), on both taxonomy ssdv facet classes. I haven't created a common base class / interface for all of the search-time facet classes, but I think this may be possible/clean, and perhaps useful for drill sideways. All the new classes are under oal.facet.simple.*. Lots of things that don't work yet: drill sideways, complements, associations, sampling, partitions, etc. This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Lucene / Solr 4.6.0
I’ve committed fixes, to lucene_solr_4_6 as well as to branch_4x and trunk, for all the problems I mentioned. The first revision including all these is 1542030. Steve On Nov 14, 2013, at 1:16 PM, Steve Rowe sar...@gmail.com wrote: -1 Smoke tester passes. Solr Changes look good, except that the “Upgrading from Solr 4.5.0” section” follows “Detailed Change List”, but should be above it; and one change attribution didn’t get recognized because it’s missing parens: Elran Dvir via Erick Erickson. Definitely not worth a respin in either case. Lucene Changes look good, except that the “API Changes” section in Changes.html is formatted as an item in the “Bug Fixes” section, rather than its own section. I’ll fix. (The issue is that “API Changes:” in CHANGES.txt has a trailing colon - the section name regex should allow this.) This is probably not worth a respin. Lucene and Solr Documentation pages look good, except that the File Formats” link from the Lucene Documentation page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). This is respin-worthy. Updating this is not automated now - it’s hard-coded in lucene/site/xsl/index.xsl - the default codec doesn’t change in every release. I’ll try to automate extracting the default from o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)]. Lucene and Solr Javadocs look good. Steve On Nov 14, 2013, at 4:37 AM, Simon Willnauer simon.willna...@gmail.com wrote: Please vote for the first Release Candidate for Lucene/Solr 4.6.0 you can download it here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 or run the smoke tester directly with this commandline (don't forget to set JAVA6_HOME etc.): python3.2 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 1541686 4.6.0 /tmp/smoke_test_4_6 I integrated the RC into Elasticsearch and all tests pass: https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de Smoketester said: SUCCESS! [1:15:57.339272] here is my +1 Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
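On the “API Changes:” formatting glitch: the root cause Steve describes is that the pattern recognizing section names in CHANGES.txt does not tolerate a trailing colon. The real converter is a script under dev-tools and uses its own pattern; the snippet below only illustrates the kind of one-character fix involved (an optional ':' at the end of the section-name pattern) and is not the actual converter code:

{code:java}
import java.util.regex.Pattern;

public class SectionNameCheck {
  public static void main(String[] args) {
    // Illustrative pattern only, not the one used by the real CHANGES-to-HTML converter.
    Pattern section = Pattern.compile("^(API Changes|New Features|Bug Fixes|Optimizations)\\s*:?\\s*$");
    System.out.println(section.matcher("API Changes:").matches()); // true, thanks to the optional colon
    System.out.println(section.matcher("Bug Fixes").matches());    // still true without one
  }
}
{code}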
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822776#comment-13822776 ] ASF subversion and git services commented on LUCENE-5339: - Commit 1542023 from [~mikemccand] in branch 'dev/branches/lucene5339' [ https://svn.apache.org/r1542023 ] LUCENE-5339: make branch Simplify the facet module APIs -- Key: LUCENE-5339 URL: https://issues.apache.org/jira/browse/LUCENE-5339 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5339.patch, LUCENE-5339.patch I'd like to explore simplifications to the facet module's APIs: I think the current APIs are complex, and the addition of a new feature (sparse faceting, LUCENE-5333) threatens to add even more classes (e.g., FacetRequestBuilder). I think we can do better. So, I've been prototyping some drastic changes; this is very early/exploratory and I'm not sure where it'll wind up but I think the new approach shows promise. The big changes are: * Instead of *FacetRequest/Params/Result, you directly instantiate the classes that do facet counting (currently TaxonomyFacetCounts, RangeFacetCounts or SortedSetDVFacetCounts), passing in the SimpleFacetsCollector, and then you interact with those classes to pull labels + values (topN under a path, sparse, specific labels). * At index time, no more FacetIndexingParams/CategoryListParams; instead, you make a new SimpleFacetFields and pass it the field it should store facets + drill downs under. If you want more than one CLI you create more than one instance of SimpleFacetFields. * I added a simple schema, where you state which dimensions are hierarchical or multi-valued. From this we decide how to index the ordinals (no more OrdinalPolicy). Sparse faceting is just another method (getAllDims), on both taxonomy ssdv facet classes. I haven't created a common base class / interface for all of the search-time facet classes, but I think this may be possible/clean, and perhaps useful for drill sideways. All the new classes are under oal.facet.simple.*. Lots of things that don't work yet: drill sideways, complements, associations, sampling, partitions, etc. This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5341) Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl
[ https://issues.apache.org/jira/browse/LUCENE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved LUCENE-5341. Resolution: Fixed Committed to trunk, branch_4x, and lucene_solr_4_6. Generated documentation should link to the default codec's index format documentation, rather than being hard coded in lucene/site/xsl/index.xsl Key: LUCENE-5341 URL: https://issues.apache.org/jira/browse/LUCENE-5341 Project: Lucene - Core Issue Type: Bug Components: general/build Reporter: Steve Rowe Assignee: Steve Rowe Fix For: 4.6, 5.0, 4.7 Attachments: LUCENE-5341.patch In the 4.6 RC1, The File Formats” link from the generated Lucene Documentation page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). Updating this is not automated now - it’s hard-coded in {{lucene/site/xsl/index.xsl}} - the default codec doesn’t change in every release. The default codec could be extracted from {{o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)]}} and inserted into the URL to the index file format documentation. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5329) Make DocumentDictionary and co more lenient to dirty documents
[ https://issues.apache.org/jira/browse/LUCENE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur updated LUCENE-5329: - Attachment: LUCENE-5329.patch Updated patch: - fixed documentation-lint issue (added queries javadoc in build) - changed the number of docs generated in tests to make them a little faster Make DocumentDictionary and co more lenient to dirty documents -- Key: LUCENE-5329 URL: https://issues.apache.org/jira/browse/LUCENE-5329 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Areek Zillur Attachments: LUCENE-5329.patch, LUCENE-5329.patch, LUCENE-5329.patch Currently DocumentDictionary errors out whenever any document does not have a value for one of the relevant stored fields. It would be nice to make it lenient and instead ignore the invalid documents. Another issue with the DocumentDictionary is that it only allows string fields as suggestions and binary fields as payloads. When exposing these dictionaries to Solr (via https://issues.apache.org/jira/browse/SOLR-5378), it is inconvenient for the user to ensure that a suggestion field is a string field and a payload field is a binary field. It would be nice to have the dictionary just work whenever a string/binary field is passed as the suggestion/payload field. The patch provides one solution to this problem (by accepting string or binary values), though it would be great if there is another solution that avoids making the DocumentDictionary too flexible. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
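For readers following along, this is roughly where the leniency matters: DocumentDictionary reads stored fields off an existing index to build a suggester. A sketch against the 4.x API, assuming an index with stored fields named title, weight and payload (the field names and index path are illustrative); before this patch, a document missing the suggestion field made build() throw instead of being skipped:

{code:java}
import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.suggest.DocumentDictionary;
import org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class BuildSuggesterFromStoredFields {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/path/to/index")); // illustrative path
    IndexReader reader = DirectoryReader.open(dir);
    try {
      // "title" = suggestion text, "weight" = numeric stored field, "payload" = binary stored field
      DocumentDictionary dict = new DocumentDictionary(reader, "title", "weight", "payload");
      Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
      AnalyzingSuggester suggester = new AnalyzingSuggester(analyzer);
      suggester.build(dict); // with the patch, docs missing "title" are skipped rather than erroring out
      System.out.println("suggester built from " + reader.numDocs() + " docs");
    } finally {
      reader.close();
    }
  }
}
{code}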
[jira] [Created] (SOLR-5442) Python client cannot parse proxied response when served by Tomcat.
Mark Miller created SOLR-5442: - Summary: Python client cannot parse proxied response when served by Tomcat. Key: SOLR-5442 URL: https://issues.apache.org/jira/browse/SOLR-5442 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.0, 4.7 It seems that propagating the transfer-encoding and connection headers from the proxied response to the real response can cause some HTTP clients (Python's, so far) to see chunked-encoding data as part of the supposedly non-chunked response content. The headers are also duplicated in the response. The headers do not get duplicated with Jetty, and Python HTTP libs seem to have no problems when getting proxied via Jetty. Testing seems to confirm that not passing on these headers fixes the Tomcat issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
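The gist of the fix being described is the standard proxy rule: hop-by-hop headers such as Transfer-Encoding and Connection describe a single connection and must not be copied onto the outgoing response, or the container (Tomcat here) ends up double-framing the body. A generic illustration of that check, not the actual Solr patch:

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class HopByHopHeaders {
  // Hop-by-hop headers per RFC 2616 section 13.5.1; these belong to a single connection only.
  private static final Set<String> HOP_BY_HOP = new HashSet<String>(Arrays.asList(
      "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
      "te", "trailers", "transfer-encoding", "upgrade"));

  /** Returns true if a header from the proxied response should be copied to the client response. */
  public static boolean shouldForward(String headerName) {
    return !HOP_BY_HOP.contains(headerName.toLowerCase(Locale.ROOT));
  }
}
{code}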
Re: [VOTE] Lucene / Solr 4.6.0
Thanks Steve I won't get to this until next week. I will upload a new RC on monday. Simon Sent from my iPhone On 14 Nov 2013, at 20:20, Steve Rowe sar...@gmail.com wrote: I’ve committed fixes, to lucene_solr_4_6 as well as to branch_4x and trunk, for all the problems I mentioned. The first revision including all these is 1542030. Steve On Nov 14, 2013, at 1:16 PM, Steve Rowe sar...@gmail.com wrote: -1 Smoke tester passes. Solr Changes look good, except that the “Upgrading from Solr 4.5.0” section” follows “Detailed Change List”, but should be above it; and one change attribution didn’t get recognized because it’s missing parens: Elran Dvir via Erick Erickson. Definitely not worth a respin in either case. Lucene Changes look good, except that the “API Changes” section in Changes.html is formatted as an item in the “Bug Fixes” section, rather than its own section. I’ll fix. (The issue is that “API Changes:” in CHANGES.txt has a trailing colon - the section name regex should allow this.) This is probably not worth a respin. Lucene and Solr Documentation pages look good, except that the File Formats” link from the Lucene Documentation page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). This is respin-worthy. Updating this is not automated now - it’s hard-coded in lucene/site/xsl/index.xsl - the default codec doesn’t change in every release. I’ll try to automate extracting the default from o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)]. Lucene and Solr Javadocs look good. Steve On Nov 14, 2013, at 4:37 AM, Simon Willnauer simon.willna...@gmail.com wrote: Please vote for the first Release Candidate for Lucene/Solr 4.6.0 you can download it here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 or run the smoke tester directly with this commandline (don't forget to set JAVA6_HOME etc.): python3.2 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 1541686 4.6.0 /tmp/smoke_test_4_6 I integrated the RC into Elasticsearch and all tests pass: https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de Smoketester said: SUCCESS! [1:15:57.339272] here is my +1 Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822840#comment-13822840 ] Robert Muir commented on LUCENE-5339: - {quote} Really, you're mixing optimizations (inlining dgap+vint) with ease of use. I know (!!) that there are apps that can benefit from a different encoding scheme (e.g. FourOnesIntEncoder). We don't need to wait until someone comes up w/ a better default encoding scheme to introduce abstractions. I mean .. that's just sounds crazy to me. {quote} How common is this, I'm curious? Just to lend my opinion/support to this issue: imo the pure number of classes to the faceting module can be overwhelming. Lets take the encode/decode case: it seems to me you guys iterated a lot and figured out vint was the best default encoding. I'm not going to argue that some use case could benefit from a custom encoding scheme: instead I'm going to argue if it justifies a whole java package with 20 public classes? So I think its fine to bake in the encoding, but with the two key methods in those 20 classes 'protected' in the appropriate places so that an expert user could subclass them: {code} decode(BytesRef buf, IntsRef values); encode(IntsRef values, BytesRef buf); {code} I'd make the argument: if someone is expert enough to do this, they dont need pre-provided concrete encoder/decoder classes anyway, they can write their own method? Simplify the facet module APIs -- Key: LUCENE-5339 URL: https://issues.apache.org/jira/browse/LUCENE-5339 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5339.patch, LUCENE-5339.patch I'd like to explore simplifications to the facet module's APIs: I think the current APIs are complex, and the addition of a new feature (sparse faceting, LUCENE-5333) threatens to add even more classes (e.g., FacetRequestBuilder). I think we can do better. So, I've been prototyping some drastic changes; this is very early/exploratory and I'm not sure where it'll wind up but I think the new approach shows promise. The big changes are: * Instead of *FacetRequest/Params/Result, you directly instantiate the classes that do facet counting (currently TaxonomyFacetCounts, RangeFacetCounts or SortedSetDVFacetCounts), passing in the SimpleFacetsCollector, and then you interact with those classes to pull labels + values (topN under a path, sparse, specific labels). * At index time, no more FacetIndexingParams/CategoryListParams; instead, you make a new SimpleFacetFields and pass it the field it should store facets + drill downs under. If you want more than one CLI you create more than one instance of SimpleFacetFields. * I added a simple schema, where you state which dimensions are hierarchical or multi-valued. From this we decide how to index the ordinals (no more OrdinalPolicy). Sparse faceting is just another method (getAllDims), on both taxonomy ssdv facet classes. I haven't created a common base class / interface for all of the search-time facet classes, but I think this may be possible/clean, and perhaps useful for drill sideways. All the new classes are under oal.facet.simple.*. Lots of things that don't work yet: drill sideways, complements, associations, sampling, partitions, etc. This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
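To make Robert's point concrete, the override an expert user would write for those two hooks is small. The class below is a standalone illustration using the scheme the module already settled on as its default (delta-coding sorted ordinals as vInts); it is not a patch against any existing facet class, and the class name is made up:

{code:java}
import org.apache.lucene.util.ArrayUtil;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRef;

/** Standalone illustration of the two proposed protected hooks: delta + vInt coding
 *  of sorted facet ordinals. Not a subclass of any existing facet module class. */
public class SimpleOrdinalCodec {

  /** Encodes ascending ordinals in {@code values} as delta-coded vInts into {@code buf}. */
  protected void encode(IntsRef values, BytesRef buf) {
    final byte[] out = new byte[values.length * 5]; // worst case: 5 bytes per vInt
    int upto = 0;
    int prev = 0;
    for (int i = values.offset; i < values.offset + values.length; i++) {
      int delta = values.ints[i] - prev;
      prev = values.ints[i];
      while ((delta & ~0x7F) != 0) {
        out[upto++] = (byte) ((delta & 0x7F) | 0x80);
        delta >>>= 7;
      }
      out[upto++] = (byte) delta;
    }
    buf.bytes = out;
    buf.offset = 0;
    buf.length = upto;
  }

  /** Decodes what {@link #encode} wrote back into {@code values}. */
  protected void decode(BytesRef buf, IntsRef values) {
    values.offset = 0;
    values.length = 0;
    int ord = 0;
    int pos = buf.offset;
    final int end = buf.offset + buf.length;
    while (pos < end) {
      int b = buf.bytes[pos++];
      int delta = b & 0x7F;
      int shift = 7;
      while ((b & 0x80) != 0) {
        b = buf.bytes[pos++];
        delta |= (b & 0x7F) << shift;
        shift += 7;
      }
      ord += delta;
      if (values.length == values.ints.length) {
        values.ints = ArrayUtil.grow(values.ints, values.length + 1);
      }
      values.ints[values.length++] = ord;
    }
  }
}
{code}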
Re: [VOTE] Lucene / Solr 4.6.0
The PMC Chair is going to marry tomorrow... Simon has to come here and not do new RCs! :) In any case, thanks for doing the release, Simon. I will do the next! Uwe Simon Willnauer simon.willna...@gmail.com schrieb: Thanks Steve I won't get to this until next week. I will upload a new RC on monday. Simon Sent from my iPhone On 14 Nov 2013, at 20:20, Steve Rowe sar...@gmail.com wrote: I’ve committed fixes, to lucene_solr_4_6 as well as to branch_4x and trunk, for all the problems I mentioned. The first revision including all these is 1542030. Steve On Nov 14, 2013, at 1:16 PM, Steve Rowe sar...@gmail.com wrote: -1 Smoke tester passes. Solr Changes look good, except that the “Upgrading from Solr 4.5.0” section” follows “Detailed Change List”, but should be above it; and one change attribution didn’t get recognized because it’s missing parens: Elran Dvir via Erick Erickson. Definitely not worth a respin in either case. Lucene Changes look good, except that the “API Changes” section in Changes.html is formatted as an item in the “Bug Fixes” section, rather than its own section. I’ll fix. (The issue is that “API Changes:” in CHANGES.txt has a trailing colon - the section name regex should allow this.) This is probably not worth a respin. Lucene and Solr Documentation pages look good, except that the File Formats” link from the Lucene Documentation page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). This is respin-worthy. Updating this is not automated now - it’s hard-coded in lucene/site/xsl/index.xsl - the default codec doesn’t change in every release. I’ll try to automate extracting the default from o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)]. Lucene and Solr Javadocs look good. Steve On Nov 14, 2013, at 4:37 AM, Simon Willnauer simon.willna...@gmail.com wrote: Please vote for the first Release Candidate for Lucene/Solr 4.6.0 you can download it here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 or run the smoke tester directly with this commandline (don't forget to set JAVA6_HOME etc.): python3.2 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 1541686 4.6.0 /tmp/smoke_test_4_6 I integrated the RC into Elasticsearch and all tests pass: https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de Smoketester said: SUCCESS! [1:15:57.339272] here is my +1 Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de
Jenkins build is back to normal : lucene-solr-46-smoker #59
See http://sierranevada.servebeer.com/job/lucene-solr-46-smoker/59/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Lucene / Solr 4.6.0
I think this calls for a *very* special release. :) Everything best for both of you, Uwe! Dawid On Thu, Nov 14, 2013 at 9:11 PM, Uwe Schindler u...@thetaphi.de wrote: The PMC Chair is going to marry tomorrow... Simon has to come here and not do new RCs! :) In any case, thanks for doing the release, Simon. I will do the next! Uwe Simon Willnauer simon.willna...@gmail.com schrieb: Thanks Steve I won't get to this until next week. I will upload a new RC on monday. Simon Sent from my iPhone On 14 Nov 2013, at 20:20, Steve Rowe sar...@gmail.com wrote: I’ve committed fixes, to lucene_solr_4_6 as well as to branch_4x and trunk, for all the problems I mentioned. The first revision including all these is 1542030. Steve On Nov 14, 2013, at 1:16 PM, Steve Rowe sar...@gmail.com wrote: -1 Smoke tester passes. Solr Changes look good, except that the “Upgrading from Solr 4.5.0” section” follows “Detailed Change List”, but should be above it; and one change attr ibution didn’t get recognized because it’s missing parens: Elran Dvir via Erick Erickson. Definitely not worth a respin in either case. Lucene Changes look good, except that the “API Changes” section in Changes.html is formatted as an item in the “Bug Fixes” section, rather than its own section. I’ll fix. (The issue is that “API Changes:” in CHANGES.txt has a trailing colon - the section name regex should allow this. ) This is probably not worth a respin. Lucene and Solr Documentation pages look good, except that the File Formats” link from the Lucene Documentation page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). This is respin-worthy. Updating this is not automated now - it’s hard-coded in lucene/site/xsl/index.xsl - the default codec doesn’t change in every release. I’ll try to automate extracting the default from o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)]. Lucene and Solr Javadocs look good. Steve On Nov 14, 2013, at 4:37 AM, Simon Willnauer simon.willna...@gmail.com wrote: Please vote for the first Release Candidate for Lucene/Solr 4.6.0 you can download it here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 or run the smoke tester directly with this commandline (don't forget to set JAVA6_HOME etc.): python3.2 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 1541686 4.6.0 /tmp/smoke_test_4_6 I integrated the RC into Elasticsearch and all tests pass: https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de Smoketester said: SUCCESS! [1:15:57.339272] here is my +1 Simon To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Lucene / Solr 4.6.0
Congratulations Uwe! On Nov 14, 2013, at 3:11 PM, Uwe Schindler u...@thetaphi.de wrote: The PMC Chair is going to marry tomorrow... Simon has to come here and not do new RCs! :) In any case, thanks for doing the release, Simon. I will do the next! Uwe Simon Willnauer simon.willna...@gmail.com schrieb: Thanks Steve I won't get to this until next week. I will upload a new RC on monday. Simon Sent from my iPhone On 14 Nov 2013, at 20:20, Steve Rowe sar...@gmail.com wrote: I’ve committed fixes, to lucene_solr_4_6 as well as to branch_4x and trunk, for all the problems I mentioned. The first revision including all these is 1542030. Steve On Nov 14, 2013, at 1:16 PM, Steve Rowe sar...@gmail.com wrote: -1 Smoke tester passes. Solr Changes look good, except that the “Upgrading from Solr 4.5.0” section” follows “Detailed Change List”, but should be above it; and one change attribution didn’t get recognized because it’s missing parens: Elran Dvir via Erick Erickson. Definitely not worth a respin in either case. Lucene Changes look good, except that the “API Changes” section in Changes.html is formatted as an item in the “Bug Fixes” section, rather than its own section. I’ll fix. (The issue is that “API Changes:” in CHANGES.txt has a trailing colon - the section name regex should allow this. ) This is probably not worth a respin. Lucene and Solr Documentation pages look good, except that the File Formats” link from the Lucene Documentation page leads to the 4.5 format doc, rather than the 4.6 format doc (Lucene46Codec was introduced by LUCENE-5215). This is respin-worthy. Updating this is not automated now - it’s hard-coded in lucene/site/xsl/index.xsl - the default codec doesn’t change in every release. I’ll try to automate extracting the default from o.a.l.codecs.Codec#defaultCodec [ = Codec.forName(“Lucene46”)]. Lucene and Solr Javadocs look good. Steve On Nov 14, 2013, at 4:37 AM, Simon Willnauer simon.willna...@gmail.com wrote: Please vote for the first Release Candidate for Lucene/Solr 4.6.0 you can download it here: http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 or run the smoke tester directly with this commandline (don't forget to set JAVA6_HOME etc.): python3.2 -u dev-tools/scripts/smokeTestRelease.py http://people.apache.org/~simonw/staging_area/lucene-solr-4.6.0-RC1-rev1541686 1541686 4.6.0 /tmp/smoke_test_4_6 I integrated the RC into Elasticsearch and all tests pass: https://github.com/s1monw/elasticsearch/commit/765e3194bb23f202725bfb28d9a2fd7cc71b49de Smoketester said: SUCCESS! [1:15:57.339272] here is my +1 Simon To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Lucene / Solr 4.6.0
On Thu, Nov 14, 2013 at 3:11 PM, Uwe Schindler u...@thetaphi.de wrote: The PMC Chair is going to marry tomorrow... Simon has to come here and not do new RCs! :) Congratulations on tying the knot, Uwe!! Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5217) disable transitive dependencies in maven config
[ https://issues.apache.org/jira/browse/LUCENE-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved LUCENE-5217. Resolution: Fixed Fix Version/s: (was: 4.6) 4.7 disable transitive dependencies in maven config --- Key: LUCENE-5217 URL: https://issues.apache.org/jira/browse/LUCENE-5217 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Steve Rowe Fix For: 5.0, 4.7 Attachments: LUCENE-5217.patch, LUCENE-5217.patch, LUCENE-5217.patch, LUCENE-5217.patch Our ivy configuration does this: each dependency is specified and so we know what will happen. Unfortunately the maven setup is not configured the same way. Instead the maven setup is configured to download the internet: and it excludes certain things specifically. This is really hard to configure and maintain: we added a 'validate-maven-dependencies' that tries to fail on any extra jars, but all it really does is run a license check after maven runs. It wouldnt find unnecessary dependencies being dragged in if something else in lucene was using them and thus they had a license file. Since maven supports wildcard exclusions: MNG-3832, we can disable this transitive shit completely. We should do this, so its configuration is the exact parallel of ivy. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822855#comment-13822855 ] ASF subversion and git services commented on LUCENE-5339: - Commit 1542062 from [~mikemccand] in branch 'dev/branches/lucene5339' [ https://svn.apache.org/r1542062 ] LUCENE-5339: add abstract Facets base class; fix separate test failure Simplify the facet module APIs -- Key: LUCENE-5339 URL: https://issues.apache.org/jira/browse/LUCENE-5339 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5339.patch, LUCENE-5339.patch I'd like to explore simplifications to the facet module's APIs: I think the current APIs are complex, and the addition of a new feature (sparse faceting, LUCENE-5333) threatens to add even more classes (e.g., FacetRequestBuilder). I think we can do better. So, I've been prototyping some drastic changes; this is very early/exploratory and I'm not sure where it'll wind up but I think the new approach shows promise. The big changes are: * Instead of *FacetRequest/Params/Result, you directly instantiate the classes that do facet counting (currently TaxonomyFacetCounts, RangeFacetCounts or SortedSetDVFacetCounts), passing in the SimpleFacetsCollector, and then you interact with those classes to pull labels + values (topN under a path, sparse, specific labels). * At index time, no more FacetIndexingParams/CategoryListParams; instead, you make a new SimpleFacetFields and pass it the field it should store facets + drill downs under. If you want more than one CLI you create more than one instance of SimpleFacetFields. * I added a simple schema, where you state which dimensions are hierarchical or multi-valued. From this we decide how to index the ordinals (no more OrdinalPolicy). Sparse faceting is just another method (getAllDims), on both taxonomy ssdv facet classes. I haven't created a common base class / interface for all of the search-time facet classes, but I think this may be possible/clean, and perhaps useful for drill sideways. All the new classes are under oal.facet.simple.*. Lots of things that don't work yet: drill sideways, complements, associations, sampling, partitions, etc. This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5399) Improve DebugComponent for distributed requests
[ https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-5399: Attachment: SOLR-5399_windows_fix.patch Stopping Jetty before deleting the SolrHome directory fixes the problem in Windows Improve DebugComponent for distributed requests --- Key: SOLR-5399 URL: https://issues.apache.org/jira/browse/SOLR-5399 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Assignee: Ryan Ernst Fix For: 5.0, 4.7 Attachments: SOLR-5399.patch, SOLR-5399.patch, SOLR-5399.patch, SOLR-5399_windows_fix.patch I'm working on extending the DebugComponent for adding some useful information to be able to track distributed requests better. I'm adding two different things, first, the request can generate a request ID that will be printed in the logs for the main query and all the different internal requests to the different shards. This should make it easier to find the different parts of a single user request in the logs. It would also add the purpose of each internal request to the logs, like: RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. Also, I'm adding a track section to the debug info where to add information about the different phases of the distributed request (right now, I'm only including QTime, but could eventually include more information) like: {code:xml} lst name=debug lst name=track lst name=EXECUTE_QUERY str name=localhost:8985/solrQTime: 10/str str name=localhost:8984/solrQTime: 25/str /lst lst name=GET_FIELDS str name=localhost:8985/solrQTime: 1/str /lst /lst /lst {code} To get this, debugQuery must be set to true, or debug must include debug=track. This information is only added to distributed requests. I would like to get feedback on this. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
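For anyone wanting to poke at this from a client, here is a hedged SolrJ sketch of requesting the new track section. It assumes the section surfaces under QueryResponse.getDebugMap() with the key "track"; that key and the shard URLs below are taken from the example output above, not verified against the final patch:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TrackDebugExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8985/solr");
    SolrQuery q = new SolrQuery("*:*");
    q.set("debug", "track"); // or debugQuery=true; tracking info is only added for distributed requests
    q.set("shards", "localhost:8985/solr,localhost:8984/solr");
    QueryResponse rsp = solr.query(q);
    System.out.println(rsp.getDebugMap().get("track")); // per-phase entries like EXECUTE_QUERY, GET_FIELDS
  }
}
{code}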
[jira] [Updated] (SOLR-5441) Expose transaction log files number and their size via JMX
[ https://issues.apache.org/jira/browse/SOLR-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated SOLR-5441: --- Summary: Expose transaction log files number and their size via JMX (was: Expose transaction log files number of their size via JMX) Expose transaction log files number and their size via JMX -- Key: SOLR-5441 URL: https://issues.apache.org/jira/browse/SOLR-5441 Project: Solr Issue Type: Improvement Affects Versions: 4.5 Reporter: Rafał Kuć Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 4.6, 5.0 Attachments: SOLR-5441-synchronized.patch, SOLR-5441.patch It may be useful to have the number of transaction log files and their overall size exposed via JMX for UpdateHandler. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.
[ https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822893#comment-13822893 ] Simon Rosenthal commented on SOLR-4722: --- Just one oddity - there are references to the StoredDocument class in getUniqueKeys() which (as far as I can see) is only in trunk - and I'm using Lucene/Solr 4.5.1. I replaced that with Document = which compiles OK, but I haven't had a chance to try it out yet. Do you think it should work ? -Simon On Thursday, November 14, 2013 12:21 PM, Tricia Jenkins (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822636#comment-13822636 ] Tricia Jenkins commented on SOLR-4722: -- Thanks for your interest. This code/jar could be used as is for your purposes. If you don't want to specify highlighting enabled in each query just move it to conf/solrconfig.xml: {code:xml} requestHandler name=standard class=solr.StandardRequestHandler lst name=defaults str name=hltrue/str /lst /requestHandler {code} This highlighter only returns the term positions. The term offsets are stored because they're used by the FastVectorHighlighter. You won't get any useful information from this highlighter if you disable termOffsets in your schema.xml. I just ran this patch against trunk. Still works! -- This message was sent by Atlassian JIRA (v6.1#6144) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled. --- Key: SOLR-4722 URL: https://issues.apache.org/jira/browse/SOLR-4722 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 4.3, 5.0 Reporter: Tricia Jenkins Priority: Minor Attachments: SOLR-4722.patch, solr-positionshighlighter.jar As an alternative to returning snippets, this highlighter provides the (term) position for query matches. One usecase for this is to reconcile the term position from the Solr index with 'word' coordinates provided by an OCR process. In this way we are able to 'highlight' an image, like a page from a book or an article from a newspaper, in the locations that match the user's query. This is based on the FastVectorHighlighter and requires that termVectors, termOffsets and termPositions be stored. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5322) Clean up / simplify Maven-related Ant targets
[ https://issues.apache.org/jira/browse/LUCENE-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved LUCENE-5322. Resolution: Fixed Fix Version/s: (was: 4.6) 4.7 Clean up / simplify Maven-related Ant targets - Key: LUCENE-5322 URL: https://issues.apache.org/jira/browse/LUCENE-5322 Project: Lucene - Core Issue Type: Task Components: general/build Reporter: Steve Rowe Assignee: Steve Rowe Priority: Minor Fix For: 5.0, 4.7 Attachments: LUCENE-5322.lucene-javadoc-url.fix.patch, LUCENE-5322.patch, LUCENE-5322.validate-maven-artifacts.patch Many Maven-related Ant targets are public when they don't need to be, e.g. dist-maven and filter-pom-templates, m2-deploy-lucene-parent-pom, etc. The arrangement of these targets could be simplified if the directories that have public entry points were minimized. generate-maven-artifacts should be runnable from the top level and from lucene/ and solr/. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #505: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/505/ 4 tests failed. FAILED: org.apache.solr.handler.dataimport.TestContentStreamDataSource.org.apache.solr.handler.dataimport.TestContentStreamDataSource Error Message: 1 thread leaked from SUITE scope at org.apache.solr.handler.dataimport.TestContentStreamDataSource: 1) Thread[id=253, name=commitScheduler-153-thread-1, state=WAITING, group=TGRP-TestContentStreamDataSource] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.DelayQueue.take(DelayQueue.java:189) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.handler.dataimport.TestContentStreamDataSource: 1) Thread[id=253, name=commitScheduler-153-thread-1, state=WAITING, group=TGRP-TestContentStreamDataSource] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.DelayQueue.take(DelayQueue.java:189) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) at __randomizedtesting.SeedInfo.seed([701D3621011975D9]:0) FAILED: org.apache.solr.handler.dataimport.TestContentStreamDataSource.org.apache.solr.handler.dataimport.TestContentStreamDataSource Error Message: There are still zombie threads that couldn't be terminated: 1) Thread[id=253, name=commitScheduler-153-thread-1, state=WAITING, group=TGRP-TestContentStreamDataSource] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.DelayQueue.take(DelayQueue.java:189) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: There are 
still zombie threads that couldn't be terminated: 1) Thread[id=253, name=commitScheduler-153-thread-1, state=WAITING, group=TGRP-TestContentStreamDataSource] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.DelayQueue.take(DelayQueue.java:189) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) at __randomizedtesting.SeedInfo.seed([701D3621011975D9]:0) FAILED:
Re: [VOTE] Lucene / Solr 4.6.0
On Nov 14, 2013, at 3:11 PM, Uwe Schindler u...@thetaphi.de wrote: The PMC Chair is going to marry tomorrow... Simon has to come here and not do new RCs! :) +1 :) - Mark
[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
[ https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822917#comment-13822917 ] Stefan Matheis (steffkes) commented on SOLR-5287: - so .. i got it, hopefully. what i'd say we do (in that separate ticket, as you mentioned) is: add a new page called Files (or something like that) which starts with a typical file-tree, as we have it in the Cloud-Section already .. which enables you to browse directories files and view their contents. right now this patch only allows file-uploads (or at least i didn't manage it to accept raw text which i posted to this endpoint)? the code is using input streams .. no idea if that is fileupload-specific? because if we could post the content of a file .. we could offer two choices: # upload a complete file, you have on your disk # change into an edit mode .. and then post the changed file from within your browser which would basically mean you could modify your schema w/o the need to download, modify re-upload it. that would be like we have it already on the Data Import Page .. where you could send a {{dataConfig}} parameter, which then is used instead of the persisted configuration (related code is in the [DataImportHandler.java|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java?view=markup#l129]) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen -- Key: SOLR-5287 URL: https://issues.apache.org/jira/browse/SOLR-5287 Project: Solr Issue Type: Improvement Components: Schema and Analysis, web gui Affects Versions: 4.5, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5287.patch, SOLR-5287.patch, SOLR-5287.patch A user asking a question on the Solr list got me to thinking about editing the main config files from the Solr admin screen. I chatted briefly with [~steffkes] about the mechanics of this on the browser side, he doesn't see a problem on that end. His comment is there's no end point that'll write the file back. Am I missing something here or is this actually not a hard problem? I see a couple of issues off the bat, neither of which seem troublesome. 1 file permissions. I'd imagine lots of installations will get file permission exceptions if Solr tries to write the file out. Well, do a chmod/chown. 2 screwing up the system maliciously or not. I don't think this is an issue, this would be part of the admin handler after all. Does anyone have objections to the idea? And how does this fit into the work that [~sar...@syr.edu] has been doing? I can imagine this extending to SolrCloud with a push this to ZK option or something like that, perhaps not in V1 unless it's easy. Of course any pointers gratefully received. Especially ones that start with Don't waste your effort, it'll never work (or be accepted)... Because what scares me is this seems like such an easy thing to do that would be a significant ease-of-use improvement, so there _has_ to be something I'm missing. So if we go forward with this we'll make this the umbrella JIRA, the two immediate sub-JIRAs that spring to mind will be the UI work and the endpoints for the UI work to use. I think there are only two end-points here 1 list all the files in the conf (or arbitrary from solr_home/collection) directory. 2 write this text to this file Possibly later we could add clone the configs from coreX to coreY. 
BTW, I've assigned this to myself so I don't lose it, but if anyone wants to take it over it won't hurt my feelings a bit -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Lucene / Solr 4.6.0
On Thu, Nov 14, 2013 at 3:11 PM, Uwe Schindler u...@thetaphi.de wrote: The PMC Chair is going to marry tomorrow... Congrats Uwe! -Yonik http://heliosearch.com -- making solr shine - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5399) Improve DebugComponent for distributed requests
[ https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822920#comment-13822920 ] Robert Muir commented on SOLR-5399: --- Thanks Tomas: I committed this. Improve DebugComponent for distributed requests --- Key: SOLR-5399 URL: https://issues.apache.org/jira/browse/SOLR-5399 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Assignee: Ryan Ernst Fix For: 5.0, 4.7 Attachments: SOLR-5399.patch, SOLR-5399.patch, SOLR-5399.patch, SOLR-5399_windows_fix.patch I'm working on extending the DebugComponent for adding some useful information to be able to track distributed requests better. I'm adding two different things, first, the request can generate a request ID that will be printed in the logs for the main query and all the different internal requests to the different shards. This should make it easier to find the different parts of a single user request in the logs. It would also add the purpose of each internal request to the logs, like: RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. Also, I'm adding a track section to the debug info where to add information about the different phases of the distributed request (right now, I'm only including QTime, but could eventually include more information) like: {code:xml} lst name=debug lst name=track lst name=EXECUTE_QUERY str name=localhost:8985/solrQTime: 10/str str name=localhost:8984/solrQTime: 25/str /lst lst name=GET_FIELDS str name=localhost:8985/solrQTime: 1/str /lst /lst /lst {code} To get this, debugQuery must be set to true, or debug must include debug=track. This information is only added to distributed requests. I would like to get feedback on this. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5399) Improve DebugComponent for distributed requests
[ https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822921#comment-13822921 ] ASF subversion and git services commented on SOLR-5399: --- Commit 1542082 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1542082 ] SOLR-5399: fix windows test issue Improve DebugComponent for distributed requests --- Key: SOLR-5399 URL: https://issues.apache.org/jira/browse/SOLR-5399 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Assignee: Ryan Ernst Fix For: 5.0, 4.7 Attachments: SOLR-5399.patch, SOLR-5399.patch, SOLR-5399.patch, SOLR-5399_windows_fix.patch I'm working on extending the DebugComponent for adding some useful information to be able to track distributed requests better. I'm adding two different things, first, the request can generate a request ID that will be printed in the logs for the main query and all the different internal requests to the different shards. This should make it easier to find the different parts of a single user request in the logs. It would also add the purpose of each internal request to the logs, like: RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. Also, I'm adding a track section to the debug info where to add information about the different phases of the distributed request (right now, I'm only including QTime, but could eventually include more information) like: {code:xml} lst name=debug lst name=track lst name=EXECUTE_QUERY str name=localhost:8985/solrQTime: 10/str str name=localhost:8984/solrQTime: 25/str /lst lst name=GET_FIELDS str name=localhost:8985/solrQTime: 1/str /lst /lst /lst {code} To get this, debugQuery must be set to true, or debug must include debug=track. This information is only added to distributed requests. I would like to get feedback on this. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5399) Improve DebugComponent for distributed requests
[ https://issues.apache.org/jira/browse/SOLR-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822919#comment-13822919 ] ASF subversion and git services commented on SOLR-5399: --- Commit 1542080 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1542080 ] SOLR-5399: fix windows test issue Improve DebugComponent for distributed requests --- Key: SOLR-5399 URL: https://issues.apache.org/jira/browse/SOLR-5399 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Assignee: Ryan Ernst Fix For: 5.0, 4.7 Attachments: SOLR-5399.patch, SOLR-5399.patch, SOLR-5399.patch, SOLR-5399_windows_fix.patch I'm working on extending the DebugComponent for adding some useful information to be able to track distributed requests better. I'm adding two different things, first, the request can generate a request ID that will be printed in the logs for the main query and all the different internal requests to the different shards. This should make it easier to find the different parts of a single user request in the logs. It would also add the purpose of each internal request to the logs, like: RequestPurpose=GET_FIELDS,GET_DEBUG or RequestPurpose=GET_TOP_IDS. Also, I'm adding a track section to the debug info where to add information about the different phases of the distributed request (right now, I'm only including QTime, but could eventually include more information) like: {code:xml} lst name=debug lst name=track lst name=EXECUTE_QUERY str name=localhost:8985/solrQTime: 10/str str name=localhost:8984/solrQTime: 25/str /lst lst name=GET_FIELDS str name=localhost:8985/solrQTime: 1/str /lst /lst /lst {code} To get this, debugQuery must be set to true, or debug must include debug=track. This information is only added to distributed requests. I would like to get feedback on this. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
[ https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822925#comment-13822925 ] Erick Erickson commented on SOLR-5287: -- It's working for me by specifying a request parameter 'stream.body=put your text here', even from within some tests I'm writing. Does that work for you? I freely admit this is somewhat of a mystery to me. Allow at least solrconfig.xml and schema.xml to be edited via the admin screen -- Key: SOLR-5287 URL: https://issues.apache.org/jira/browse/SOLR-5287 Project: Solr Issue Type: Improvement Components: Schema and Analysis, web gui Affects Versions: 4.5, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5287.patch, SOLR-5287.patch, SOLR-5287.patch A user asking a question on the Solr list got me to thinking about editing the main config files from the Solr admin screen. I chatted briefly with [~steffkes] about the mechanics of this on the browser side, he doesn't see a problem on that end. His comment is there's no end point that'll write the file back. Am I missing something here or is this actually not a hard problem? I see a couple of issues off the bat, neither of which seem troublesome. 1 file permissions. I'd imagine lots of installations will get file permission exceptions if Solr tries to write the file out. Well, do a chmod/chown. 2 screwing up the system maliciously or not. I don't think this is an issue, this would be part of the admin handler after all. Does anyone have objections to the idea? And how does this fit into the work that [~sar...@syr.edu] has been doing? I can imagine this extending to SolrCloud with a push this to ZK option or something like that, perhaps not in V1 unless it's easy. Of course any pointers gratefully received. Especially ones that start with Don't waste your effort, it'll never work (or be accepted)... Because what scares me is this seems like such an easy thing to do that would be a significant ease-of-use improvement, so there _has_ to be something I'm missing. So if we go forward with this we'll make this the umbrella JIRA, the two immediate sub-JIRAs that spring to mind will be the UI work and the endpoints for the UI work to use. I think there are only two end-points here 1 list all the files in the conf (or arbitrary from solr_home/collection) directory. 2 write this text to this file Possibly later we could add clone the configs from coreX to coreY. BTW, I've assigned this to myself so I don't lose it, but if anyone wants to take it over it won't hurt my feelings a bit -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
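A rough illustration of the stream.body approach Erick describes, i.e. sending file content as a request parameter rather than as a file upload. Only stream.body comes from his comment; the endpoint path and the "file" parameter are hypothetical, since this issue had not settled on a final URL:

{code:java}
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class PostFileViaStreamBody {
  public static void main(String[] args) throws Exception {
    String newContent = "<!-- replacement file content goes here -->";
    // Hypothetical endpoint and "file" parameter; stream.body carries the raw text.
    String url = "http://localhost:8983/solr/collection1/admin/fileedit"
        + "?file=schema.xml"
        + "&stream.body=" + URLEncoder.encode(newContent, "UTF-8");
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}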