[jira] [Created] (SOLR-6094) DIH deletedPkQuery doesn't allow placeholders in query
Ananda Verma created SOLR-6094:
--------------------------------------

Summary: DIH deletedPkQuery doesn't allow placeholders in query
Key: SOLR-6094
URL: https://issues.apache.org/jira/browse/SOLR-6094
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 4.3.1
Reporter: Ananda Verma
Priority: Blocker
Fix For: 4.7.3

When using the following deletedPkQuery in data-config.xml

{code}deletedPkQuery=SELECT id from ${schema.SCHEMA_NAME}.deleted_users where status = 'ACTIVE'{code}

it throws the following error:

{code}
20-May-2014 12:35:52 ERROR [org.apache.solr.handler.dataimport.DataImporter.doDeltaImport : 455] :: http-bio-8380-exec-6 :: Delta Import Failed
java.lang.AssertionError: Non-leaf nodes should be of type java.util.Map
	at org.apache.solr.handler.dataimport.VariableResolver.currentLevelMap(VariableResolver.java:235)
	at org.apache.solr.handler.dataimport.VariableResolver.resolve(VariableResolver.java:94)
	at org.apache.solr.handler.dataimport.VariableResolver.replaceTokens(VariableResolver.java:155)
	at org.apache.solr.handler.dataimport.ContextImpl.replaceTokens(ContextImpl.java:254)
	at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:84)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextModifiedRowKey(EntityProcessorWrapper.java:267)
	at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:776)
	at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:764)
	at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:334)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:219)
	at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:451)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:489)
	at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.ha.session.JvmRouteBinderValve.invoke(JvmRouteBinderValve.java:219)
	at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:333)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
{code}

whereas running

{code}deletedPkQuery=SELECT id from schema2.deleted_users where status = 'ACTIVE'{code}

works well.

--
This message was sent by Atlassian JIRA (v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5609) Don't let cores create slices/named replicas
[ https://issues.apache.org/jira/browse/SOLR-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul resolved SOLR-5609.
------------------------------
Resolution: Fixed
Fix Version/s: (was: 4.9) 4.8
Assignee: Noble Paul

Don't let cores create slices/named replicas
Key: SOLR-5609
URL: https://issues.apache.org/jira/browse/SOLR-5609
Project: Solr
Issue Type: Sub-task
Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
Fix For: 4.8, 5.0
Attachments: SOLR-5609.patch, SOLR-5609.patch, SOLR-5609_5130.patch, SOLR-5609_5130.patch, SOLR-5609_5130.patch, SOLR-5609_5130.patch

In SolrCloud, it is possible for a core to come up on any node and register itself with an arbitrary slice/coreNodeName. This is a legacy requirement, and we would like to make it possible only for the Overseer to initiate the creation of slices/replicas.

We plan to introduce cluster-level properties at the top level, /cluster-props.json:

{code:javascript}
{
  noSliceOrReplicaByCores: true
}
{code}

If this property is set to true, cores won't be able to send STATE commands with an unknown slice/coreNodeName; those commands will fail at the Overseer. This is useful for SOLR-5310 / SOLR-5311, where a core/replica is deleted by a command but comes up later and tries to recreate the replica/slice.
[jira] [Resolved] (SOLR-5096) Introduce a new mode to force collection api usage
[ https://issues.apache.org/jira/browse/SOLR-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul resolved SOLR-5096.
------------------------------
Resolution: Duplicate

Introduce a new mode to force collection api usage
Key: SOLR-5096
URL: https://issues.apache.org/jira/browse/SOLR-5096
Project: Solr
Issue Type: Sub-task
Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
Fix For: 4.9, 5.0
Attachments: SOLR-5096.patch

In SOLR-4808, a special mode to disallow bootstrap parameters was discussed so that cluster management is always done via the collection API. A tentative name was collectionApiMode. This issue is to find a better name and to implement the mode.
[jira] [Commented] (LUCENE-5680) Allow updating multiple DocValues fields atomically
[ https://issues.apache.org/jira/browse/LUCENE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002928#comment-14002928 ]

Shai Erera commented on LUCENE-5680:
------------------------------------
Well, first, updateNumeric/BinaryDV() allows you to unset a value, and I think we should preserve that capability here. As for unsetting, this could be very useful, e.g. for an onsale boolean field or the saleprice etc. I think you proposed the unset capability while I was working on LUCENE-5189, but I cannot find the reference :). I agree we shouldn't bloat the API unnecessarily, and when I wrote the {{update(Term,Field...)}} version it looked very simple and tests even passed. So I think this direction is promising, but we should allow unsetting. Perhaps we can put a constant somewhere, {{NumericDocValuesField UNSET = ...}}? Maybe it can be on IndexWriter ... the thing is, we don't need it for e.g. Binary, since they take a BytesRef and, at least for now, allow passing a null value, but we can have a similar UNSET constant for binary too.

Allow updating multiple DocValues fields atomically
Key: LUCENE-5680
URL: https://issues.apache.org/jira/browse/LUCENE-5680
Project: Lucene - Core
Issue Type: New Feature
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-5680.patch, LUCENE-5680.patch

This has come up on the list (http://markmail.org/message/2wmpvksuwc5t57pg) -- it would be good if we could allow updating several doc-values fields atomically. It will also improve/simplify our tests, where today we index two fields, e.g. the field itself and a control field. In some multi-threaded tests, since we cannot be sure which updates came through first, we limit the test such that each thread updates a different set of fields, otherwise they will collide and it will be hard to verify the index in the end. I was working on a patch and it looks pretty simple to do; will post a patch shortly.
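The UNSET constant discussed above is a sentinel-value pattern; a minimal stdlib sketch of the idea, with a map standing in for a document's numeric doc-values fields (the UNSET name mirrors the proposal, everything else here is hypothetical and not Lucene's API):

```java
import java.util.HashMap;
import java.util.Map;

public class UnsetSentinelDemo {
    // Sentinel mirroring the proposed {{NumericDocValuesField UNSET = ...}} constant;
    // the concrete value is an arbitrary choice for this sketch.
    static final Long UNSET = Long.MIN_VALUE;

    // One update path handles both set and unset, so callers never need a
    // separate "remove" overload -- the point of the sentinel.
    static void update(Map<String, Long> doc, String field, Long value) {
        if (UNSET.equals(value)) {
            doc.remove(field);   // unset: clear the stored value
        } else {
            doc.put(field, value);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> doc = new HashMap<>();
        update(doc, "saleprice", 100L);
        System.out.println(doc.containsKey("saleprice")); // true
        update(doc, "saleprice", UNSET);
        System.out.println(doc.containsKey("saleprice")); // false
    }
}
```

A binary variant would not need the sentinel as long as null is an accepted "unset" value, which matches the comment's observation about BytesRef.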
[jira] [Comment Edited] (LUCENE-5680) Allow updating multiple DocValues fields atomically
[ https://issues.apache.org/jira/browse/LUCENE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002928#comment-14002928 ]

Shai Erera edited comment on LUCENE-5680 at 5/20/14 8:32 AM:
-------------------------------------------------------------
Well, first, updateNumeric/BinaryDV() allows you to unset a value, and I think we should preserve that capability here. As for unsetting, this could be very useful, e.g. for a saleprice field as well as any other field which is transient. I think you proposed the unset capability while I was working on LUCENE-5189, but I cannot find the reference :). I agree we shouldn't bloat the API unnecessarily, and when I wrote the {{update(Term,Field...)}} version it looked very simple and tests even passed. So I think this direction is promising, but we should allow unsetting. Perhaps we can put a constant somewhere, {{NumericDocValuesField UNSET = ...}}? Maybe it can be on IndexWriter ... the thing is, we don't need it for e.g. Binary, since they take a BytesRef and, at least for now, allow passing a null value, but we can have a similar UNSET constant for binary too.

was (Author: shaie):
Well, first, updateNumeric/BinaryDV() allows you to unset a value, and I think we should preserve that capability here. As for unsetting, this could be very useful, e.g. for an onsale boolean field or the saleprice etc. I think you proposed the unset capability while I was working on LUCENE-5189, but I cannot find the reference :). I agree we shouldn't bloat the API unnecessarily, and when I wrote the {{update(Term,Field...)}} version it looked very simple and tests even passed. So I think this direction is promising, but we should allow unsetting. Perhaps we can put a constant somewhere, {{NumericDocValuesField UNSET = ...}}? Maybe it can be on IndexWriter ... the thing is, we don't need it for e.g. Binary, since they take a BytesRef and, at least for now, allow passing a null value, but we can have a similar UNSET constant for binary too.

Allow updating multiple DocValues fields atomically
Key: LUCENE-5680
URL: https://issues.apache.org/jira/browse/LUCENE-5680
Project: Lucene - Core
Issue Type: New Feature
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-5680.patch, LUCENE-5680.patch

This has come up on the list (http://markmail.org/message/2wmpvksuwc5t57pg) -- it would be good if we could allow updating several doc-values fields atomically. It will also improve/simplify our tests, where today we index two fields, e.g. the field itself and a control field. In some multi-threaded tests, since we cannot be sure which updates came through first, we limit the test such that each thread updates a different set of fields, otherwise they will collide and it will be hard to verify the index in the end. I was working on a patch and it looks pretty simple to do; will post a patch shortly.
[jira] [Commented] (LUCENE-5618) DocValues updates send wrong fieldinfos to codec producers
[ https://issues.apache.org/jira/browse/LUCENE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002963#comment-14002963 ]

Michael McCandless commented on LUCENE-5618:
--------------------------------------------
+1, thanks Shai.

DocValues updates send wrong fieldinfos to codec producers
Key: LUCENE-5618
URL: https://issues.apache.org/jira/browse/LUCENE-5618
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Assignee: Shai Erera
Priority: Blocker
Fix For: 4.9
Attachments: LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch

Spinoff from LUCENE-5616. See the example there: docvalues readers get a fieldinfos, but it doesn't contain the correct ones, so they have invalid field numbers at read time. This should really be fixed. Maybe a simple solution is to not write batches of fields in updates but just have only one field per gen? This removes many-many relationships and would make things easy to understand.
[jira] [Updated] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-5679:
-------------------------------
Attachment: LUCENE-5679.patch

Patch removes the single-arg deleteDocuments Term/Query variants. Everything compiles and tests pass. I'll run jdocs tests too, though eclipse showed no additional errors about referencing those methods. Do you think it's OK to backport to 4x? The only concern I have is if apps upgrade to e.g. 4.9 by only dropping in the 4.9 jar, without compiling their code as well. I don't know if people still do that though ... :). Anyway, if we want to keep that, we can deprecate the methods in 4x.

Consolidate IndexWriter.deleteDocuments()
Key: LUCENE-5679
URL: https://issues.apache.org/jira/browse/LUCENE-5679
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Shai Erera
Attachments: LUCENE-5679.patch

Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments() variants.
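The consolidation leans on the fact that Java resolves a single argument against a varargs overload, so removing the single-arg variants stays source-compatible for callers that recompile; a stdlib sketch of just that mechanism (class and method names hypothetical, not the IndexWriter API):

```java
public class VarargsDemo {
    // Stand-in for the surviving varargs deleteDocuments(Term...) shape.
    static int delete(String... ids) {
        return ids.length; // pretend each id deletes one document
    }

    public static void main(String[] args) {
        // A former single-arg call site still compiles against the varargs form:
        System.out.println(delete("42"));     // 1
        System.out.println(delete("1", "2")); // 2
    }
}
```

Binary compatibility is the part recompiling fixes: a jar compiled against an old single-arg method would fail at link time, which is exactly the drop-in-the-4.9-jar concern raised in the comment.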
[jira] [Created] (SOLR-6095) SolrCloud cluster can end up without an overseer
Shalin Shekhar Mangar created SOLR-6095:
----------------------------------------

Summary: SolrCloud cluster can end up without an overseer
Key: SOLR-6095
URL: https://issues.apache.org/jira/browse/SOLR-6095
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.8
Reporter: Shalin Shekhar Mangar
Fix For: 4.9, 5.0

We have a large cluster running on ec2 which occasionally ends up without an overseer after a rolling restart. We always restart our overseer nodes last of all, otherwise we end up with a large number of shards that can't recover properly.

This cluster is running a custom branch forked from 4.8 and has SOLR-5473, SOLR-5495 and SOLR-5468 applied. We have a large number of small collections (120 collections, each with approx 5M docs) on 16 Solr nodes. We are also using the overseer roles feature to designate two specified nodes as overseers. However, I think the problem that we're seeing is not specific to the overseer roles feature.

As soon as the overseer was shut down, we saw the following on the node which was next in line to become the overseer:

{code}
2014-05-20 09:55:39,261 [main-EventThread] INFO solr.cloud.ElectionContext - I am going to be the leader ec2-xx.compute-1.amazonaws.com:8987_solr
2014-05-20 09:55:39,265 [main-EventThread] WARN solr.cloud.LeaderElector - org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /overseer_elect/leader
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
	at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:432)
	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
	at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:429)
	at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:386)
	at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:373)
	at org.apache.solr.cloud.OverseerElectionContext.runLeaderProcess(ElectionContext.java:551)
	at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:142)
	at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:110)
	at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
	at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:303)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
{code}

When the overseer leader node is gracefully shut down, we get the following in the logs:

{code}
2014-05-20 09:55:39,254 [Thread-63] ERROR solr.cloud.Overseer - Exception in Overseer main queue loop
org.apache.solr.common.SolrException: Could not load collection from ZK:sm12
	at org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:778)
	at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:553)
	at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:246)
	at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:237)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1040)
	at org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:226)
	at org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:223)
	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
	at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:223)
	at org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:767)
	... 4 more
2014-05-20 09:55:39,254 [Thread-63] INFO solr.cloud.Overseer - Overseer Loop exiting : ec2-xx.compute-1.amazonaws.com:8986_solr
2014-05-20 09:55:39,256 [main-EventThread] WARN common.cloud.ZkStateReader - ZooKeeper watch triggered, but Solr cannot talk to ZK
2014-05-20 09:55:39,259 [ShutdownMonitor] INFO server.handler.ContextHandler - stopped o.e.j.w.WebAppContext{/solr,file:/vol0/cloud86/solr-webapp/webapp/},/vol0/cloud86/webapps/solr.war
{code}

Notice how the overseer kept running almost to the very end, i.e. until the jetty context stopped. On some runs, we got the following on the overseer leader node
[jira] [Updated] (SOLR-6085) Suggester crashes
[ https://issues.apache.org/jira/browse/SOLR-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jorge Ferrández updated SOLR-6085:
----------------------------------
Affects Version/s: 4.8

Suggester crashes
Key: SOLR-6085
URL: https://issues.apache.org/jira/browse/SOLR-6085
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 4.7.1, 4.8
Reporter: Jorge Ferrández
Fix For: 4.7.3, 4.8.1, 4.9, 5.0

The AnalyzingInfixSuggester class fails when it is queried with a ß character (eszett) used in German, but it doesn't happen for all data or for all words containing this character. The exception reported is the following:

{code:java}
<response>
  <lst name="responseHeader">
    <int name="status">500</int>
    <int name="QTime">18</int>
  </lst>
  <lst name="error">
    <str name="msg">String index out of range: 5</str>
    <str name="trace">java.lang.StringIndexOutOfBoundsException: String index out of range: 5
	at java.lang.String.substring(String.java:1907)
	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.addPrefixMatch(AnalyzingInfixSuggester.java:575)
	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.highlight(AnalyzingInfixSuggester.java:525)
	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.createResults(AnalyzingInfixSuggester.java:479)
	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:437)
	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:338)
	at org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:181)
	at org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:232)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:217)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:744)</str>
    <int name="code">500</int>
  </lst>
</response>
{code}

With this query: http://localhost:8983/solr/suggest_de?suggest.q=gieß (for gießen, which is actually in the data)

The problem seems to be that we use ASCIIFolding to unify ss and ß, which are both valid alternatives in German. Looking at
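The length mismatch behind that exception can be reproduced with plain java.lang.String; this is only a sketch of the mechanism (folding ß to ss makes the analyzed text longer than the surface text), not the suggester code:

```java
public class EszettFoldingDemo {
    public static void main(String[] args) {
        String surface = "gie\u00df";                      // surface form "gieß": 4 chars
        String folded = surface.replace("\u00df", "ss");   // folded form "giess": 5 chars
        System.out.println(surface.length() + " / " + folded.length()); // 4 / 5

        // Applying an offset measured on the folded text to the shorter surface
        // text overruns it, matching the reported "String index out of range: 5":
        try {
            surface.substring(folded.length());
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("StringIndexOutOfBoundsException");
        }
    }
}
```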
[jira] [Commented] (SOLR-6095) SolrCloud cluster can end up without an overseer
[ https://issues.apache.org/jira/browse/SOLR-6095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003046#comment-14003046 ]

Shalin Shekhar Mangar commented on SOLR-6095:
---------------------------------------------
I also opened SOLR-6091 but that didn't help.

SolrCloud cluster can end up without an overseer
Key: SOLR-6095
URL: https://issues.apache.org/jira/browse/SOLR-6095
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.8
Reporter: Shalin Shekhar Mangar
Fix For: 4.9, 5.0
[jira] [Resolved] (LUCENE-5154) ban tests from writing to CWD
[ https://issues.apache.org/jira/browse/LUCENE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-5154.
---------------------------------
Resolution: Duplicate

Incorporated into LUCENE-5650.

ban tests from writing to CWD
Key: LUCENE-5154
URL: https://issues.apache.org/jira/browse/LUCENE-5154
Project: Lucene - Core
Issue Type: Test
Reporter: Robert Muir
Assignee: Dawid Weiss
Attachments: LUCENE-5154.patch

Currently each forked jvm has cwd = tempDir = "." This provides some minimal protection against tests in different jvms interfering with each other, but we can do much better by splitting these concerns and setting cwd = "." and tempDir = "./temp".

Tests that write files to CWD can confuse IDE users because they can create dirty checkouts or other issues between different runs, and of course can interfere with other tests in the *same* jvm (there are other possible ways to do this too). So a test like this should fail with SecurityException, but currently does not:

{code}
public void testBogus() throws Exception {
  File file = new File("foo.txt");
  FileOutputStream os = new FileOutputStream(file);
  os.write(1);
  os.close();
}
{code}
[jira] [Assigned] (LUCENE-5154) ban tests from writing to CWD
[ https://issues.apache.org/jira/browse/LUCENE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned LUCENE-5154:
-----------------------------------
Assignee: Dawid Weiss

ban tests from writing to CWD
Key: LUCENE-5154
URL: https://issues.apache.org/jira/browse/LUCENE-5154
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003071#comment-14003071 ]

Dawid Weiss commented on LUCENE-5650:
-------------------------------------
I handled some of the velocity and DIH. Running the tests now.

createTempDir and associated functions no longer create java.io.tmpdir
Key: LUCENE-5650
URL: https://issues.apache.org/jira/browse/LUCENE-5650
Project: Lucene - Core
Issue Type: Improvement
Components: general/test
Reporter: Ryan Ernst
Assignee: Dawid Weiss
Priority: Minor
Fix For: 4.9, 5.0
Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch

The recent refactoring of all the create temp file/dir functions (which is great!) has a minor regression from what existed before. With the old {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist. So, if you set {{java.io.tmpdir}} to {{./temp}}, then it would create that dir within the per-jvm working dir. However, {{getBaseTempDirForClass()}} now asserts that the dir exists, is a dir, and is writeable. Lucene uses {{.}} as {{java.io.tmpdir}}. Then in the test security manager, the per-jvm cwd has read/write/execute permissions. However, this allows tests to write to their cwd, which I'm trying to protect against (by setting cwd to read/execute in my test security manager).
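The regression described above is the difference between "create on demand" and "assert it already exists"; a stdlib sketch of the two behaviors against a relative temp path (paths and names illustrative, not the test-framework code):

```java
import java.io.File;
import java.nio.file.Files;

public class TempDirBehaviorDemo {
    public static void main(String[] args) throws Exception {
        File cwd = Files.createTempDirectory("demo").toFile(); // stand-in for the per-jvm cwd
        File tmpDir = new File(cwd, "temp");                   // e.g. java.io.tmpdir=./temp

        // New behavior: asserting the configured dir already exists would fail here.
        System.out.println(tmpDir.isDirectory()); // false

        // Old TEMP_DIR behavior: create the dir if it did not exist.
        tmpDir.mkdirs();
        System.out.println(tmpDir.isDirectory()); // true
    }
}
```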
Survey on Project Conventions
Hello, my name is Martin Brandtner [1] and I’m a software engineering researcher at the University of Zurich, Switzerland. Together with Philipp Leitner [2], I currently work on an approach to detect violations of project conventions based on data from the source code repository, the issue tracker (e.g. Jira), and the build system (e.g. Jenkins). One example for such a project convention is: “You need to make sure that the commit message contains at least the name of the contributor and ideally a reference to the Bugzilla or JIRA issue where the patch was submitted.” [3] The idea is that our approach can detect violations of such a convention automatically and therefore support the development process. First of all we need conventions, and that’s why we ask you to take part in our survey. In the survey, we present five conventions and want you to rate their relevance in your Apache project. Everybody contributing to your Apache project can take part in this survey, because we also want to see if different roles may have different opinions about a convention. The survey is totally anonymous and it will take about 15 minutes to answer. We would be happy if you could fill out our survey under: http://ww3.unipark.de/uc/SEAL_Research/1abe/ before May 30, 2014. With the data collected in this survey we will implement convention violation detection in our tool called SQA-Timeline [4]. If you are interested in our work, contact us via email or provide your email address in the survey. Best regards, Martin and Philipp [1] http://www.ifi.uzh.ch/seal/people/brandtner.html [2] http://www.ifi.uzh.ch/seal/people/leitner.html [3] http://www.apache.org/dev/committers.html#applying-patches [4] https://www.youtube.com/watch?v=ZIsOODUapAE
[ANNOUNCE] Apache Lucene 4.8.1 released
May 2014, Apache Lucene™ 4.8.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.8.1 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html Lucene 4.8.1 includes 15 bug fixes. See the CHANGES.txt file included with the release for a full list of changes and further details. Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[ANNOUNCE] Apache Solr 4.8.1 released
May 2014, Apache Solr™ 4.8.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.8.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.8.1 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html Solr 4.8.1 includes 10 bug fixes, as well as Lucene 4.8.1 and its bug fixes. See the CHANGES.txt file included with the release for a full list of changes and further details. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6096) Support Update and Delete on nested documents
Thomas Scheffler created SOLR-6096: -- Summary: Support Update and Delete on nested documents Key: SOLR-6096 URL: https://issues.apache.org/jira/browse/SOLR-6096 Project: Solr Issue Type: Improvement Affects Versions: 4.7.2 Reporter: Thomas Scheffler When using nested or child documents, update and delete operations on the root document should also affect the nested documents, as no child can exist without its parent :-) Example
{code:xml|title=First Import}
<doc>
  <field name="id">1</field>
  <field name="title">Article with author</field>
  <doc>
    <field name="name">Smith, John</field>
    <field name="role">author</field>
  </doc>
</doc>
{code}
If I change my mind and the author was not named *John* but *_Jane_*:
{code:xml|title=Changed name of author of '1'}
<doc>
  <field name="id">1</field>
  <field name="title">Article with author</field>
  <doc>
    <field name="name">Smith, Jane</field>
    <field name="role">author</field>
  </doc>
</doc>
{code}
I would expect that John is not in the index anymore. Currently he is. There might also be the case that any subdocument is removed by an update:
{code:xml|title=Remove author}
<doc>
  <field name="id">1</field>
  <field name="title">Article without author</field>
</doc>
{code}
This should trigger a delete of all nested documents, too. The same way, all nested documents should be deleted if I delete the root document:
{code:xml|title=Deletion of '1'}
<delete>
  <id>1</id> <!-- implying also <query>_root_:1</query> -->
</delete>
{code}
It is currently possible to do all of this on the client side by issuing an additional delete request before every update. It would be more efficient if this could be handled on the SOLR side. One would benefit from atomic updates. The biggest plus shows when using delete-by-query:
{code:xml|title=Deletion of '1' by query}
<delete>
  <query>title:*</query> <!-- implying also <query>_root_:1</query> -->
</delete>
{code}
In that case one would not have to first query all matching documents and then issue deletes for those ids and every nested document.
[jira] [Commented] (LUCENE-5680) Allow updating multiple DocValues fields atomically
[ https://issues.apache.org/jira/browse/LUCENE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003162#comment-14003162 ] Robert Muir commented on LUCENE-5680: - I don't think i proposed unsetting :) I don't see how it provides anything over an explicit 'onsale' and 'saleprice' that you update atomically here, which will work easier in general for the search APIs (ranking etc) So I really don't think this should drive the API at all, I'm still not seeing a use case. Allow updating multiple DocValues fields atomically --- Key: LUCENE-5680 URL: https://issues.apache.org/jira/browse/LUCENE-5680 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5680.patch, LUCENE-5680.patch This has come up on the list (http://markmail.org/message/2wmpvksuwc5t57pg) -- it would be good if we can allow updating several doc-values fields, atomically. It will also improve/simplify our tests, where today we index two fields, e.g. the field itself and a control field. In some multi-threaded tests, since we cannot be sure which updates came through first, we limit the test such that each thread updates a different set of fields, otherwise they will collide and it will be hard to verify the index in the end. I was working on a patch and it looks pretty simple to do, will post a patch shortly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #632: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/632/ 1 tests failed. FAILED: org.apache.solr.cloud.HttpPartitionTest.testDistribSearch Error Message: No registered leader was found after waiting for 6ms , collection: c8n_1x3_lf slice: shard1 Stack Trace: org.apache.solr.common.SolrException: No registered leader was found after waiting for 6ms , collection: c8n_1x3_lf slice: shard1 at __randomizedtesting.SeedInfo.seed([CBB469512D717EC6:4A52E7495A2E1EFA]:0) at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:545) at org.apache.solr.cloud.HttpPartitionTest.testRf3WithLeaderFailover(HttpPartitionTest.java:348) at org.apache.solr.cloud.HttpPartitionTest.doTest(HttpPartitionTest.java:148) Build Log: [...truncated 54738 lines...] BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-4.x/build.xml:490: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-4.x/build.xml:182: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-4.x/extra-targets.xml:77: Java returned: 1 Total time: 151 minutes 35 seconds Build step 'Invoke Ant' marked build as failure Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5468) Option to enforce a majority quorum approach to accepting updates in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003206#comment-14003206 ] ASF subversion and git services commented on SOLR-5468: --- Commit 1596234 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1596234 ] SOLR-5468: Add wait loop to see replicas become active after restoring partitions; to address intermittent Jenkins test failures. Option to enforce a majority quorum approach to accepting updates in SolrCloud -- Key: SOLR-5468 URL: https://issues.apache.org/jira/browse/SOLR-5468 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.5 Environment: All Reporter: Timothy Potter Assignee: Timothy Potter Priority: Minor Attachments: SOLR-5468.patch, SOLR-5468.patch, SOLR-5468.patch I've been thinking about how SolrCloud deals with write-availability using in-sync replica sets, in which writes will continue to be accepted so long as there is at least one healthy node per shard. For a little background (and to verify my understanding of the process is correct), SolrCloud only considers active/healthy replicas when acknowledging a write. Specifically, when a shard leader accepts an update request, it forwards the request to all active/healthy replicas and only considers the write successful if all active/healthy replicas ack the write. Any down / gone replicas are not considered and will sync up with the leader when they come back online using peer sync or snapshot replication. For instance, if a shard has 3 nodes, A, B, C with A being the current leader, then writes to the shard will continue to succeed even if B and C are down. The issue is that if a shard leader continues to accept updates even if it loses all of its replicas, then we have acknowledged updates on only 1 node.
If that node, call it A, then fails and one of the previous replicas, call it B, comes back online before A does, then any writes that A accepted while the other replicas were offline are at risk of being lost. SolrCloud does provide a safe-guard mechanism for this problem with the leaderVoteWait setting, which puts any replicas that come back online before node A into a temporary wait state. If A comes back online within the wait period, then all is well as it will become the leader again and no writes will be lost. As a side note, sys admins definitely need to be made more aware of this situation as when I first encountered it in my cluster, I had no idea what it meant. My question is whether we want to consider an approach where SolrCloud will not accept writes unless there is a majority of replicas available to accept the write? For my example, under this approach, we wouldn't accept writes if both B and C failed, but would if only C did, leaving A and B online. Admittedly, this lowers the write-availability of the system, so may be something that should be tunable? From Mark M: Yeah, this is kind of like one of many little features that we have just not gotten to yet. I’ve always planned for a param that lets you say how many replicas an update must be verified on before responding success. Seems to make sense to fail that type of request early if you notice there are not enough replicas up to satisfy the param to begin with. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
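The majority rule in the example above (accept with A and B up, reject with only A) comes down to a simple count. A minimal sketch, with invented names, of the kind of early check being proposed:

```java
// Minimal sketch (not actual SolrCloud code): a leader would reject an
// update early unless a majority of the shard's replicas are active.
public class MajorityQuorum {
    /** True if activeReplicas forms a strict majority of replicationFactor. */
    public static boolean canAcceptUpdate(int activeReplicas, int replicationFactor) {
        return activeReplicas >= replicationFactor / 2 + 1;
    }
}
```

With replicationFactor=3: two nodes up (A and B) passes, while a lone leader (only A) is rejected, matching the scenario described above.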
[jira] [Created] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
Varun Thacker created LUCENE-5688: - Summary: NumericDocValues fields with sparse data can be compressed better Key: LUCENE-5688 URL: https://issues.apache.org/jira/browse/LUCENE-5688 Project: Lucene - Core Issue Type: Improvement Reporter: Varun Thacker Priority: Minor I ran into this problem where I had a dynamic field in Solr and indexed data into lots of fields. For each field only a few documents had actual values and for the remaining documents the default value ( 0 ) got indexed. Now when I merge segments, the index size jumps up. For example I have 10 segments - each with 1 DV field. When I merge segments into 1, that segment will contain all 10 DV fields with lots of 0s. This was the motivation behind trying to come up with a compression for a use case like this. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-5688: -- Attachment: LUCENE-5688.patch Here is a quick patch. Wanted to get some feedback on the approach. When I run the showIndexBloat method without the SPARSE_COMPRESSED changes, this is the size of the docValues data - {noformat} -rw-r--r-- 1 varun wheel 9.9M May 20 18:28 _a_Lucene45_0.dvd -rw-r--r-- 1 varun wheel 312B May 20 18:28 _a_Lucene45_0.dvm {noformat} With the SPARSE_COMPRESSED changes {noformat} -rw-r--r-- 1 varun wheel 2.7M May 20 18:51 _a_Lucene45_0.dvd -rw-r--r-- 1 varun wheel 352B May 20 18:51 _a_Lucene45_0.dvm {noformat}
[jira] [Commented] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003220#comment-14003220 ] Robert Muir commented on LUCENE-5688: - I think this is a duplicate of LUCENE-4921? I guess the main thing is to differentiate between sparse data and thousands and thousands of fields, which usually hints at the problem not being in Lucene :)
[jira] [Commented] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003226#comment-14003226 ] Robert Muir commented on LUCENE-5688: - Varun, I don't think we should make a long[] of size maxDoc in RAM here just to save some space on disk.
[jira] [Commented] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003231#comment-14003231 ] Grant Ingersoll commented on LUCENE-5688: - bq. Varun, i dont think we should make a long[] of size maxDoc in ram here just to save some space on disk. In a large index, this can be quite significant, FWIW. Agreed on the long[] in RAM, but would be good to have a better way of controlling the on-disk behavior.
[jira] [Commented] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003234#comment-14003234 ] Varun Thacker commented on LUCENE-5688: --- Completely overlooked LUCENE-4921. Should I mark this as a duplicate and post the same patch there? bq. Varun, i dont think we should make a long[] of size maxDoc in ram here just to save some space on disk. I felt the same way when I was writing it, but that was the easiest way to get a quick patch out. I will try to think of a better way to achieve this. Do you have any suggestions?
[jira] [Commented] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003321#comment-14003321 ] Robert Muir commented on LUCENE-5688: - You otherwise hardly load anything in RAM, so it's extremely trappy to do this. As I mentioned, the obvious approach is O(log N), like Android's SparseArray. So array 1 is the increasing docIDs that have a value (can be a monotonic block reader); you can binary-search that to find your value in the real values. You have to decide how 'missing' should be represented. Currently it will be 1 bit per document as well. If it stays that way, you can check that first (which is the typical case) before binary searching. In all cases this has performance implications (slower access), and isn't specific to numerics (all DV fields could be sparse). So I think it's best to start outside of the default codec rather than trying to do it automatically. Not everyone will want the space-time tradeoff.
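The O(log N) layout described in the comment above (a sorted array of docIDs that have a value, binary-searched to index into the real values) can be sketched as follows. This is an illustration of the idea only, not the attached patch or any Lucene codec class.

```java
import java.util.Arrays;

// Sketch of a sparse numeric representation: only docs with a value are
// stored, as a sorted docID array plus a parallel values array. Lookup
// binary-searches the docID array; missing docs fall back to the default 0.
public class SparseNumericSketch {
    private final int[] docsWithValue; // sorted, ascending docIDs
    private final long[] values;       // values[i] belongs to docsWithValue[i]

    public SparseNumericSketch(int[] docsWithValue, long[] values) {
        this.docsWithValue = docsWithValue;
        this.values = values;
    }

    public long get(int docID) {
        int idx = Arrays.binarySearch(docsWithValue, docID);
        return idx >= 0 ? values[idx] : 0L; // missing -> default 0
    }
}
```

In a real codec the docID array could itself be a monotonic packed structure, and the per-document "missing" bits could be consulted first to skip the binary search in the common case.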
[jira] [Commented] (SOLR-6093) delete docs only in a spec shard within a collection
[ https://issues.apache.org/jira/browse/SOLR-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003332#comment-14003332 ] Erick Erickson commented on SOLR-6093: -- Hmmm.. Couple of questions to add to the mix: 1. How to handle multiple shards hosted on the same node? 2. How does adding distrib=false interact with this idea? delete docs only in a spec shard within a collection Key: SOLR-6093 URL: https://issues.apache.org/jira/browse/SOLR-6093 Project: Solr Issue Type: Improvement Components: contrib - Clustering Affects Versions: 4.6, 4.7 Reporter: YouPeng Yang Priority: Minor Suppose you want to delete docs only in a specific shard within a collection, using this command: [1] http://localhost:8082/solr/tv_201402/update?stream.body=<delete><query>BEGINTIME:[2014-03-01 00:00:00 TO *]</query></delete>&shards=tv_201402&commit=true As a result, all the docs over the whole collection whose BEGINTIME is larger than 2014-03-01 00:00:00 were deleted. After checking the source code (DistributedUpdateProcessor.doDeleteByQuery), it shows that the _route_ and shard.keys parameters can make this work, which indeed worked after I tried the command: [2] http://10.1.22.1:8082/solr/tv_201402/update?stream.body=<delete><query>BEGINTIME:[2014-03-01 00:00:00 TO 2014-03-02 00:00:00]</query></delete>&_route_=tv_201402&commit=true In the first request [1], I use the shards parameter hoping to delete docs only in tv_201402, while in the second request [2], it changes to the _route_ parameter. The purpose of filing this JIRA is that the behavior should be consistent: the shards parameter should also make the request non-distributed during updates, just as it does for searches. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003377#comment-14003377 ] Michael McCandless commented on LUCENE-5679: I think just drop them in 4.x. Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5689) FieldInfo.setDocValuesGen should not be public.
Robert Muir created LUCENE-5689: --- Summary: FieldInfo.setDocValuesGen should not be public. Key: LUCENE-5689 URL: https://issues.apache.org/jira/browse/LUCENE-5689 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir It's currently public and users can modify it. We made this class mostly immutable long ago: remember it's returned by the AtomicReader API! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
[ https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003425#comment-14003425 ] Noble Paul commented on SOLR-5681: -- The REQUESTSTATUS would not tell me what was the output of the run? So, I miss that when async is used? Make the OverseerCollectionProcessor multi-threaded --- Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. 
The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
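The peekAfter() idea from the proposal above can be sketched with a plain in-memory list standing in for the ZK work queue; peekAfter and this whole class are hypothetical illustrations, not the real DistributedQueue API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the peekAfter() proposal: a queue-like structure
// that lets a dispatcher look past an in-flight head task without removing
// it, and later remove a completed task from any position.
public class PeekAfterSketch<T> {
    private final List<T> items = new ArrayList<>();

    public void offer(T item) { items.add(item); }

    /** Head of the queue, or null if empty (like peek()). */
    public T peek() { return items.isEmpty() ? null : items.get(0); }

    /** Element immediately after 'last', or null if there is none. */
    public T peekAfter(T last) {
        int i = items.indexOf(last);
        return (i >= 0 && i + 1 < items.size()) ? items.get(i + 1) : null;
    }

    /** Remove a completed task wherever it sits, not just at the head. */
    public boolean remove(T item) { return items.remove(item); }
}
```

A dispatcher thread would peek() the head, hand it to a worker, then peekAfter(head) to find the next mutually exclusive task while the first is still in flight, removing each task only on completion so a new Overseer can retry after a failure.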
[jira] [Resolved] (SOLR-6055) TestMiniSolrCloudCluster has data dir in test's CWD
[ https://issues.apache.org/jira/browse/SOLR-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Ernst resolved SOLR-6055. -- Resolution: Duplicate Rolling back into LUCENE-5650. TestMiniSolrCloudCluster has data dir in test's CWD --- Key: SOLR-6055 URL: https://issues.apache.org/jira/browse/SOLR-6055 Project: Solr Issue Type: Bug Reporter: Ryan Ernst While investigating one of the test failures created when tightening test permissions to restrict write access to CWD (see LUCENE-5650), I've found {{TestMiniSolrCloudCluster}} is attempting to write transaction logs to {{$CWD/data/tlog}}. I've traced this down to two things which are happening: # The test uses {{RAMDirectoryFactory}}, which always returns true for {{isAbsolute}}. This causes the directory factory to *not* adjust the default relative path to bring it under the instance dir. # The {{UpdateLog}} creates its tlog file with the relative data dir. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.8-Linux (64bit/jdk1.8.0_05) - Build # 205 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.8-Linux/205/ Java: 64bit/jdk1.8.0_05 -XX:+UseCompressedOops -XX:+UseParallelGC 1 tests failed. REGRESSION: org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings Error Message: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=74,endOffset=7 Stack Trace: java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=74,endOffset=7 at __randomizedtesting.SeedInfo.seed([D784772204AB6B46:BDDFC8335DE54BB5]:0) at org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:45) at org.apache.lucene.analysis.miscellaneous.HyphenatedWordsFilter.unhyphenate(HyphenatedWordsFilter.java:144) at org.apache.lucene.analysis.miscellaneous.HyphenatedWordsFilter.incrementToken(HyphenatedWordsFilter.java:97) at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:701) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:612) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:511) at org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:944) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at
[jira] [Updated] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Ernst updated LUCENE-5650: --- Attachment: dih.patch Looks like we overlapped on fixing these. I like how you handle velocity better than me (I hacked through a way to set the log file). But I'm not sure I like the DIH change. I think it is bogus to default to the CWD if running without a core (which seems to only happen in tests?). I changed this to default to solr.solr.home, then set this to a temp dir in the abstract DIH test base (see attached patch). createTempDir and associated functions no longer create java.io.tmpdir -- Key: LUCENE-5650 URL: https://issues.apache.org/jira/browse/LUCENE-5650 Project: Lucene - Core Issue Type: Improvement Components: general/test Reporter: Ryan Ernst Assignee: Dawid Weiss Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, dih.patch The recent refactoring to all the create temp file/dir functions (which is great!) has a minor regression from what existed before. With the old {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist. So, if you set {{java.io.tmpdir}} to {{./temp}}, then it would create that dir within the per jvm working dir. However, {{getBaseTempDirForClass()}} now does asserts that check the dir exists, is a dir, and is writeable. Lucene uses {{.}} as {{java.io.tmpdir}}. Then in the test security manager, the per jvm cwd has read/write/execute permissions. However, this allows tests to write to their cwd, which I'm trying to protect against (by setting cwd to read/execute in my test security manager). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
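The behavioral change discussed above (old {{LuceneTestCase.TEMP_DIR}} created a missing {{java.io.tmpdir}}; the new {{getBaseTempDirForClass()}} asserts it already exists and is writable) can be sketched as two policies. The class and method names below are illustrative only, not the actual test-framework code:

```java
import java.io.File;

// Toy illustration of the two temp-dir policies under discussion.
class TempDirPolicy {
    // Old behavior: create the configured tmpdir on demand.
    static File createIfMissing(String tmpdir) {
        File dir = new File(tmpdir);
        if (!dir.exists() && !dir.mkdirs()) {
            throw new RuntimeException("could not create " + dir);
        }
        return dir;
    }

    // New behavior: fail fast if the configured tmpdir is not an
    // existing, writable directory (what the asserts now check).
    static File requireExisting(String tmpdir) {
        File dir = new File(tmpdir);
        if (!dir.isDirectory() || !dir.canWrite()) {
            throw new AssertionError(tmpdir + " must be an existing writable directory");
        }
        return dir;
    }
}
```

With {{java.io.tmpdir}} set to {{./temp}}, the first policy silently creates the directory inside the per-JVM cwd, while the second refuses to run, which is exactly the regression being described.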
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003500#comment-14003500 ] ASF subversion and git services commented on LUCENE-5679: - Commit 1596296 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1596296 ] LUCENE-5679: remove the single-parameter deleteDocuments() versions, in favor of the vararg ones Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5689) FieldInfo.setDocValuesGen should not be public.
[ https://issues.apache.org/jira/browse/LUCENE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5689: Attachment: LUCENE-5689.patch here's a patch. also adds some safety to checkConsistency. We should probably add assert checkConsistency() to the package-private setter. FieldInfo.setDocValuesGen should not be public. --- Key: LUCENE-5689 URL: https://issues.apache.org/jira/browse/LUCENE-5689 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5689.patch its currently public and users can modify it. We made this class mostly immutable long ago: remember its returned by the atomicreader API! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
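The suggestion above (make the setter package-private and add {{assert checkConsistency()}} to it) can be sketched with a toy class; this is not Lucene's actual FieldInfo, and the invariant shown is a stand-in:

```java
// Toy sketch of a package-private, assert-guarded doc-values-gen setter.
class FieldInfoSketch {
    private long dvGen = -1;

    // Stand-in for FieldInfo.checkConsistency(); the real method
    // verifies many more invariants.
    boolean checkConsistency() {
        return dvGen >= -1;
    }

    // Package-private: only index-internal code (e.g. the code applying
    // doc-values updates) may advance the generation, and the assert
    // catches inconsistent values in tests.
    void setDocValuesGen(long gen) {
        this.dvGen = gen;
        assert checkConsistency();
    }

    long getDocValuesGen() {
        return dvGen;
    }
}
```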
[jira] [Commented] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
[ https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003518#comment-14003518 ] Anshum Gupta commented on SOLR-5681: I had created SOLR-5886 for that. Will take that one up soon. Make the OverseerCollectionProcessor multi-threaded --- Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. 
The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
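The {{peekAfter()}} proposal above can be sketched with a plain list-backed queue. This is illustrative only; Solr's actual work queue is ZooKeeper-backed, and {{peekAfter}} does not exist yet:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed look-ahead queue: like peek(), peekAfter()
// does not remove anything; tasks are removed only on completion,
// possibly out of order.
class LookaheadQueue {
    private final List<String> tasks = new ArrayList<>();

    void offer(String task) { tasks.add(task); }

    String peek() { return tasks.isEmpty() ? null : tasks.get(0); }

    // Returns the task immediately following 'last', or null if none.
    // This lets the parent thread hand task N to a worker and still
    // inspect task N+1 while N is in flight.
    String peekAfter(String last) {
        int i = tasks.indexOf(last);
        return (i >= 0 && i + 1 < tasks.size()) ? tasks.get(i + 1) : null;
    }

    // Completion cleanup: delete a task that may not be at the head.
    void remove(String task) { tasks.remove(task); }
}
```

The awkward part the message points out survives in the sketch: removing an element that is not at the head is natural for a list but not for a queue, which is why the second proposal (a duplicate "volatile" queue) is also on the table.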
[jira] [Updated] (LUCENE-5680) Allow updating multiple DocValues fields atomically
[ https://issues.apache.org/jira/browse/LUCENE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5680: --- Attachment: LUCENE-5680.patch Patch contains the {{Field...}} vs {{DocValuesUpdate...}} variants side-by-side, with support for unsetting a field's value. To support that with the {{Field...}} method, I added {{Field.setNumberValue(Number value)}} and a {{.createMissing}} static method to NumericDocValuesField. This now works. Now we have them side-by-side in a patch to review. Allow updating multiple DocValues fields atomically --- Key: LUCENE-5680 URL: https://issues.apache.org/jira/browse/LUCENE-5680 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5680.patch, LUCENE-5680.patch, LUCENE-5680.patch This has come up on the list (http://markmail.org/message/2wmpvksuwc5t57pg) -- it would be good if we can allow updating several doc-values fields, atomically. It will also improve/simplify our tests, where today we index two fields, e.g. the field itself and a control field. In some multi-threaded tests, since we cannot be sure which updates came through first, we limit the test such that each thread updates a different set of fields, otherwise they will collide and it will be hard to verify the index in the end. I was working on a patch and it looks pretty simple to do, will post a patch shortly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5689) FieldInfo.setDocValuesGen should not be public.
[ https://issues.apache.org/jira/browse/LUCENE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003556#comment-14003556 ] Shai Erera commented on LUCENE-5689: Looks good. So now FI.setDVGen is used only by ReaderAndUpdates, right? Perhaps it is possible to get rid of it entirely, by having RAU create a new FI instance, setting the new dvGen? FieldInfo.setDocValuesGen should not be public. --- Key: LUCENE-5689 URL: https://issues.apache.org/jira/browse/LUCENE-5689 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5689.patch its currently public and users can modify it. We made this class mostly immutable long ago: remember its returned by the atomicreader API! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5679. Resolution: Fixed Fix Version/s: 5.0 4.9 Assignee: Shai Erera Committed to trunk and 4x. Thanks Mike! Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.9, 5.0 Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003530#comment-14003530 ] ASF subversion and git services commented on LUCENE-5679: - Commit 1596301 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596301 ] LUCENE-5679: remove the single-parameter deleteDocuments() versions, in favor of the vararg ones Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6097) Posting JSON with results in lost information
Kingston Duffie created SOLR-6097: - Summary: Posting JSON with results in lost information Key: SOLR-6097 URL: https://issues.apache.org/jira/browse/SOLR-6097 Project: Solr Issue Type: Bug Affects Versions: 4.7.2 Reporter: Kingston Duffie Post the following JSON to add a document: { "add" : { "commitWithin" : 5000, "doc" : { "id" : "12345", "body" : "a <b> c" } } } The body field is configured in the schema as: <field name="body" type="text_hive" indexed="true" stored="true" required="false" multiValued="false"/> and <fieldType name="text_hive" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType> The problem is this: After submitting this post, if you go to the SOLR console and find this document, the stored body will be missing the contents between the less-than and greater-than symbols -- i.e., "a c". If you encode the body (i.e., "a &lt; b &gt; c"), it will show up with the < and > symbols. That is, it appears that SOLR is stripping out HTML tags even though we are not asking it to. Note that it is not only the storage but also indexing that is affected (as we originally found the issue because searching for "b" would not match this document.
I'm willing to believe that I'm doing something wrong, but I can't see anywhere in any spec that suggests that strings inside JSON need to be -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
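For reference, JSON itself never requires '<' or '>' to be escaped inside string values, so whatever strips them must be happening on the server side, not in the JSON layer. A quick JDK-only check of the escaping rules (only quote, backslash, and control characters below U+0020 must be escaped per the JSON spec):

```java
// Verifies that a body containing angle brackets is a legal JSON
// string value without any escaping.
public class JsonAngleBrackets {
    static boolean needsJsonEscaping(String s) {
        for (char c : s.toCharArray()) {
            // The only characters JSON requires to be escaped in strings.
            if (c == '"' || c == '\\' || c < 0x20) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(needsJsonEscaping("a <b> c")); // false
    }
}
```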
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003563#comment-14003563 ] ASF subversion and git services commented on LUCENE-5679: - Commit 1596304 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1596304 ] LUCENE-5679: leftover jdoc fix Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.9, 5.0 Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003564#comment-14003564 ] ASF subversion and git services commented on LUCENE-5679: - Commit 1596306 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596306 ] LUCENE-5679: leftover jdoc fix Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.9, 5.0 Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: svn commit: r1596301 - in /lucene/dev/branches/branch_4x: ./ lucene/ lucene/CHANGES.txt lucene/core/ lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
Hi, this is a binary backwards break in 4.x, because the method signature, user's code was compiled against in previous versions, is removed for no reason. In 4.x I would keep the one-arg methods, but just let it delegate to the vararg version. The javadocs can stay the same. In fact this change requires to recompile your source-code (source-code compatibility is ensured) but does not provide binary compatibility. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: sh...@apache.org [mailto:sh...@apache.org] Sent: Tuesday, May 20, 2014 6:02 PM To: comm...@lucene.apache.org Subject: svn commit: r1596301 - in /lucene/dev/branches/branch_4x: ./ lucene/ lucene/CHANGES.txt lucene/core/ lucene/core/src/java/org/apache/lucene/index/IndexWriter.java Author: shaie Date: Tue May 20 16:02:17 2014 New Revision: 1596301 URL: http://svn.apache.org/r1596301 Log: LUCENE-5679: remove the single-parameter deleteDocuments() versions, in favor of the vararg ones Modified: lucene/dev/branches/branch_4x/ (props changed) lucene/dev/branches/branch_4x/lucene/ (props changed) lucene/dev/branches/branch_4x/lucene/CHANGES.txt (contents, props changed) lucene/dev/branches/branch_4x/lucene/core/ (props changed) lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/i ndex/IndexWriter.java Modified: lucene/dev/branches/branch_4x/lucene/CHANGES.txt URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/CHA NGES.txt?rev=1596301r1=1596300r2=1596301view=diff == --- lucene/dev/branches/branch_4x/lucene/CHANGES.txt (original) +++ lucene/dev/branches/branch_4x/lucene/CHANGES.txt Tue May 20 16:02:17 +++ 2014 @@ -61,6 +61,10 @@ API Changes * LUCENE-5640: The Token class was deprecated. Since Lucene 2.9, TokenStreams are using Attributes, Token is no longer used. 
(Uwe Schindler, Robert Muir) +* LUCENE-5679: Consolidated IndexWriter.deleteDocuments(Term) and + IndexWriter.deleteDocuments(Query) with their varargs counterparts. + (Shai Erera) + Optimizations * LUCENE-5603: hunspell stemmer more efficiently strips prefixes Modified: lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/i ndex/IndexWriter.java URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/cor e/src/java/org/apache/lucene/index/IndexWriter.java?rev=1596301r1=159 6300r2=1596301view=diff == --- lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/i ndex/IndexWriter.java (original) +++ lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene +++ /index/IndexWriter.java Tue May 20 16:02:17 2014 @@ -77,8 +77,8 @@ import org.apache.lucene.util.ThreadInte and otherwise open the existing index./p pIn either case, documents are added with {@link #addDocument(Iterable) - addDocument} and removed with {@link #deleteDocuments(Term)} or {@link - #deleteDocuments(Query)}. A document can be updated with {@link + addDocument} and removed with {@link #deleteDocuments(Term...)} or + {@link #deleteDocuments(Query...)}. A document can be updated with + {@link #updateDocument(Term, Iterable) updateDocument} (which just deletes and then adds the entire document). When finished adding, deleting and updating documents, {@link #close() close} should be called./p @@ - 1323,28 +1323,6 @@ public class IndexWriter implements Clos } } - /** - * Deletes the document(s) containing codeterm/code. - * - * pbNOTE/b: if this method hits an OutOfMemoryError - * you should immediately close the writer. 
See a - * href=#OOMEabove/a for details./p - * - * @param term the term to identify the documents to be deleted - * @throws CorruptIndexException if the index is corrupt - * @throws IOException if there is a low-level IO error - */ - public void deleteDocuments(Term term) throws IOException { -ensureOpen(); -try { - if (docWriter.deleteTerms(term)) { -processEvents(true, false); - } -} catch (OutOfMemoryError oom) { - handleOOM(oom, deleteDocuments(Term)); -} - } - /** Expert: attempts to delete by document ID, as long as * the provided reader is a near-real-time reader (from {@link * DirectoryReader#open(IndexWriter,boolean)}). If the @@ -1357,8 +1335,7 @@ public class IndexWriter implements Clos * bNOTE/b: this method can only delete documents * visible to the currently open NRT reader. If you need * to delete documents indexed after opening the NRT - * reader you must use the other deleteDocument methods - * (e.g., {@link #deleteDocuments(Term)}). */ + * reader you must use {@link
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003574#comment-14003574 ] Uwe Schindler commented on LUCENE-5679: --- See my comment on the mailing list about 4.x: {quote} Hi, this is a binary backwards break in 4.x, because the method signature, user's code was compiled against in previous versions, is removed for no reason. In 4.x I would keep the one-arg methods, but just let it delegate to the vararg version. The javadocs can stay the same. In fact this change requires to recompile your source-code (source-code compatibility is ensured) but does not provide binary compatibility. Uwe {quote} Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.9, 5.0 Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments(). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
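Uwe's suggested 4.x shim (keep the one-arg signature and let it delegate to the vararg one, preserving binary compatibility) can be sketched with a toy class; this is not the real IndexWriter:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-in for IndexWriter showing the delegation pattern: code
// compiled against the old one-arg method keeps linking, because the
// signature is still present and simply forwards to the vararg version.
class WriterShim {
    final List<String> deleted = new ArrayList<>();

    // Vararg version kept going forward.
    public void deleteDocuments(String... terms) {
        deleted.addAll(Arrays.asList(terms));
    }

    // One-arg overload retained only for binary compatibility;
    // behavior is identical because it delegates.
    public void deleteDocuments(String term) {
        deleteDocuments(new String[] { term });
    }
}

public class CompatDemo {
    public static void main(String[] args) {
        WriterShim w = new WriterShim();
        w.deleteDocuments("id:1");          // resolves to the one-arg overload
        w.deleteDocuments("id:2", "id:3");  // resolves to the vararg overload
        System.out.println(w.deleted);      // [id:1, id:2, id:3]
    }
}
```

Note that Java overload resolution prefers the fixed-arity method, so existing single-argument call sites bind to the shim both at compile time and, crucially for 4.x users, at link time.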
[jira] [Commented] (SOLR-5886) Propagate more information in case of failed async tasks
[ https://issues.apache.org/jira/browse/SOLR-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003577#comment-14003577 ] Noble Paul commented on SOLR-5886: -- not just errors. All the output of the non-async command should be available for async as well Propagate more information in case of failed async tasks Key: SOLR-5886 URL: https://issues.apache.org/jira/browse/SOLR-5886 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta As of now, only the state of a pre-submitted task is returned in the response to the REQUESTSTATUS Collections API call. Pass more information, especially in case of a call erroring out. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5886) Propagate more information in case of failed async tasks
[ https://issues.apache.org/jira/browse/SOLR-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003579#comment-14003579 ] Anshum Gupta commented on SOLR-5886: sure, makes sense. Propagate more information in case of failed async tasks Key: SOLR-5886 URL: https://issues.apache.org/jira/browse/SOLR-5886 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta As of now, only the state of a pre-submitted task is returned in the response to the REQUESTSTATUS Collections API call. Pass more information, especially in case of a call erroring out. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5689) FieldInfo.setDocValuesGen should not be public.
[ https://issues.apache.org/jira/browse/LUCENE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003580#comment-14003580 ] Robert Muir commented on LUCENE-5689: - It maybe, i wasnt sure about the implications of that. I think we should first remove the 'public', because I do not know what will happen if someone invokes this setter on e.g. AtomicReader, but I'm guessing its not good :) FieldInfo.setDocValuesGen should not be public. --- Key: LUCENE-5689 URL: https://issues.apache.org/jira/browse/LUCENE-5689 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5689.patch its currently public and users can modify it. We made this class mostly immutable long ago: remember its returned by the atomicreader API! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6095) SolrCloud cluster can end up without an overseer
[ https://issues.apache.org/jira/browse/SOLR-6095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003585#comment-14003585 ] Shalin Shekhar Mangar commented on SOLR-6095: - The problem that I could find is in LeaderElector.checkIfIamLeader where we have the following code: {code} if (seq <= intSeqs.get(0)) { // first we delete the node advertising the old leader in case the ephem is still there try { zkClient.delete(context.leaderPath, -1, true); } catch(Exception e) { // fine } runIamLeaderProcess(context, replacement); } {code} If for whatever reason, the zkClient.delete was unsuccessful, we just ignore and go ahead to runIamLeaderProcess(...) which leads to OverseerElectionContext.runLeaderProcess(...) where it tries to create the /overseer_elect/leader node: {code} zkClient.makePath(leaderPath, ZkStateReader.toJSON(myProps), CreateMode.EPHEMERAL, true); {code} This is where things go wrong. Because the /overseer_elect/leader node already existed, the zkClient.makePath fails and the node decides to give up because it thinks there is already a leader. It never tries to rejoin the election. Then once the ephemeral /overseer_elect/leader node goes away (after the previous overseer leader exits), the cluster is left with no leader. Shouldn't the node next in line to become a leader try again or rejoin the election instead of giving up? SolrCloud cluster can end up without an overseer Key: SOLR-6095 URL: https://issues.apache.org/jira/browse/SOLR-6095 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.8 Reporter: Shalin Shekhar Mangar Fix For: 4.9, 5.0 We have a large cluster running on ec2 which occasionally ends up without an overseer after a rolling restart. We always restart our overseer nodes at the very last, otherwise we end up with a large number of shards that can't recover properly. This cluster is running a custom branch forked from 4.8 and has SOLR-5473, SOLR-5495 and SOLR-5468 applied.
We have a large number of small collections (120 collections each with approx 5M docs) on 16 Solr nodes. We are also using the overseer roles feature to designate two specified nodes as overseers. However, I think the problem that we're seeing is not specific to the overseer roles feature. As soon as the overseer was shutdown, we saw the following on the node which was next in line to become the overseer: {code} 2014-05-20 09:55:39,261 [main-EventThread] INFO solr.cloud.ElectionContext - I am going to be the leader ec2-xx.compute-1.amazonaws.com:8987_solr 2014-05-20 09:55:39,265 [main-EventThread] WARN solr.cloud.LeaderElector - org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /overseer_elect/leader at org.apache.zookeeper.KeeperException.create(KeeperException.java:119) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:432) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:429) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:386) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:373) at org.apache.solr.cloud.OverseerElectionContext.runLeaderProcess(ElectionContext.java:551) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:142) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:110) at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55) at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:303) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) {code} When the overseer leader node is gracefully shutdown, we get the 
following in the logs: {code} 2014-05-20 09:55:39,254 [Thread-63] ERROR solr.cloud.Overseer - Exception in Overseer main queue loop org.apache.solr.common.SolrException: Could not load collection from ZK:sm12 at org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:778) at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:553) at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:246) at
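The "try again or rejoin the election instead of giving up" idea could look roughly like the following. {{ElectionSketch}} and its {{Zk}} interface are hypothetical simplifications, not Solr's LeaderElector or SolrZkClient:

```java
// Sketch: when creating the ephemeral leader node fails (e.g. the old
// leader's ephemeral node is still lingering), retry instead of
// permanently giving up on leadership.
class ElectionSketch {
    interface Zk {
        void createEphemeral(String path) throws Exception;
    }

    static boolean tryBecomeLeader(Zk zk, String leaderPath, int attempts)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try {
                zk.createEphemeral(leaderPath);
                return true;           // we are now the leader
            } catch (Exception nodeExists) {
                Thread.sleep(100);     // stale ephemeral may still be there; retry
            }
        }
        return false;                  // caller should rejoin the election
    }
}
```

The real fix would also need to distinguish NodeExists from other ZooKeeper errors and to re-enter the election queue rather than loop forever, but the sketch shows the shape of the retry the comment asks for.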
[jira] [Created] (LUCENE-5690) expose sub-Terms from MultiTerms
Yonik Seeley created LUCENE-5690: Summary: expose sub-Terms from MultiTerms Key: LUCENE-5690 URL: https://issues.apache.org/jira/browse/LUCENE-5690 Project: Lucene - Core Issue Type: Improvement Reporter: Yonik Seeley MultiTermsEnum and MultiDocsEnum both expose their subs. It would be useful to do the same for MultiTerms. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5495) Recovery strategy for leader partitioned from replica case.
[ https://issues.apache.org/jira/browse/SOLR-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003626#comment-14003626 ] Anshum Gupta commented on SOLR-5495: The CHANGES.txt entries for trunk and 4x are in different sections. trunk: 5.0 section 4x: 4.9 section Recovery strategy for leader partitioned from replica case. --- Key: SOLR-5495 URL: https://issues.apache.org/jira/browse/SOLR-5495 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Timothy Potter Attachments: SOLR-5495.patch, SOLR-5495.patch, SOLR-5495.patch We need to work out a strategy for the case of: Leader and replicas can still talk to ZooKeeper, Leader cannot talk to replica. We punted on this in initial design, but I'd like to get something in.
[jira] [Commented] (SOLR-5495) Recovery strategy for leader partitioned from replica case.
[ https://issues.apache.org/jira/browse/SOLR-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003638#comment-14003638 ] ASF subversion and git services commented on SOLR-5495: --- Commit 1596315 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1596315 ] SOLR-5495: Re-arrange location of SOLR-5495 and SOLR-5468 in CHANGES.txt
[jira] [Commented] (SOLR-5468) Option to enforce a majority quorum approach to accepting updates in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003639#comment-14003639 ] ASF subversion and git services commented on SOLR-5468: --- Commit 1596315 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1596315 ] SOLR-5495: Re-arrange location of SOLR-5495 and SOLR-5468 in CHANGES.txt Option to enforce a majority quorum approach to accepting updates in SolrCloud -- Key: SOLR-5468 URL: https://issues.apache.org/jira/browse/SOLR-5468 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.5 Environment: All Reporter: Timothy Potter Assignee: Timothy Potter Priority: Minor Attachments: SOLR-5468.patch, SOLR-5468.patch, SOLR-5468.patch I've been thinking about how SolrCloud deals with write-availability using in-sync replica sets, in which writes will continue to be accepted so long as there is at least one healthy node per shard. For a little background (and to verify my understanding of the process is correct), SolrCloud only considers active/healthy replicas when acknowledging a write. Specifically, when a shard leader accepts an update request, it forwards the request to all active/healthy replicas and only considers the write successful if all active/healthy replicas ack the write. Any down / gone replicas are not considered and will sync up with the leader when they come back online using peer sync or snapshot replication. For instance, if a shard has 3 nodes, A, B, C with A being the current leader, then writes to the shard will continue to succeed even if B and C are down. The issue is that if a shard leader continues to accept updates even if it loses all of its replicas, then we have acknowledged updates on only 1 node. If that node, call it A, then fails and one of the previous replicas, call it B, comes back online before A does, then any writes that A accepted while the other replicas were offline are at risk of being lost.
SolrCloud does provide a safe-guard mechanism for this problem with the leaderVoteWait setting, which puts any replicas that come back online before node A into a temporary wait state. If A comes back online within the wait period, then all is well as it will become the leader again and no writes will be lost. As a side note, sys admins definitely need to be made more aware of this situation as when I first encountered it in my cluster, I had no idea what it meant. My question is whether we want to consider an approach where SolrCloud will not accept writes unless there is a majority of replicas available to accept the write? For my example, under this approach, we wouldn't accept writes if both B and C failed, but would if only C did, leaving A and B online. Admittedly, this lowers the write-availability of the system, so may be something that should be tunable? From Mark M: Yeah, this is kind of like one of many little features that we have just not gotten to yet. I’ve always planned for a param that lets you say how many replicas an update must be verified on before responding success. Seems to make sense to fail that type of request early if you notice there are not enough replicas up to satisfy the param to begin with.
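The "min replicas" parameter Mark describes could be sketched roughly as follows. This is a toy model, not Solr's actual API: `QuorumSketch`, `ReplicaState`, and `acceptWrite` are hypothetical names used only to illustrate failing a write early when too few replicas are up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a "minimum active replicas" check for a shard.
// Not Solr's real classes; illustrative only.
public class QuorumSketch {
    public enum ReplicaState { ACTIVE, RECOVERING, DOWN }

    /**
     * Reject a write up front if fewer than minActiveReplicas of the
     * shard's replicas (leader included) are currently active.
     */
    public static boolean acceptWrite(List<ReplicaState> shardReplicas,
                                      int minActiveReplicas) {
        long active = shardReplicas.stream()
                .filter(s -> s == ReplicaState.ACTIVE)
                .count();
        return active >= minActiveReplicas;
    }

    public static void main(String[] args) {
        // Tim's example: shard with replicas A, B, C and a majority quorum of 2.
        List<ReplicaState> aAndBUp = Arrays.asList(
                ReplicaState.ACTIVE, ReplicaState.ACTIVE, ReplicaState.DOWN);
        List<ReplicaState> onlyAUp = Arrays.asList(
                ReplicaState.ACTIVE, ReplicaState.DOWN, ReplicaState.DOWN);
        System.out.println(acceptWrite(aAndBUp, 2)); // true: A and B are up
        System.out.println(acceptWrite(onlyAUp, 2)); // false: majority lost
    }
}
```

Making the threshold a request parameter, as Mark suggests, would let each client choose its own durability/availability trade-off.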
[jira] [Updated] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2894: --- Attachment: SOLR-2894.patch I haven't had a lot of time to review the updated patch in depth, but I did spend some time trying to improve TestCloudPivotFacet to resolve some of the nocommits -- but I'm still seeing failures... 1) I realized the depth check I was trying to do was bogus and commented it out (still need to purge the code - didn't want to muck with that until the rest of the test was passing more reliably) 2) the NPE I mentioned in QueryResponse.readPivots is still happening, but I realized that it has nothing to do with the datatype of the fields being pivoted on -- it only seemed that way because of the poor randomization of values getting put in the single valued string fields vs the multivalued fields in the old version of the test. The bug seems to pop up in _some_ cases where a pivot constraint has no sub-pivots. Normally this results in a NamedList with 3 keys (field,value,count) -- the 4th pivot key is only included if there is a list of at least 1 sub-pivot. But in some cases (I can't explain from looking at the code why) the server is responding back with a 4th entry using the key pivot but the value is null. We need to get to the bottom of this -- it's not clear if there is a bug preventing real sub-pivot constraints from being returned correctly, or if this is just a mistake in the code where it's putting null in the NamedList instead of not adding anything at all (in which case it might be tempting to make QueryResponse.readPivots smart enough to deal with it, but if we did that it would still be broken for older clients -- best to stick with the current API semantics). In the attached patch update, this seed will fail showing the null sub-pivots problem...
{noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=680E68425E7CA1BA -Dtests.slow=true -Dtests.locale=es_US -Dtests.timezone=Canada/Eastern -Dtests.file.encoding=UTF-8 [junit4] FAILURE 41.7s | TestCloudPivotFacet.testDistribSearch [junit4] Throwable #1: java.lang.AssertionError: Server sent back 'null' for sub pivots? [junit4]at __randomizedtesting.SeedInfo.seed([680E68425E7CA1BA:E9E8E65A2923C186]:0) [junit4]at org.apache.solr.client.solrj.response.QueryResponse.readPivots(QueryResponse.java:383) [junit4]at org.apache.solr.client.solrj.response.QueryResponse.extractFacetInfo(QueryResponse.java:363) [junit4]at org.apache.solr.client.solrj.response.QueryResponse.setResponse(QueryResponse.java:148) [junit4]at org.apache.solr.client.solrj.response.QueryResponse.init(QueryResponse.java:91) [junit4]at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91) [junit4]at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:161) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.doTest(TestCloudPivotFacet.java:145) {noformat} 3) Independent (I think) from the NPE issue, there is still something wonky with the refined counts when mincount is specified... Here for example is a seed that gets past the QueryResponse.readPivots, but then fails the numFound validation queries used to check the pivot counts...
{noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestCloudPivotFacet -Dtests.method=testDistribSearch -Dtests.seed=F08A107C384690FC -Dtests.slow=true -Dtests.locale=ar_LY -Dtests.timezone=Jamaica -Dtests.file.encoding=UTF-8 [junit4] FAILURE 27.0s | TestCloudPivotFacet.testDistribSearch [junit4] Throwable #1: java.lang.AssertionError: {main({main(facet.pivot.mincount=9),extra({main(facet.limit=12),extra({main(facet.pivot=pivot_y_s%2Cpivot_x_s1),extra(facet=truefacet.pivot=pivot_x_s1%2Cpivot_x_s)})})}),extra(rows=0q=id%3A%5B*+TO+503%5D)} == pivot_y_s,pivot_x_s1: {params(rows=0),defaults({main(rows=0q=id%3A%5B*+TO+503%5D),extra(fq=%7B%21term+f%3Dpivot_y_s%7D)})} expected:9 but was:14 [junit4]at __randomizedtesting.SeedInfo.seed([F08A107C384690FC:716C9E644F19F0C0]:0) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:190) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.doTest(TestCloudPivotFacet.java:145) [junit4]at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:863) [junit4]at java.lang.Thread.run(Thread.java:744) [junit4] Caused by: java.lang.AssertionError: pivot_y_s,pivot_x_s1:
[jira] [Resolved] (SOLR-6097) Posting JSON with results in lost information
[ https://issues.apache.org/jira/browse/SOLR-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-6097. Resolution: Cannot Reproduce Cannot Reproduce Please post more details of your situation (including specifics on how exactly you are adding your data to Solr) to a new thread on the solr-user mailing list. In the event that more details about your usage helps uncover reliable/reproducible steps to recreate the problem, we can re-open the issue with an updated summary. Using the 4.7.2 example configs... {noformat} hossman@frisbee:~$ curl -X POST -H 'Content-Type: application/json' --data-binary '{"add":{"commitWithin":5000,"doc":{"id":"12345","body_s":"a <b> c"}}}' http://localhost:8983/solr/collection1/update {"responseHeader":{"status":0,"QTime":24}} hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/select?q=id:12345&wt=json&indent=true' { "responseHeader":{ "status":0, "QTime":1, "params":{ "indent":"true", "q":"id:12345", "wt":"json"}}, "response":{"numFound":1,"start":0,"docs":[ { "id":"12345", "body_s":"a <b> c", "_version_":1468642762402299904}] }} {noformat} Posting JSON with results in lost information - Key: SOLR-6097 URL: https://issues.apache.org/jira/browse/SOLR-6097 Project: Solr Issue Type: Bug Affects Versions: 4.7.2 Reporter: Kingston Duffie Post the following JSON to add a document: { "add" : { "commitWithin" : 5000, "doc" : { "id" : "12345", "body" : "a <b> c" } } } The body field is configured in the schema as: <field name="body" type="text_hive" indexed="true" stored="true" required="false" multiValued="false"/> and <fieldType name="text_hive" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/> </analyzer>
<analyzer type="query"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType> The problem is this: After submitting this post, if you go to the SOLR console and find this document, the stored body will be missing the contents between the less-than and greater-than symbols -- i.e., a c. If you encode the body (i.e., a &lt; b &gt; c), it will show up with the < and > symbols. That is, it appears that SOLR is stripping out HTML tags even though we are not asking it to. Note that it is not only the storage but also the indexing that is affected (as we originally found the issue because searching for b would not match this document). I'm willing to believe that I'm doing something wrong, but I can't see anywhere in any spec that suggests that strings inside JSON need to be
[jira] [Updated] (LUCENE-5690) expose sub-Terms from MultiTerms
[ https://issues.apache.org/jira/browse/LUCENE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-5690: - Attachment: LUCENE-5690.patch Trivial patch attached.
Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_05) - Build # 4048 - Failure!
: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4048/ : Java: 32bit/jdk1.8.0_05 -client -XX:+UseG1GC The 2 facet test failures both related to DocValues and reproduce reliably for me on my Linux box. Is it possible these are related to the UninvertingDirectoryReader changes? [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestRandomFaceting -Dtests.method=testRandomFaceting -Dtests.seed=EF686C57D2698AAE -Dtests.slow=true -Dtests.locale=ar_BH -Dtests.timezone=Europe/Oslo -Dtests.file.encoding=UTF-8 [junit4] ERROR 3.13s | TestRandomFaceting.testRandomFaceting [junit4] Throwable #1: org.apache.solr.common.SolrException: Error while processing facet fields: java.lang.AssertionError [junit4]at __randomizedtesting.SeedInfo.seed([EF686C57D2698AAE:E2004C8287904211]:0) [junit4]at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:595) [junit4]at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:260) [junit4]at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:84) [junit4]at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:221) [junit4]at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [junit4]at org.apache.solr.core.SolrCore.execute(SolrCore.java:1964) [junit4]at org.apache.solr.util.TestHarness.query(TestHarness.java:295) [junit4]at org.apache.solr.util.TestHarness.query(TestHarness.java:278) [junit4]at org.apache.solr.TestRandomFaceting.doFacetTests(TestRandomFaceting.java:216) [junit4]at org.apache.solr.TestRandomFaceting.doFacetTests(TestRandomFaceting.java:138) [junit4]at org.apache.solr.TestRandomFaceting.testRandomFaceting(TestRandomFaceting.java:121) [junit4]at java.lang.Thread.run(Thread.java:745) [junit4] Caused by: java.lang.AssertionError [junit4]at org.apache.solr.request.DocValuesFacets.getCounts(DocValuesFacets.java:106) [junit4]at 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:432) [junit4]at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:561) [junit4]at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:553) [junit4]at java.util.concurrent.FutureTask.run(FutureTask.java:266) [junit4]at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:507) [junit4]at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:577) [junit4]... 50 more [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestFaceting -Dtests.method=testMultiThreadedFacets -Dtests.seed=EF686C57D2698AAE -Dtests.slow=true -Dtests.locale=ja -Dtests.timezone=Africa/Lusaka -Dtests.file.encoding=UTF-8 [junit4] FAILURE 0.30s | TestFaceting.testMultiThreadedFacets [junit4] Throwable #1: java.lang.AssertionError: expected same:org.apache.lucene.index.MultiDocValues$MultiSortedSetDocValues@1840511 was not:org.apache.lucene.index.MultiDocValues$MultiSortedSetDocValues@c88009 [junit4]at __randomizedtesting.SeedInfo.seed([EF686C57D2698AAE:C040D5DEE621F7BA]:0) [junit4]at org.apache.solr.request.TestFaceting.assertEquals(TestFaceting.java:958) [junit4]at org.apache.solr.request.TestFaceting.testMultiThreadedFacets(TestFaceting.java:928) [junit4]at java.lang.Thread.run(Thread.java:745) -Hoss http://www.lucidworks.com/
[jira] [Commented] (LUCENE-5670) org.apache.lucene.util.fst.FST should skip over outputs it is not interested in
[ https://issues.apache.org/jira/browse/LUCENE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003723#comment-14003723 ] Michael McCandless commented on LUCENE-5670: Thanks Christian, the patch looks good to me! I'll commit soon. org.apache.lucene.util.fst.FST should skip over outputs it is not interested in --- Key: LUCENE-5670 URL: https://issues.apache.org/jira/browse/LUCENE-5670 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.7 Reporter: Christian Ziech Assignee: Michael McCandless Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5670.patch, skipOutput_lucene48.patch Currently the FST uses the read(DataInput) method from the Outputs class to skip over outputs it actually is not interested in. For most use cases this just creates some additional objects that are immediately destroyed again. When traversing an FST with non-trivial data however this can easily add up to several excess objects that nobody actually ever read.
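The idea in the description can be sketched outside Lucene as follows. All class names here (`OutputsSketch`, `ByteReader`, `FixedLongOutputs`) are illustrative stand-ins, not Lucene's real API: the point is that a skip method can advance past an unwanted output without allocating an object for it, while the default falls back to read-and-discard.

```java
// Minimal stand-in for a positional byte reader (not Lucene's DataInput).
final class ByteReader {
    private final byte[] bytes;
    private int pos;

    ByteReader(byte[] bytes) { this.bytes = bytes; }

    long readLong() { // fixed-width big-endian, 8 bytes
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (bytes[pos++] & 0xFFL);
        }
        return v;
    }

    void skip(int n) { pos += n; }
}

abstract class OutputsSketch<T> {
    abstract T read(ByteReader in);

    // Default preserves the old behavior: read the output and throw it away.
    void skipOutput(ByteReader in) {
        read(in);
    }
}

final class FixedLongOutputs extends OutputsSketch<Long> {
    @Override Long read(ByteReader in) {
        return in.readLong(); // boxes a Long on every call
    }

    // Fixed-width encoding: skipping is just advancing 8 bytes, no boxing.
    @Override void skipOutput(ByteReader in) {
        in.skip(8);
    }
}

public class FstSkipDemo {
    public static void main(String[] args) {
        byte[] twoOutputs = new byte[16];
        twoOutputs[15] = 42; // second encoded long is 42
        ByteReader in = new ByteReader(twoOutputs);
        FixedLongOutputs outputs = new FixedLongOutputs();
        outputs.skipOutput(in);               // advance past the first output
        System.out.println(outputs.read(in)); // 42
    }
}
```

For variable-width encodings the override would decode just enough structure to find the output's length, still without materializing the value.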
[jira] [Updated] (LUCENE-5670) org.apache.lucene.util.fst.FST should skip over outputs it is not interested in
[ https://issues.apache.org/jira/browse/LUCENE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5670: --- Fix Version/s: 5.0 4.9
Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_05) - Build # 4048 - Failure!
On Tue, May 20, 2014 at 1:44 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4048/ : Java: 32bit/jdk1.8.0_05 -client -XX:+UseG1GC The 2 facet test failures both related to DocValues and reproduce reliably for me on my Linux box. Is it possible these are related to the UninvertingDirectoryReader changes? I think it's a bug in the assert. I'll take a look in 5 minutes.
[jira] [Commented] (SOLR-6096) Support Update and Delete on nested documents
[ https://issues.apache.org/jira/browse/SOLR-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003745#comment-14003745 ] Mikhail Khludnev commented on SOLR-6096: bq. I would expect that John is not in the index anymore. Currently he is. Given the [test|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/update/AddBlockUpdateTest.java#L142] I suppose that an update with children works fine if you specify {code}<add overwrite="true"><doc>...</doc></add>{code} To make it work you need to have a uniqueKey defined. Hmm.. I just checked it manually myself, with the tiny data from the blog. It works fine as expected - overwrite=true is implied by default. My guess is you either don't have a uniqueKey defined, or you forgot to commit after the update. Support Update and Delete on nested documents - Key: SOLR-6096 URL: https://issues.apache.org/jira/browse/SOLR-6096 Project: Solr Issue Type: Improvement Affects Versions: 4.7.2 Reporter: Thomas Scheffler Labels: blockjoin, nested When using nested or child documents, update and delete operations on the root document should also affect the nested documents, as no child can exist without its parent :-) Example {code:xml|title=First Import} <doc> <field name="id">1</field> <field name="title">Article with author</field> <doc> <field name="name">Smith, John</field> <field name="role">author</field> </doc> </doc> {code} If I change my mind and the author was not named *John* but *_Jane_*: {code:xml|title=Changed name of author of '1'} <doc> <field name="id">1</field> <field name="title">Article with author</field> <doc> <field name="name">Smith, Jane</field> <field name="role">author</field> </doc> </doc> {code} I would expect that John is not in the index anymore. Currently he is. There might also be the case that any subdocument is removed by an update: {code:xml|title=Remove author} <doc> <field name="id">1</field> <field name="title">Article without author</field> </doc> {code} This should affect a delete on all nested documents, too.
The same way, all nested documents should be deleted if I delete the root document: {code:xml|title=Deletion of '1'} <delete> <id>1</id> <!-- implying also <query>_root_:1</query> --> </delete> {code} It is currently possible to do all of this on the client side by issuing an additional request to delete the document before every update. It would be more efficient if this could be handled on the SOLR side. One would benefit from atomic updates. The biggest plus shows when using delete-by-query. {code:xml|title=Deletion of '1' by query} <delete> <query>title:*</query> <!-- implying also <query>_root_:1</query> --> </delete> {code} In that case one would not have to first query all documents and issue deletes for those ids and every nested document.
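The client-side workaround Thomas describes can be sketched as building plain XML update messages: delete the old parent and all its children via the internal `_root_` field, then re-post the new block and commit. The helper name below is hypothetical, and actually sending the requests to `/update` over HTTP is elided.

```java
// Sketch of the client-side "delete the block first" workaround.
// BlockUpdateWorkaround is an illustrative name, not a Solr class.
public class BlockUpdateWorkaround {
    /**
     * Delete message that removes the parent with this id together with
     * all of its nested child documents (children share the parent's
     * _root_ value in the index).
     */
    public static String deleteBlock(String id) {
        return "<delete><query>_root_:" + id + "</query></delete>";
    }

    public static void main(String[] args) {
        // For the "Changed name of author" example: send this first,
        // then re-send the <add> with the updated child, then commit.
        System.out.println(deleteBlock("1"));
    }
}
```

This keeps the index consistent at the cost of an extra round-trip per update, which is exactly the inefficiency the issue asks Solr to absorb server-side.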
[jira] [Updated] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4236: Attachment: LUCENE-4236.patch Patch just synced to trunk. Some of the previous stuff in it has been committed. It's still not ready to be committed: there is a lot of messy/bogus stuff I did. Will maybe make a branch and try to clean it up more. clean up booleanquery conjunction optimizations a bit - Key: LUCENE-4236 URL: https://issues.apache.org/jira/browse/LUCENE-4236 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch After LUCENE-3505, I want to do a slight cleanup: * compute the term conjunctions optimization in scorer(), so it's applied even if we have optional and prohibited clauses that don't exist in the segment (e.g. return null) * use the term conjunctions optimization when optional.size() == minShouldMatch, as that means they are all mandatory, too. * don't return booleanscorer1 when optional.size() == minShouldMatch, because it means we have required clauses and in general BS2 should do a much better job (e.g. use advance).
Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_05) - Build # 4048 - Failure!
I committed a fix. On Tue, May 20, 2014 at 1:58 PM, Robert Muir rcm...@gmail.com wrote: On Tue, May 20, 2014 at 1:44 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4048/ : Java: 32bit/jdk1.8.0_05 -client -XX:+UseG1GC The 2 facet test failures both related to DocValues and reproduce reliably for me on my Linux box. Is it possible these are related to the UninvertingDirectoryReader changes? I think its a bug in the assert. I'll take a look in 5 minutes.
[jira] [Commented] (SOLR-6096) Support Update and Delete on nested documents
[ https://issues.apache.org/jira/browse/SOLR-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003767#comment-14003767 ] Mikhail Khludnev commented on SOLR-6096: bq. There might also be the case that any subdocument is removed by an update: right. that's what I raised in SOLR-5211. The problem is backward compatibility - that update wasn't treated as a block update before, and now Solr can't just imply that every update is a block update. However, we can introduce a special-purpose {{/blockupdate/}} handler with explicit block semantics for all the cases above. So far it can be healed, but in an ugly way, e.g. it needs to support a _children nuke_ update: {code} <doc> <field name="id">1</field> <field name="title">Article without author</field> <nodocs/> </doc> <doc> <field name="id">1</field> <field name="title">Article without author</field> <docs></docs> </doc> <doc childfree="true"> <field name="id">1</field> <field name="title">Article without author</field> </doc> {code} pick your favorite one! And more than that, the corresponding SolrJ SolrInputDocument amendments are expected to be even weirder.
[jira] [Commented] (LUCENE-5690) expose sub-Terms from MultiTerms
[ https://issues.apache.org/jira/browse/LUCENE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003803#comment-14003803 ] Uwe Schindler commented on LUCENE-5690: --- bq. MultiTermsEnum and MultiDocsEnum both expose their subs I don't see this in the code of MultiTermsEnum. Both have the arrays private and MultiTermsEnum has no getter (at least in trunk). MultiDocsEnum has a getter (not sure why). In MultiTermsEnum there is one public getter, but this one returns a package-private class, so it is unusable to the user (this is a bug) - it should be removed. This patch is fine if you make the methods return List<Terms> and List<ReaderSlice>, both with {{Collections.unmodifiableList(Arrays.asList(...))}}. But ReaderSlice is also a more or less private class (it's just public for cross-package access). What's the reason to have those public at all?
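The accessor shape Uwe suggests would look roughly like the following stand-alone sketch, where `String` stands in for Lucene's `Terms`: wrap the internal subs array in an unmodifiable list rather than handing out the array itself.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of an unmodifiable-list accessor over an internal array.
// MultiTermsSketch is an illustrative name, not Lucene's MultiTerms.
public class MultiTermsSketch {
    private final String[] subs; // stands in for the internal Terms[] array

    public MultiTermsSketch(String[] subs) {
        this.subs = subs;
    }

    /** Read-only view over the subs; callers cannot mutate the internals. */
    public List<String> getSubTerms() {
        return Collections.unmodifiableList(Arrays.asList(subs));
    }
}
```

`Arrays.asList` is a view (no copy), and the unmodifiable wrapper makes any `add`/`set` attempt throw `UnsupportedOperationException`, so exposing the subs costs nothing and leaks no mutability.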
[jira] [Commented] (SOLR-6096) Support Update and Delete on nested documents
[ https://issues.apache.org/jira/browse/SOLR-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003804#comment-14003804 ] Mikhail Khludnev commented on SOLR-6096: please get me right, but I don't feel like sending {{query\_root\_:1/query}} is a much problem at comparison to {{id1/id}} {{deletequerytitle:*/query/delete}} can be fixed by nuking children right before removing parents explicitly {code} delete query{!child of=type_s:parent}title:*/query querytitle:*/query /delete {code} I share your concern that it not so cute as it could be. I've got one thought. what if you configure that darn \_root\_ field as aт uniqueKey? if it will work fine in all cases, but you wouldn't happy to specify such odd uniqueKey, we can create special DirectUpdateHandler2 which can use uniqueKey instead of \_root\_. WDYT? Support Update and Delete on nested documents - Key: SOLR-6096 URL: https://issues.apache.org/jira/browse/SOLR-6096 Project: Solr Issue Type: Improvement Affects Versions: 4.7.2 Reporter: Thomas Scheffler Labels: blockjoin, nested When using nested or child document. Update and delete operation on the root document should also affect the nested documents, as no child can exist without its parent :-) Example {code:xml|title=First Import} doc field name=id1/field field name=titleArticle with author/field doc field name=nameSmith, John/field field name=roleauthor/field /doc /doc {code} If I change my mind and the author was not named *John* but *_Jane_*: {code:xml|title=Changed name of author of '1'} doc field name=id1/field field name=titleArticle with author/field doc field name=nameSmith, Jane/field field name=roleauthor/field /doc /doc {code} I would expect that John is not in the index anymore. Currently he is. 
There might also be the case that a subdocument is removed by an update:
{code:xml|title=Remove author}
<doc>
  <field name="id">1</field>
  <field name="title">Article without author</field>
</doc>
{code}
This should trigger a delete on all nested documents, too. In the same way, all nested documents should be deleted if I delete the root document:
{code:xml|title=Deletion of '1'}
<delete>
  <id>1</id>
  <!-- implying also <query>_root_:1</query> -->
</delete>
{code}
It is currently possible to do all this on the client side by issuing an additional delete request before every update, but it would be more efficient if this could be handled on the SOLR side. One would also benefit from atomic updates. The biggest plus shows when using delete-by-query:
{code:xml|title=Deletion of '1' by query}
<delete>
  <query>title:*</query>
  <!-- implying also <query>_root_:1</query> -->
</delete>
{code}
In that case one would not have to first query for all matching documents and then issue deletes by their ids and for every nested document. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
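The two-step client-side workaround from the comment above — nuke the children via a block-join query, then remove the parents — can be sketched as a single delete request. This is a sketch, not tested against this schema: the field names (`type_s:parent`, `title`) come from the example in the comment and are assumptions.

```xml
<!-- hypothetical combined request: delete children first, then parents -->
<delete>
  <!-- all children of the matching parents -->
  <query>{!child of="type_s:parent"}title:*</query>
  <!-- the parents themselves -->
  <query>title:*</query>
</delete>
```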
[jira] [Created] (SOLR-6098) SOLR console displaying JSON does not escape text properly
Kingston Duffie created SOLR-6098: - Summary: SOLR console displaying JSON does not escape text properly Key: SOLR-6098 URL: https://issues.apache.org/jira/browse/SOLR-6098 Project: Solr Issue Type: Bug Reporter: Kingston Duffie Priority: Minor In the SOLR admin web console, when displaying the JSON response for a query, the text is not being HTML-escaped, so any text that happens to match HTML markup is processed as HTML. For example, enter <strike>hello</strike> in the q textbox and the responseHeader will contain: q: body:<strike>hello</strike>, where the hello portion is shown struck out. This seems benign, but can be extremely confusing when viewing results, because if your fields happen to contain, for example, f...@bar.com, this will be completely missing (because the browser treats this as an invalid tag). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
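For reference, the kind of escaping the console needs to apply before injecting response text into the page can be sketched as follows. This is a minimal sketch in Java (the real admin UI is JavaScript, but the set of characters to escape is the same):

```java
// Minimal sketch of the HTML escaping a web console must apply before
// injecting arbitrary response text into the DOM.
public class HtmlEscapeSketch {
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break; // must come before the others
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
    public static void main(String[] args) {
        // without escaping, a browser would render this as struck-out text
        System.out.println(escapeHtml("body:<strike>hello</strike>"));
        // prints: body:&lt;strike&gt;hello&lt;/strike&gt;
    }
}
```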
[jira] [Commented] (LUCENE-5670) org.apache.lucene.util.fst.FST should skip over outputs it is not interested in
[ https://issues.apache.org/jira/browse/LUCENE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003834#comment-14003834 ] ASF subversion and git services commented on LUCENE-5670: - Commit 1596368 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596368 ] LUCENE-5670: add skip/FinalOutput to FST Outputs org.apache.lucene.util.fst.FST should skip over outputs it is not interested in --- Key: LUCENE-5670 URL: https://issues.apache.org/jira/browse/LUCENE-5670 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.7 Reporter: Christian Ziech Assignee: Michael McCandless Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5670.patch, skipOutput_lucene48.patch Currently the FST uses the read(DataInput) method from the Outputs class to skip over outputs it actually is not interested in. For most use cases this just creates some additional objects that are immediately destroyed again. When traversing an FST with non-trivial data however this can easily add up to several excess objects that nobody actually ever read. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
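The motivation for the skip method can be sketched with a toy output encoding — a sketch, not the actual Outputs API: read() must materialize an object for every output it passes, while skipping only needs to advance the read position.

```java
import java.nio.ByteBuffer;

// Sketch (hypothetical encoding, not the real Lucene FST format): each
// output is encoded as [length byte][payload bytes]. Reading allocates a
// byte[]; skipping just moves the position past the payload.
public class SkipOutputSketch {
    static byte[] readOutput(ByteBuffer in) {     // allocates an object
        byte[] out = new byte[in.get()];
        in.get(out);
        return out;
    }
    static void skipOutput(ByteBuffer in) {       // no allocation at all
        int len = in.get();                       // read the length header
        in.position(in.position() + len);         // jump over the payload
    }
    public static void main(String[] args) {
        // two encoded outputs: {10, 20} and {30}
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {2, 10, 20, 1, 30});
        skipOutput(buf);                  // pass the first output untouched
        byte[] second = readOutput(buf);  // materialize only the one we want
        System.out.println(second[0]);    // prints: 30
    }
}
```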
[jira] [Updated] (LUCENE-5691) DocTermsOrds lookupTerm is wrong in some cases
[ https://issues.apache.org/jira/browse/LUCENE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5691: Attachment: LUCENE-5691.patch patch with a test. We should backport to 4.x too DocTermsOrds lookupTerm is wrong in some cases -- Key: LUCENE-5691 URL: https://issues.apache.org/jira/browse/LUCENE-5691 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5691.patch needs the following two conditions: * underlying termsenum supports ord() * the term you lookup would be inserted at the end (e.g. seek returns END) the fix is simple, it just needs to handle SeekStatus.END properly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5691) DocTermsOrds lookupTerm is wrong in some cases
Robert Muir created LUCENE-5691: --- Summary: DocTermsOrds lookupTerm is wrong in some cases Key: LUCENE-5691 URL: https://issues.apache.org/jira/browse/LUCENE-5691 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5691.patch needs the following two conditions: * underlying termsenum supports ord() * the term you lookup would be inserted at the end (e.g. seek returns END) the fix is simple, it just needs to handle SeekStatus.END properly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
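The END-handling fix can be illustrated with a standalone sketch over a sorted array — not the actual DocTermOrds code. lookupTerm() is built on a seekCeil()-style primitive: it returns the ord when the term is found and -insertionPoint-1 otherwise, and SeekStatus.END (the term sorts after every existing term, so there is no positioned term whose ord could be read) is exactly the branch that must be handled explicitly.

```java
import java.util.Arrays;

// Standalone sketch of the bug class described above. seekCeil() returns
// the index of the smallest term >= key, or -1 to stand in for
// SeekStatus.END. At END there is no positioned term, so lookupTerm()
// must encode "insert at the end" as -(numTerms)-1 itself.
public class LookupTermSketch {
    static int seekCeil(String[] sorted, String key) {
        int idx = Arrays.binarySearch(sorted, key);
        int pos = idx >= 0 ? idx : -idx - 1;
        return pos == sorted.length ? -1 : pos;    // -1 == SeekStatus.END
    }
    static long lookupTerm(String[] sorted, String key) {
        int pos = seekCeil(sorted, key);
        if (pos == -1) return -sorted.length - 1;  // END: the missing case
        if (sorted[pos].equals(key)) return pos;   // FOUND: the ord itself
        return -pos - 1;                           // NOT_FOUND: -insertionPoint-1
    }
    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry"};
        System.out.println(lookupTerm(terms, "banana")); // prints: 1
        System.out.println(lookupTerm(terms, "berry"));  // prints: -3
        System.out.println(lookupTerm(terms, "zebra"));  // prints: -4 (END case)
    }
}
```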
[jira] [Commented] (LUCENE-5691) DocTermsOrds lookupTerm is wrong in some cases
[ https://issues.apache.org/jira/browse/LUCENE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003836#comment-14003836 ] Michael McCandless commented on LUCENE-5691: +1, nice catch! DocTermsOrds lookupTerm is wrong in some cases -- Key: LUCENE-5691 URL: https://issues.apache.org/jira/browse/LUCENE-5691 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5691.patch needs the following two conditions: * underlying termsenum supports ord() * the term you lookup would be inserted at the end (e.g. seek returns END) the fix is simple, it just needs to handle SeekStatus.END properly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5670) org.apache.lucene.util.fst.FST should skip over outputs it is not interested in
[ https://issues.apache.org/jira/browse/LUCENE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003840#comment-14003840 ] ASF subversion and git services commented on LUCENE-5670: - Commit 1596369 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1596369 ] LUCENE-5670: add skip/FinalOutput to FST Outputs org.apache.lucene.util.fst.FST should skip over outputs it is not interested in --- Key: LUCENE-5670 URL: https://issues.apache.org/jira/browse/LUCENE-5670 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.7 Reporter: Christian Ziech Assignee: Michael McCandless Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5670.patch, skipOutput_lucene48.patch Currently the FST uses the read(DataInput) method from the Outputs class to skip over outputs it actually is not interested in. For most use cases this just creates some additional objects that are immediately destroyed again. When traversing an FST with non-trivial data however this can easily add up to several excess objects that nobody actually ever read. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6098) SOLR console displaying JSON does not escape text properly
[ https://issues.apache.org/jira/browse/SOLR-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003837#comment-14003837 ] Stefan Matheis (steffkes) commented on SOLR-6098: - You're not saying which release you're referring to? From your description, that sounds a bit like SOLR-5174, which got fixed with 4.5. Please let me know if that's your issue as well - in which case upgrading would already fix it and I'm going to close this as a duplicate - or if it's something else that needs to be taken care of. SOLR console displaying JSON does not escape text properly -- Key: SOLR-6098 URL: https://issues.apache.org/jira/browse/SOLR-6098 Project: Solr Issue Type: Bug Components: web gui Reporter: Kingston Duffie Priority: Minor In the SOLR admin web console, when displaying the JSON response for a query, the text is not being HTML-escaped, so any text that happens to match HTML markup is processed as HTML. For example, enter <strike>hello</strike> in the q textbox and the responseHeader will contain: q: body:<strike>hello</strike>, where the hello portion is shown struck out. This seems benign, but can be extremely confusing when viewing results, because if your fields happen to contain, for example, f...@bar.com, this will be completely missing (because the browser treats this as an invalid tag). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6098) SOLR console displaying JSON does not escape text properly
[ https://issues.apache.org/jira/browse/SOLR-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-6098: Component/s: web gui SOLR console displaying JSON does not escape text properly -- Key: SOLR-6098 URL: https://issues.apache.org/jira/browse/SOLR-6098 Project: Solr Issue Type: Bug Components: web gui Reporter: Kingston Duffie Priority: Minor In the SOLR admin web console, when displaying JSON response for Query, the text is not being HTML escaped, so any text that happens to match HTML markup is being processed as HTML. For example, enter strikehello/strike in the q textbox and the responseHeader will contain: q: body:hello where the hello portion is shown using strikeout. This seems benign, but can be extremely confusing when viewing results, because if your fields happen to contain, for example, f...@bar.com, this will be completely missing (because the browser treats this as an invalid tag). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5670) org.apache.lucene.util.fst.FST should skip over outputs it is not interested in
[ https://issues.apache.org/jira/browse/LUCENE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-5670. Resolution: Fixed Thanks Christian! org.apache.lucene.util.fst.FST should skip over outputs it is not interested in --- Key: LUCENE-5670 URL: https://issues.apache.org/jira/browse/LUCENE-5670 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.7 Reporter: Christian Ziech Assignee: Michael McCandless Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5670.patch, skipOutput_lucene48.patch Currently the FST uses the read(DataInput) method from the Outputs class to skip over outputs it actually is not interested in. For most use cases this just creates some additional objects that are immediately destroyed again. When traversing an FST with non-trivial data however this can easily add up to several excess objects that nobody actually ever read. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6097) Posting JSON with results in lost information
[ https://issues.apache.org/jira/browse/SOLR-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003845#comment-14003845 ] Stefan Matheis (steffkes) commented on SOLR-6097: - you didn't link those issues .. but since i saw SOLR-6098 after that one and already commented on that .. i guess they are related? Posting JSON with results in lost information - Key: SOLR-6097 URL: https://issues.apache.org/jira/browse/SOLR-6097 Project: Solr Issue Type: Bug Affects Versions: 4.7.2 Reporter: Kingston Duffie Post the following JSON to add a document:
{code}
{
  "add" : {
    "commitWithin" : 5000,
    "doc" : {
      "id" : "12345",
      "body" : "a <b> c"
    }
  }
}
{code}
The body field is configured in the schema as:
{code:xml}
<field name="body" type="text_hive" indexed="true" stored="true" required="false" multiValued="false"/>
{code}
and
{code:xml}
<fieldType name="text_hive" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}
The problem is this: After submitting this post, if you go to the SOLR console and find this document, the stored body will be missing the contents between the less-than and greater-than symbols -- i.e., "a c". If you encode the body (i.e., "a &lt; b &gt; c"), it will show up with < and > symbols. That is, it appears that SOLR is stripping out HTML tags even though we are not asking it to. 
Note that it is not only the storage but also the indexing that is affected (we originally found the issue because searching for b would not match this document). I'm willing to believe that I'm doing something wrong, but I can't see anywhere in any spec that suggests that strings inside JSON need to be HTML-escaped. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
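The reporter's reading of the spec is right: per RFC 8259, a JSON string only requires escaping of the quote, the backslash, and control characters below U+0020 — angle brackets are legal as-is, so a well-formed update may contain raw HTML markup. A minimal check of that rule:

```java
// Per RFC 8259, only '"', '\\' and control characters (below U+0020)
// need escaping inside a JSON string. '<' and '>' are legal unescaped,
// so a JSON parser has no reason to alter them.
public class JsonStringRules {
    static boolean needsEscaping(char c) {
        return c == '"' || c == '\\' || c < 0x20;
    }
    public static void main(String[] args) {
        System.out.println(needsEscaping('<'));  // prints: false
        System.out.println(needsEscaping('"'));  // prints: true
        System.out.println(needsEscaping('\n')); // prints: true
    }
}
```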
[jira] [Commented] (LUCENE-5691) DocTermsOrds lookupTerm is wrong in some cases
[ https://issues.apache.org/jira/browse/LUCENE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003851#comment-14003851 ] ASF subversion and git services commented on LUCENE-5691: - Commit 1596370 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1596370 ] LUCENE-5691: DocTermOrds lookupTerm is wrong in some cases DocTermsOrds lookupTerm is wrong in some cases -- Key: LUCENE-5691 URL: https://issues.apache.org/jira/browse/LUCENE-5691 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5691.patch needs the following two conditions: * underlying termsenum supports ord() * the term you lookup would be inserted at the end (e.g. seek returns END) the fix is simple, it just needs to handle SeekStatus.END properly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5691) DocTermsOrds lookupTerm is wrong in some cases
[ https://issues.apache.org/jira/browse/LUCENE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5691. - Resolution: Fixed Fix Version/s: 5.0 4.9 DocTermsOrds lookupTerm is wrong in some cases -- Key: LUCENE-5691 URL: https://issues.apache.org/jira/browse/LUCENE-5691 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-5691.patch needs the following two conditions: * underlying termsenum supports ord() * the term you lookup would be inserted at the end (e.g. seek returns END) the fix is simple, it just needs to handle SeekStatus.END properly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5691) DocTermsOrds lookupTerm is wrong in some cases
[ https://issues.apache.org/jira/browse/LUCENE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003855#comment-14003855 ] ASF subversion and git services commented on LUCENE-5691: - Commit 1596371 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596371 ] LUCENE-5691: DocTermOrds lookupTerm is wrong in some cases DocTermsOrds lookupTerm is wrong in some cases -- Key: LUCENE-5691 URL: https://issues.apache.org/jira/browse/LUCENE-5691 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-5691.patch needs the following two conditions: * underlying termsenum supports ord() * the term you lookup would be inserted at the end (e.g. seek returns END) the fix is simple, it just needs to handle SeekStatus.END properly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003858#comment-14003858 ] ASF subversion and git services commented on LUCENE-4236: - Commit 1596372 from [~rcmuir] in branch 'dev/branches/lucene4236' [ https://svn.apache.org/r1596372 ] LUCENE-4236: create branch to try to cleanup clean up booleanquery conjunction optimizations a bit - Key: LUCENE-4236 URL: https://issues.apache.org/jira/browse/LUCENE-4236 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch After LUCENE-3505, I want to do a slight cleanup: * compute the term conjunctions optimization in scorer(), so its applied even if we have optional and prohibited clauses that dont exist in the segment (e.g. return null) * use the term conjunctions optimization when optional.size() == minShouldMatch, as that means they are all mandatory, too. * don't return booleanscorer1 when optional.size() == minShouldMatch, because it means we have required clauses and in general BS2 should do a much better job (e.g. use advance). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
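The second bullet's equivalence — when optional.size() == minShouldMatch, all optional clauses are effectively mandatory — can be checked with a tiny sketch of minShouldMatch counting (a toy model, not BooleanQuery's scorer):

```java
// Toy model of minShouldMatch semantics: a document matches if at least
// minShouldMatch of the optional clauses hit. When minShouldMatch equals
// the clause count, a single miss fails the query, exactly like an AND.
public class MinShouldMatchSketch {
    static boolean matches(boolean[] optionalClauseHits, int minShouldMatch) {
        int hits = 0;
        for (boolean h : optionalClauseHits) if (h) hits++;
        return hits >= minShouldMatch;
    }
    public static void main(String[] args) {
        // minShouldMatch == optional.size(): behaves as a pure conjunction
        System.out.println(matches(new boolean[]{true, true, false}, 3)); // prints: false
        System.out.println(matches(new boolean[]{true, true, true}, 3));  // prints: true
    }
}
```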
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003860#comment-14003860 ] ASF subversion and git services commented on LUCENE-4236: - Commit 1596373 from [~rcmuir] in branch 'dev/branches/lucene4236' [ https://svn.apache.org/r1596373 ] LUCENE-4236: commit current state clean up booleanquery conjunction optimizations a bit - Key: LUCENE-4236 URL: https://issues.apache.org/jira/browse/LUCENE-4236 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch After LUCENE-3505, I want to do a slight cleanup: * compute the term conjunctions optimization in scorer(), so its applied even if we have optional and prohibited clauses that dont exist in the segment (e.g. return null) * use the term conjunctions optimization when optional.size() == minShouldMatch, as that means they are all mandatory, too. * don't return booleanscorer1 when optional.size() == minShouldMatch, because it means we have required clauses and in general BS2 should do a much better job (e.g. use advance). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_05) - Build # 4048 - Failure!
: I committed a fix. For those trying to keep track at home... r1596346 fixed the assertSame failure in TestFaceting. I then pinged rmuir on IRC to ask him about the seemingly unrelated (code-not-test) assert failure in DocValuesFacets.getCounts which was causing TestRandomFaceting to fail using the same seed -- that apparently led him to file LUCENE-5691. Once that was committed both failures seem to be completely fixed. Thanks rmuir. : On Tue, May 20, 2014 at 1:58 PM, Robert Muir rcm...@gmail.com wrote: : On Tue, May 20, 2014 at 1:44 PM, Chris Hostetter : hossman_luc...@fucit.org wrote: : : : Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4048/ : : Java: 32bit/jdk1.8.0_05 -client -XX:+UseG1GC : : The 2 facet test failures both related to DocValues and reproduce reliably : for me on my Linux box. : : Is it possible these are related to the UninvertingDirectoryReader : changes? : : : : I think its a bug in the assert. I'll take a look in 5 minutes. : : - : To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org : For additional commands, e-mail: dev-h...@lucene.apache.org : : -Hoss http://www.lucidworks.com/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
[ https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003888#comment-14003888 ] ASF subversion and git services commented on SOLR-5681: --- Commit 1596379 from [~anshumg] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596379 ] SOLR-5681: Make the OverseerCollectionProcessor multi-threaded, merge from trunk (r1596089) Make the OverseerCollectionProcessor multi-threaded --- Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. 
* ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
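The proposed peekAfter() semantics can be sketched in-memory — a hypothetical API, not the actual ZK DistributedQueue: the parent thread looks past the in-flight head to dispatch the next mutually exclusive task, and elements are only removed on completion so a new Overseer can retry them.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "task funnel" proposal: peek() hands out the head,
// peekAfter(last) returns the element following 'last' without removing
// anything, and remove() is called only once a task has completed.
public class PeekAfterSketch {
    private final List<String> tasks = new ArrayList<>();
    void offer(String task) { tasks.add(task); }
    String peek() { return tasks.isEmpty() ? null : tasks.get(0); }
    String peekAfter(String last) {
        int i = tasks.indexOf(last);
        return (i >= 0 && i + 1 < tasks.size()) ? tasks.get(i + 1) : null;
    }
    void remove(String task) { tasks.remove(task); } // cleanup on completion
    public static void main(String[] args) {
        PeekAfterSketch q = new PeekAfterSketch();
        q.offer("splitshard-collection1");
        q.offer("create-collection2");
        String first = q.peek();            // dispatched to a worker thread
        String second = q.peekAfter(first); // look past the in-flight task
        System.out.println(second);         // prints: create-collection2
        q.remove(first);                    // removed only after completion
    }
}
```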
[jira] [Commented] (LUCENE-5690) expose sub-Terms from MultiTerms
[ https://issues.apache.org/jira/browse/LUCENE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003889#comment-14003889 ] Yonik Seeley commented on LUCENE-5690: -- bq. I don't see this in the code of MultiTermsEnum. see MultiTermsEnum.getMatchArray() bq. This patch is fine if you make the methods return List<Terms> and List<ReaderSlice> Like the other methods, MultiTermsEnum.getMatchArray() and MultiDocsEnum.getSubs, we shouldn't add the additional overhead of object creation just to inspect an object. These are expert-level APIs that are not on the base class (hence will never be used casually). bq. What's the reason to have those public at all? Sometimes better efficiency, sometimes more information. One example for this specific addition is that MultiTerms.size() always returns -1. If we look at the sub-terms we can at least see what the number of terms for each segment is. expose sub-Terms from MultiTerms Key: LUCENE-5690 URL: https://issues.apache.org/jira/browse/LUCENE-5690 Project: Lucene - Core Issue Type: Improvement Reporter: Yonik Seeley Attachments: LUCENE-5690.patch MultiTermsEnum and MultiDocsEnum both expose their subs. It would be useful to do the same for MultiTerms. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
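The motivating example — the merged view returns -1 from size() because terms may overlap between segments, but the subs still carry per-segment counts — can be sketched with hypothetical stand-in classes (not the Lucene API):

```java
// Stand-in classes sketching why exposing sub-Terms is useful: the merged
// view cannot report a total distinct-term count without deduplicating
// across segments, so it returns -1, but an expert caller can still sum
// the per-segment counts as an upper bound.
public class MultiTermsSketch {
    static class SegmentTerms {
        final long size;
        SegmentTerms(long size) { this.size = size; }
    }
    final SegmentTerms[] subs;
    MultiTermsSketch(SegmentTerms... subs) { this.subs = subs; }
    long size() { return -1; } // unknown: terms may overlap across segments
    SegmentTerms[] getSubTerms() { return subs; } // expert API: no copy, no allocation
    public static void main(String[] args) {
        MultiTermsSketch mt = new MultiTermsSketch(
            new SegmentTerms(1000), new SegmentTerms(250));
        long upperBound = 0;
        for (SegmentTerms t : mt.getSubTerms()) upperBound += t.size;
        System.out.println(mt.size());  // prints: -1
        System.out.println(upperBound); // prints: 1250 (upper bound on distinct terms)
    }
}
```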
[jira] [Resolved] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
[ https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta resolved SOLR-5681. Resolution: Fixed Make the OverseerCollectionProcessor multi-threaded --- Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Attachments: SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. 
get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6099) Fix cleanup mechanism for a previous (failed) SPLITSHARD
Anshum Gupta created SOLR-6099: -- Summary: Fix cleanup mechanism for a previous (failed) SPLITSHARD Key: SOLR-6099 URL: https://issues.apache.org/jira/browse/SOLR-6099 Project: Solr Issue Type: Bug Reporter: Anshum Gupta Right now, SPLITSHARD tries to cleanup an under-construction/recovery shard using the deleteshard API if it already exists. DELETESHARD on the other hand will never delete a shard that's not INACTIVE/uses implicit routing. This would just raise an exception and never really delete a shard right now. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
[ https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003983#comment-14003983 ] Shalin Shekhar Mangar commented on SOLR-6091: - Would it be a good idea to include the overseer leader node name for which the QUIT message is intended? I imagine that it'd help us catch wrong/extra QUITs and race conditions during testing? Race condition in prioritizeOverseerNodes can trigger extra QUIT operations --- Key: SOLR-6091 URL: https://issues.apache.org/jira/browse/SOLR-6091 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7, 4.8 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.9, 5.0 Attachments: SOLR-6091.patch When using the overseer roles feature, there is a possibility of more than one thread executing the prioritizeOverseerNodes method and extra QUIT commands being inserted into the overseer queue. At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a race condition. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
[ https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004008#comment-14004008 ] Jessica Cheng commented on SOLR-6091: - I think that's a good idea, and maybe we can go further and include the id in the /overseer/leader file (e.g. 91790253334528928-host:8983_solr-n_13) so that the Overseer would only quit if its ID actually matched it completely. It'd be like a CAS--do this operation only if the state I read to make this decision is still valid. Race condition in prioritizeOverseerNodes can trigger extra QUIT operations --- Key: SOLR-6091 URL: https://issues.apache.org/jira/browse/SOLR-6091 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7, 4.8 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.9, 5.0 Attachments: SOLR-6091.patch When using the overseer roles feature, there is a possibility of more than one thread executing the prioritizeOverseerNodes method and extra QUIT commands being inserted into the overseer queue. At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a race condition. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
[ https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004036#comment-14004036 ]

Shalin Shekhar Mangar commented on SOLR-6091:

bq. maybe we can go further and include the id in the /overseer/leader file (e.g. 91790253334528928-host:8983_solr-n_13) so that the Overseer would only quit if its ID actually matched it completely.

+1 I'll work up a patch and try it out.
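The compare-and-swap idea discussed in this thread can be sketched as follows. This is a Python illustration of the logic only; the helper names and ID format handling are assumptions, not Solr's actual overseer API.

```python
# Sketch of the CAS-style QUIT check: the overseer quits only if the full
# election ID recorded in /overseer/leader matches its own full ID exactly,
# not merely the node-name portion. Names are illustrative, not Solr's API.

def node_name(full_id: str) -> str:
    # "91790253334528928-host:8983_solr-n_13" -> "host:8983_solr"
    return full_id.split("-", 1)[1].rsplit("-n_", 1)[0]

def should_quit(leader_file_id: str, my_full_id: str) -> bool:
    return leader_file_id == my_full_id

leader = "91790253334528928-host:8983_solr-n_13"
stale = "91790253334528928-host:8983_solr-n_14"  # same node, later election

# A node-name-only comparison cannot distinguish the two sessions:
assert node_name(leader) == node_name(stale)
# A full-ID comparison can, so only the intended session quits:
assert should_quit(leader, leader)
assert not should_quit(leader, stale)
```

This mirrors the "do this operation only if the state I read is still valid" semantics: a QUIT issued against an old election ID is simply ignored by a re-elected overseer.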
[jira] [Commented] (LUCENE-5666) Add UninvertingReader
[ https://issues.apache.org/jira/browse/LUCENE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004172#comment-14004172 ]

ASF subversion and git services commented on LUCENE-5666:

Commit 1596429 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1596429 ]
LUCENE-5666: Fix idea project files

Add UninvertingReader
---
Key: LUCENE-5666
URL: https://issues.apache.org/jira/browse/LUCENE-5666
Project: Lucene - Core
Issue Type: Improvement
Reporter: Robert Muir
Fix For: 5.0
Attachments: LUCENE-5666.patch

Currently the fieldcache is not pluggable at all. It would be better if everything used the docvalues apis. This would allow people to customize the implementation, extend the classes with custom subclasses with additional stuff, etc etc. FieldCache can be accessed via the docvalues apis, using the FilterReader api.
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004229#comment-14004229 ]

ASF subversion and git services commented on LUCENE-4236:

Commit 1596440 from [~rcmuir] in branch 'dev/branches/lucene4236' [ https://svn.apache.org/r1596440 ]
LUCENE-4236: try to make more palatable

clean up booleanquery conjunction optimizations a bit
---
Key: LUCENE-4236
URL: https://issues.apache.org/jira/browse/LUCENE-4236
Project: Lucene - Core
Issue Type: Improvement
Reporter: Robert Muir
Fix For: 4.9, 5.0
Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch

After LUCENE-3505, I want to do a slight cleanup:
* compute the term conjunctions optimization in scorer(), so it's applied even if we have optional and prohibited clauses that don't exist in the segment (e.g. return null)
* use the term conjunctions optimization when optional.size() == minShouldMatch, as that means they are all mandatory, too.
* don't return booleanscorer1 when optional.size() == minShouldMatch, because it means we have required clauses and in general BS2 should do a much better job (e.g. use advance).
[JENKINS] Lucene-Solr-SmokeRelease-4.x - Build # 165 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-4.x/165/

No tests ran.

Build Log:
[...truncated 53826 lines...]
prepare-release-no-sign:
    [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease
     [copy] Copying 431 files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/lucene
     [copy] Copying 239 files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/solr
     [exec] JAVA7_HOME is /home/hudson/tools/java/latest1.7
     [exec] NOTE: output encoding is US-ASCII
     [exec]
     [exec] Load release URL file:/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/...
     [exec]
     [exec] Test Lucene...
     [exec]   test basics...
     [exec]   get KEYS
     [exec]     0.1 MB in 0.01 sec (13.3 MB/sec)
     [exec]   check changes HTML...
     [exec]   download lucene-4.9.0-src.tgz...
     [exec]     27.5 MB in 0.04 sec (660.8 MB/sec)
     [exec]     verify md5/sha1 digests
     [exec]   download lucene-4.9.0.tgz...
     [exec]     61.4 MB in 0.09 sec (648.5 MB/sec)
     [exec]     verify md5/sha1 digests
     [exec]   download lucene-4.9.0.zip...
     [exec]     71.0 MB in 0.16 sec (444.6 MB/sec)
     [exec]     verify md5/sha1 digests
     [exec]   unpack lucene-4.9.0.tgz...
     [exec]     verify JAR metadata/identity/no javax.* or java.* classes...
     [exec]     test demo with 1.7...
     [exec]       got 5697 hits for query lucene
     [exec]     check Lucene's javadoc JAR
     [exec]   unpack lucene-4.9.0.zip...
     [exec]     verify JAR metadata/identity/no javax.* or java.* classes...
     [exec]     test demo with 1.7...
     [exec]       got 5697 hits for query lucene
     [exec]     check Lucene's javadoc JAR
     [exec]   unpack lucene-4.9.0-src.tgz...
     [exec]     make sure no JARs/WARs in src dist...
     [exec]     run ant validate
     [exec]     run tests w/ Java 7 and testArgs='-Dtests.jettyConnector=Socket -Dtests.disableHdfs=true'...
     [exec]     test demo with 1.7...
     [exec]       got 250 hits for query lucene
     [exec]     generate javadocs w/ Java 7...
     [exec]
     [exec] Crawl/parse...
     [exec]
     [exec] Verify...
     [exec]
     [exec] Test Solr...
     [exec]   test basics...
     [exec]   get KEYS
     [exec]     0.1 MB in 0.00 sec (53.0 MB/sec)
     [exec] Traceback (most recent call last):
     [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 1347, in <module>
     [exec]     main()
     [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 1291, in main
     [exec]     smokeTest(baseURL, svnRevision, version, tmpDir, isSigned, testArgs)
     [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 1333, in smokeTest
     [exec]     checkSigs('solr', solrPath, version, tmpDir, isSigned)
     [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 410, in checkSigs
     [exec]     testChanges(project, version, changesURL)
     [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 458, in testChanges
     [exec]     checkChangesContent(s, version, changesURL, project, True)
     [exec]   File "/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py", line 485, in checkChangesContent
     [exec]     raise RuntimeError('incorrect issue (_ instead of -) in %s: %s' % (name, m.group(1)))
     [exec] RuntimeError: incorrect issue (_ instead of -) in file:///usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/solr/changes/Changes.html: SOLR_3671
     [exec]   check changes HTML...

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/build.xml:387: exec returned: 1

Total time: 53 minutes 57 seconds
Build step 'Invoke Ant' marked build as failure
Email was triggered for: Failure
Sending email for trigger: Failure
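The failure above comes from a sanity check over CHANGES.html: issue keys must use a hyphen (SOLR-3671), not an underscore (SOLR_3671). A minimal sketch of that kind of check is below; the regex is an assumption for illustration, not the exact pattern in smokeTestRelease.py.

```python
import re

# Flag issue keys written with an underscore instead of a hyphen,
# e.g. "SOLR_3671" where "SOLR-3671" was intended. The project list
# and pattern are illustrative assumptions.
def find_bad_issue_refs(html: str) -> list:
    return re.findall(r'\b((?:SOLR|LUCENE)_\d+)\b', html)

assert find_bad_issue_refs('fixed in SOLR_3671 and LUCENE-4236') == ['SOLR_3671']
assert find_bad_issue_refs('SOLR-6091 only') == []
```

Running such a check during the smoke test catches typos in CHANGES entries before a release candidate goes out.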
[jira] [Commented] (SOLR-6098) SOLR console displaying JSON does not escape text properly
[ https://issues.apache.org/jira/browse/SOLR-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004239#comment-14004239 ]

Kingston Duffie commented on SOLR-6098:

Yes. Sorry. We are using 4.4. I suspect this is the same issue and so this can be closed.

SOLR console displaying JSON does not escape text properly
---
Key: SOLR-6098
URL: https://issues.apache.org/jira/browse/SOLR-6098
Project: Solr
Issue Type: Bug
Components: web gui
Reporter: Kingston Duffie
Priority: Minor

In the SOLR admin web console, when displaying the JSON response for a query, the text is not HTML-escaped, so any text that happens to match HTML markup is processed as HTML. For example, enter <strike>hello</strike> in the q textbox and the responseHeader will contain: q: body:<strike>hello</strike> where the hello portion is shown using strikeout. This seems benign, but can be extremely confusing when viewing results, because if your fields happen to contain, for example, <f...@bar.com>, this will be completely missing (because the browser treats this as an invalid tag).
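The fix the report implies is straightforward: escape every response value before injecting it into the page. The admin console itself is JavaScript; the sketch below just demonstrates the escaping step using Python's standard library.

```python
from html import escape

# Any value from the query response must be HTML-escaped before being
# placed into the page. Otherwise "<strike>hello</strike>" renders as
# struck-out text, and an angle-bracketed value like an email address
# silently disappears as a bogus tag.
raw = "body:<strike>hello</strike>"
safe = escape(raw)
assert safe == "body:&lt;strike&gt;hello&lt;/strike&gt;"
```

With the escaped string, the browser displays the literal markup characters instead of interpreting them.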
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004240#comment-14004240 ]

ASF subversion and git services commented on LUCENE-4236:

Commit 1596442 from [~rcmuir] in branch 'dev/branches/lucene4236' [ https://svn.apache.org/r1596442 ]
LUCENE-4236: move all crazies into one place
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004300#comment-14004300 ]

Robert Muir commented on LUCENE-4236:

I tried to clean this up as much as I can... OR tasks look fine (with BS1 disabled on both checkouts):
{noformat}
         Task       QPS trunk  StdDev   QPS patch  StdDev    Pct diff
 OrNotHighLow   958.58 (2.1%)          988.37 (2.7%)    3.1% (  -1% -   8%)
   OrHighHigh    18.76 (8.2%)           19.83 (15.1%)   5.7% ( -16% -  31%)
    OrHighMed    43.12 (8.3%)           45.64 (15.3%)   5.9% ( -16% -  32%)
    OrHighLow    44.76 (9.1%)           47.92 (16.9%)   7.1% ( -17% -  36%)
 OrNotHighMed   168.34 (3.4%)          189.12 (4.3%)   12.3% (   4% -  20%)
OrNotHighHigh    45.16 (3.5%)           60.43 (12.2%)  33.8% (  17% -  51%)
OrHighNotHigh    27.07 (3.4%)           36.27 (12.6%)  34.0% (  17% -  51%)
 OrHighNotMed    79.81 (3.7%)          111.10 (15.3%)  39.2% (  19% -  60%)
 OrHighNotLow    96.78 (4.0%)          137.49 (17.1%)  42.1% (  20% -  65%)
{noformat}
It would also be nice to run the eg.filter.tasks but these are currently broken in luceneutil.
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004367#comment-14004367 ]

Dawid Weiss commented on LUCENE-5650:

Commit it to the branch, Ryan. I fixed it the way I understand how Solr's source code works (which is to say: vaguely familiar). I'm sure your patch is better. The build yesterday ended with this failure related to perm. denied:
{code}
[13:07:35.284] ERROR   0.00s J1 | TestFoldingMultitermExtrasQuery (suite)
Throwable #1: org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: access denied (java.io.FilePermission analysis-extras\solr\collection1\conf write)
{code}
and the following, I guess notorious offenders:
{code}
- org.apache.solr.spelling.suggest.TestAnalyzeInfixSuggestions (could not remove temp. files)
- TestBlendedInfixSuggestions (same thing)
- org.apache.solr.cloud.SyncSliceTest.testDistribSearch
  Throwable #1: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
- org.apache.solr.cloud.RecoveryZkTest.testDistribSearch
  Throwable #1: java.lang.AssertionError: shard1 is not consistent. Got 92 from https://127.0.0.1:54379/_koq/fr/collection1lastClient and got 60 from https://127.0.0.1:54410/_koq/fr/collection1
{code}
I won't be able to return to this today (maybe in the evening). Change my DIH fixes to yours on the branch -- I don't mind at all.

createTempDir and associated functions no longer create java.io.tmpdir
---
Key: LUCENE-5650
URL: https://issues.apache.org/jira/browse/LUCENE-5650
Project: Lucene - Core
Issue Type: Improvement
Components: general/test
Reporter: Ryan Ernst
Assignee: Dawid Weiss
Priority: Minor
Fix For: 4.9, 5.0
Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, dih.patch

The recent refactoring to all the create temp file/dir functions (which is great!) has a minor regression from what existed before.
With the old {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist. So, if you set {{java.io.tmpdir}} to {{./temp}}, then it would create that dir within the per-JVM working dir. However, {{getBaseTempDirForClass()}} now asserts that the dir exists, is a dir, and is writeable. Lucene uses {{.}} as {{java.io.tmpdir}}. Then in the test security manager, the per-JVM cwd has read/write/execute permissions. However, this allows tests to write to their cwd, which I'm trying to protect against (by setting cwd to read/execute in my test security manager).
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004368#comment-14004368 ]

Dawid Weiss commented on LUCENE-5650:

One comment wrt the dih patch:
{code}
+System.clearProperty("solr.solr.home");
{code}
I think there is a restore-sys-props rule somewhere in the upper class that will take care of this. In Lucene there is no such rule, but in Solr so many properties get set (even from other software packages) that it didn't make sense to track them all manually. You'd have to check though, I may be wrong.
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004372#comment-14004372 ]

ASF subversion and git services commented on LUCENE-5650:

Commit 1596472 from [~dawidweiss] in branch 'dev/branches/lucene5650' [ https://svn.apache.org/r1596472 ]
LUCENE-5650: applying Ryan's DIH patch.
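The "restore-sys-props rule" mentioned earlier in this thread works by snapshotting system properties before each test and restoring them afterwards, which is why an explicit clearProperty call may be redundant. A minimal sketch of that pattern, with a Python dict standing in for java.lang.System properties (names illustrative):

```python
import contextlib

# Snapshot-and-restore rule: whatever properties a test sets or clears,
# the original state comes back after the test, so no manual cleanup
# (e.g. clearing "solr.solr.home") is needed.
@contextlib.contextmanager
def restore_sys_props(props: dict):
    snapshot = dict(props)
    try:
        yield props
    finally:
        props.clear()
        props.update(snapshot)

props = {"java.io.tmpdir": "."}
with restore_sys_props(props) as p:
    p["solr.solr.home"] = "/tmp/solr"  # set inside the test
assert props == {"java.io.tmpdir": "."}  # restored afterwards
```

Tracking every property manually, as the comment notes, becomes impractical once many packages set properties; a blanket snapshot/restore sidesteps that entirely.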
[jira] [Created] (LUCENE-5692) Deprecate spatial DisjointSpatialFilter
David Smiley created LUCENE-5692:

Summary: Deprecate spatial DisjointSpatialFilter
Key: LUCENE-5692
URL: https://issues.apache.org/jira/browse/LUCENE-5692
Project: Lucene - Core
Issue Type: Bug
Components: modules/spatial
Affects Versions: 5.0
Reporter: David Smiley

The spatial predicate IsDisjointTo is almost the same as the inverse of Intersects, except that it shouldn't match documents without spatial data. In another sense it's as if the query shape were inverted. DisjointSpatialFilter is a utility filter that works (or worked, rather) by using the FieldCache to see which documents have spatial data (getDocsWithField()). Calculating that was probably very slow but it was at least cacheable. Since LUCENE-5666 (v5/trunk only), Rob replaced this to use DocValues. However for some SpatialStrategies (PrefixTree based) it wouldn't make any sense to use DocValues *just* so that at search time you could call getDocsWithField() when there's no other need for the un-inversion (e.g. no need to look up terms by document).

Perhaps an immediate fix is simply to revert the change made to DisjointSpatialFilter so that it uses the FieldCache again, if that works (though it's not public?). But stepping back a bit, this DisjointSpatialFilter is really something unfortunate that doesn't work as well as it could because it's not at the level of Solr or ES -- that is, there's no access to a filter cache. So I propose I simply remove it, and if a user wants to do this for real, they should index a boolean field marking whether there's spatial data and then combine that with a NOT and Intersects, in a straightforward way. Alternatively, some sort of inverting query shape could be developed, although it wouldn't work with the SpatialPrefixTree technique because there is no edge distinction -- the edge matches normally and notwithstanding changes to RPT algorithms it would also match the edge of an inverted shape.
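The proposed workaround — a boolean "has spatial data" field combined with NOT plus Intersects — reduces to simple set arithmetic over matching documents. A set-level sketch (doc IDs and field names are illustrative, not Lucene API):

```python
# IsDisjointTo(shape) == hasSpatial:true AND NOT Intersects(shape).
# Docs without spatial data must never match, which a plain NOT of
# Intersects alone would get wrong.
docs_with_spatial = {1, 2, 3, 4}   # docs where the boolean field is true
intersects_shape = {2, 4}          # docs matching Intersects(shape)
no_spatial = {5}                   # docs with no spatial data at all

disjoint = docs_with_spatial - intersects_shape
assert disjoint == {1, 3}
assert not (disjoint & no_spatial)  # non-spatial docs correctly excluded
```

In Lucene terms this would be a BooleanQuery with a MUST clause on the boolean field and a MUST_NOT clause on the Intersects query; the sketch only demonstrates why the MUST clause is required.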