[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882621#comment-13882621 ] Shalin Shekhar Mangar commented on SOLR-5477: Thanks Anshum.
1. Why is it called a taskQueue in CoreAdminHandler? There is no queueing happening here.
2. Why is the taskQueue defined as a Map<String, Map<String, TaskObject>>? It can simply be a Map<String, TaskObject>. The task object itself can contain a volatile status flag to indicate running/completed/failure.
3. CoreAdminHandler.addTask with limit=true just removes a random (first?) entry if the limit is reached. It should remove the oldest entry instead.
4. OverseerCollectionProcessor.requestStatus returns a response with "success" even if the requestid is found in the "running" or "failure" map.
5. The 'migrate' API doesn't use async core admin requests.
6. In all places where synchronous calls have been replaced with waitForAsyncCallsToComplete calls, we need to ensure that the correct response messages are returned on failures. Right now, the waitForAsyncCallToComplete method returns silently on detecting failure.
7. Although there is a provision to clear the overseer status maps by passing requestid=1, it is never actually called. When do you intend to call this API?
8. I don't understand why we need three different maps for running/completed/failure for the overseer collection processor. My comment #2 applies here too. We can store the status in the value bytes instead of keeping three different maps and moving the key around. What do you think?
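Taken together, points 2 and 3 above suggest a single bounded, insertion-ordered map. A minimal sketch of that idea using a plain JDK LinkedHashMap (the class and field names here are illustrative, not from the patch):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: a single flat map of request id -> TaskObject, where
// each task carries its own status flag and the map evicts its oldest entry
// once a size limit is reached.
class TaskObject {
    enum Status { RUNNING, COMPLETED, FAILED }
    // volatile so a status written by the worker thread is visible to the
    // thread answering status requests without extra locking
    volatile Status status = Status.RUNNING;
}

class BoundedTaskMap extends LinkedHashMap<String, TaskObject> {
    private final int limit;

    BoundedTaskMap(int limit) {
        this.limit = limit;
    }

    // LinkedHashMap calls this after every put(); returning true drops the
    // eldest entry in insertion order, i.e. the oldest task, not a random one.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, TaskObject> eldest) {
        return size() > limit;
    }
}
```

A handler serving concurrent requests would still need to wrap such a map with Collections.synchronizedMap (or guard it with its own locking), since LinkedHashMap itself is not thread-safe.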
Async execution of OverseerCollectionProcessor tasks Key: SOLR-5477 URL: https://issues.apache.org/jira/browse/SOLR-5477 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Anshum Gupta Attachments: SOLR-5477-CoreAdminStatus.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch Typical collection admin commands are long-running and it is very common for the requests to time out. It is more of a problem if the cluster is very large. Add an option to run these commands asynchronously: add an extra param async=true for all collection commands; the task is written to ZK and the caller is returned a task id. A separate collection admin command will be added to poll the status of the task: command=status&id=7657668909. If no id is passed, all running async tasks should be listed. A separate queue is created to store in-process tasks. After the tasks are completed the queue entry is removed. OverseerCollectionProcessor will perform these tasks in multiple threads. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882627#comment-13882627 ] Anshum Gupta commented on SOLR-5477: bq. Why is it called a taskQueue in CoreAdminHandler? There is no queueing happening here. Changed it. Had that change on my machine before you mentioned :) bq. Why is the taskQueue defined as a Map<String, Map<String, TaskObject>>? It can simply be a Map<String, TaskObject>. {quote} I don't understand why we need three different maps for running/completed/failure for the overseer collection processor. My comment #2 applies here too. We can store the status in the value bytes instead of keeping three different maps and moving the key around. What do you think? {quote} It takes away the ability (or at least makes it too complicated) to limit the number of tasks in a particular state, e.g. limiting storage to 50 completed tasks only. bq. The CoreAdminHandler.addTask with limit=true just removes a random (first?) entry if the limit is reached. It removes the first element. It's a synchronized LinkedHashMap, so the iterator preserves order and returns the first element. bq. OverseerCollectionProcessor.requestStatus returns a response with "success" even if the requestid is found in the "running" or "failure" map. Success was supposed to mean that the task was found in a status map. It might actually make sense to change it. Thanks for the suggestion. bq. Although there is a provision to clear the overseer status maps by passing requestid=1, it is never actually called. The intention is for the user to explicitly call the API. There's no concept of a map/queue in ZK that maintains insertion state. You'd have to check it, order it, and then delete the apt one every time numChildren exceeds the limit. I thought it was best left to the user. Will upload a patch with the following: * Migrate API to also use the ASYNC CoreAdmin requests.
* Store the failed tasks information from CoreAdmin async calls in case of Collection API requests. * Tests for ** migratekey (and other calls) in ASYNC mode. ** Failing Collection API calls.
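The removal behaviour Anshum describes follows from LinkedHashMap's insertion-order iteration: the first key the iterator returns is the oldest entry, not an arbitrary one. A small, self-contained demonstration (the task names are made up):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class OldestEntryDemo {
    public static void main(String[] args) {
        // A LinkedHashMap iterates in insertion order, so the first key its
        // iterator returns is the oldest entry.
        Map<String, String> tasks =
                Collections.synchronizedMap(new LinkedHashMap<String, String>());
        tasks.put("task-1", "completed");
        tasks.put("task-2", "completed");
        tasks.put("task-3", "completed");

        String oldest;
        // Iterating a synchronizedMap wrapper must be manually synchronized.
        synchronized (tasks) {
            oldest = tasks.keySet().iterator().next();
        }
        tasks.remove(oldest);

        System.out.println(oldest);         // task-1
        System.out.println(tasks.keySet()); // [task-2, task-3]
    }
}
```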
[jira] [Updated] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-5477: --- Attachment: SOLR-5477.patch Fixed the following: * Changed the var name from Queue to Map. * Response structure from OCP async calls changed. Now it's:
{code:xml}
<status>
  <state></state>
  <msg></msg>
</status>
{code}
[jira] [Comment Edited] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882641#comment-13882641 ] Anshum Gupta edited comment on SOLR-5477 at 1/27/14 8:53 AM: Fixed the following: * Changed the var name from Queue to Map. * Response structure from OCP async calls changed. Now it's:
{code:xml}
<status>
  <state>running|failed|completed|notfound</state>
  <msg>apt message</msg>
</status>
{code}
was (Author: anshumg): Fixed the following: * Changed the var name from Queue to Map. * Response structure from OCP async calls changed. Now it's:
{code:xml}
<status>
  <state></state>
  <msg></msg>
</status>
{code}
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #1088: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1088/ 1 tests failed. REGRESSION: org.apache.solr.cloud.OverseerTest.testOverseerFailure Error Message: KeeperErrorCode = NodeExists for /collections/collection1/leaders/shard1 Stack Trace: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/collection1/leaders/shard1 at __randomizedtesting.SeedInfo.seed([2BCA93FBB0E2264:6B426CCA9ABCD45]:0) at org.apache.zookeeper.KeeperException.create(KeeperException.java:119) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:428) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:425) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:382) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:369) at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:112) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:164) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:108) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:156) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:289) at org.apache.solr.cloud.OverseerTest$MockZKController.publishState(OverseerTest.java:153) at org.apache.solr.cloud.OverseerTest.testOverseerFailure(OverseerTest.java:584) Build Log: [...truncated 52851 lines...] 
BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/build.xml:476: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/build.xml:176: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/extra-targets.xml:77: Java returned: 1 Total time: 132 minutes 55 seconds Build step 'Invoke Ant' marked build as failure Recording test results Email was triggered for: Failure Sending email for trigger: Failure
Re: Span Not Queries
Hi, Any news on this? On Fri, Jan 17, 2014 at 1:54 AM, Gopal Agarwal gopal.agarw...@gmail.com wrote: Sounds perfect. Hopefully one of the committers picks this up and adds it to 4.7. Will keep checking the updates... On Fri, Jan 17, 2014 at 1:17 AM, Allison, Timothy B. talli...@mitre.org wrote: And don't forget analysis! The code is non-trivial, and it will take a generous committer to help me get it into shape for committing. Once I push my mods to JIRA (end of next week), you should be able to compile it and run it at least for dev/testing to confirm that it meets your needs. From: Gopal Agarwal [mailto:gopal.agarw...@gmail.com] Sent: Thursday, January 16, 2014 1:21 PM To: dev@lucene.apache.org Subject: Re: Span Not Queries Thanks Tim. This exactly fits my requirements of recursion, SpanNot, and the ComplexParser combination with the Boolean parser. Since I would end up making the exact same changes to my QueryParserBase class, I would be locked to the current version of SOLR for the foreseeable future. Can you comment on when the release might be if it gets reviewed by next week? On Thu, Jan 16, 2014 at 11:06 PM, Allison, Timothy B. talli...@mitre.org wrote: Apologies for the self-promotion... LUCENE-5205 and its Solr cousin (SOLR-5410) might help. I'm hoping to post updates to both by the end of next week. Then, if a committer would be willing to review and add these to Lucene/Solr, you should be good to go. Take a look at the description for LUCENE-5205 and see if that capability will meet your needs. Thank you. Best, Tim From: Gopal Agarwal [mailto:gopal.agarw...@gmail.com] Sent: Thursday, January 16, 2014 4:10 AM To: dev@lucene.apache.org Subject: Fwd: Span Not Queries Please help me out with the earlier query. In short: 1. Can we change the QueryParser.jj file to identify the SpanNot query as a boolean clause? 2. Can we use the ComplexPhraseQuery parser to support SpanOR and SpanNOT queries also? For further explanation, following are the examples.
On Tue, Oct 15, 2013 at 11:27 PM, Ankit Kumar ankitthemight...@gmail.com wrote: I have a business use case in which I need to use SpanNot and other ordered proximity queries, and they can be nested up to any level: a Boolean inside an ordered query, or an ordered query inside a Boolean. Currently I am thinking of changing the QueryParser.jj file to identify the SpanNot query and use the Complex Phrase Query Parser of Lucene for parsing complex queries. Can you suggest a better way of achieving this? Following is the list of additions that I need to make in SOLR: 1. SpanNOT operator. 2. Recursive and range proximity. Recursive proximity is a proximity query within a proximity query. Ex: “ “income tax”~5 statement” ~4. The recursion can be up to any level. Range proximity: currently we can only define a number as a range; we want an interval as a range. Ex: “profit income”~3,5, “United America”~-5,4. 3. Complex queries. A complex query is a query formed with a combination of Boolean operators, proximity queries, range queries, or any possible combination of these. Ex: “(income AND tax) statement”~4, “ “income tax”~4 (statement OR period) ”~3, (“income” SPAN NOT “income tax”) source ~3,5. Can anyone suggest a way of achieving these 3 functionalities in SOLR? On Tue, Oct 15, 2013 at 10:15 PM, Jack Krupansky j...@basetechnology.com wrote: Nope. But the LucidWorks Search product query parser does support SpanNot if you use their BEFORE, AFTER, and NEAR span operators. See: http://docs.lucidworks.com/display/lweug/Proximity+Operations For example: George BEFORE:2 Bush NOT H to match George anything Bush, but not George H. W. Bush. What is your specific use case? -- Jack Krupansky -Original Message- From: Ankit Kumar Sent: Tuesday, October 15, 2013 3:58 AM To: solr-u...@lucene.apache.org Subject: Span Not Queries I need to add SpanNot queries in Solr.
There's a parser, the Surround Query Parser. I went through this ( http://lucene.472066.n3.nabble.com/Surround-query-parser-not-working-td4075066.html ) to discover that the surround query parser does not analyze text. Does DisMaxQueryParser support SpanNot queries?
[jira] [Updated] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5473: - Attachment: SOLR-5473.patch Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882719#comment-13882719 ] Jan Høydahl commented on SOLR-2366: --- bq. So for a facet.range.start=0, facet.range.end=1000, facet.range.gap=10,90,900 the labels would be as Jan suggests: [0 TO 10}, [10 TO 100}, [100 TO 1000}. [~tedsullivan], I am not in favor of a list of relative gaps, I think it is user unfriendly. That's why I suggested a new facet.range.spec or something like Hoss' facet.range.buckets. But if you for some reason wish to extend the gap parameter, I guess it needs to remain relative gaps since that is kind of implied in the wording? Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 4.7 Attachments: SOLR-2366.patch, SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. (Original syntax proposal removed, see discussion for concrete syntax) -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome Benson Margulies as Lucene/Solr committer!
Welcome Benson! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 25 Jan 2014, at 22:40, Michael McCandless luc...@mikemccandless.com wrote: I'm pleased to announce that Benson Margulies has accepted to join our ranks as a committer. Benson has been involved in a number of Lucene/Solr issues over time (see http://jirasearch.mikemccandless.com/search.py?index=jirachg=ddsa1=allUsersa2=Benson+Margulies ), most recently on debugging tricky analysis issues. Benson, it is tradition that you introduce yourself with a brief bio. I know you're heavily involved in other Apache projects already... Once your account is set up, you should then be able to add yourself to the who we are page on the website as well. Congratulations and welcome! Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882723#comment-13882723 ] Dana Sava commented on SOLR-4470: Hello, We are currently using SOLR 4.5.1 in our production environment and we tried to set up security on a SOLR cloud configuration. I have read all the 4470 issue activity and it would be very useful for us to be able to download the SOLR-4470_branch_4x_r1452629.patch already compiled from some place, until the 4.7 version is released. Can somebody help me with this issue? Thank you, Dana Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.7 Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch We want to protect any HTTP resource (URL). We want to require credentials no matter what kind of HTTP request you make to a Solr node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr nodes also make internal requests to other Solr nodes, and for those to work, credentials need to be provided as well. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers, e.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered by outside requests (e.g. shard creation/deletion/etc. based on calls to the Collection API) * that do not in any way relate to an outside super-request (e.g. replica syncing stuff). We would like to aim at a solution where the original credentials are forwarded when a request directly/synchronously triggers a sub-request, with a fallback to configured internal credentials for the asynchronous/non-rooted requests. In our solution we would aim at only supporting basic HTTP auth, but we would like to make a framework around it, so that not too much refactoring is needed if you later want to add support for other kinds of auth (e.g. digest). We will work on a solution but created this JIRA issue early in order to get input/comments from the community as early as possible.
Re: [VOTE] Release Lucene/Solr 4.6.1 RC4
I guess this vote passed! On Sat, Jan 25, 2014 at 1:15 AM, Andi Vajda va...@osafoundation.org wrote: On Thu, 23 Jan 2014, Mark Miller wrote: Sorry - watch out for that link - I'm seeing the text correctly, but the underlying link incorrectly when I look at it in my send folder. The evils of html mail I guess. +1 PyLucene built from branch_4x's rev 1560866 passes all its tests. Andi.. To be sure you have the right artifacts, make sure you are looking at the following location: http://people.apache.org/~markrmiller/lucene_solr_4_6_1r1560866/ - Mark On Jan 23, 2014, at 9:57 PM, Mark Miller markrmil...@gmail.com wrote: Here we go, hopefully for the last time now... thanks everyone for bearing with us. Please vote to release the following artifacts: http://people.apache.org/~markrmiller/lucene_solr_4_6_1r1560866/ Here is my +1. SUCCESS! [0:56:37.409716] -- - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Is there any way to lucene index incremented indexing or updated
I have built an index of around 1 TB of data. The problem is that I want to update or add more data to my Lucene database. Is there any way to add to or re-index the Lucene DB? Please give me some suggestions. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-any-way-to-lucene-index-incremented-indexing-or-updated-tp4113691.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
Re: Is there any way to lucene index incremented indexing or updated
Could you re-ask this on java-u...@lucene.apache.org? This list is for making changes to Lucene/Solr's source code ... thanks. Mike McCandless http://blog.mikemccandless.com On Mon, Jan 27, 2014 at 6:15 AM, mugeesh muge...@hitechpeople.in wrote: I had made index around 1 TB data. the problem is that i want to update or add more data in my lucene database . is there any way to add or re-index lucene Db ..Please give me some suggestion. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-any-way-to-lucene-index-incremented-indexing-or-updated-tp4113691.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5418) Don't use .advance on costly (e.g. distance range facets) filters
Michael McCandless created LUCENE-5418: -- Summary: Don't use .advance on costly (e.g. distance range facets) filters Key: LUCENE-5418 URL: https://issues.apache.org/jira/browse/LUCENE-5418 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.7 If you use a distance filter today (see http://blog.mikemccandless.com/2014/01/geospatial-distance-faceting-using.html ), then drill down on one of those ranges, under the hood Lucene is using .advance on the Filter, which is very costly because we end up computing distance on (possibly many) hits that don't match the query. It's better performance to find the hits matching the Query first, and then check the filter. FilteredQuery can already do this today, when you use its QUERY_FIRST_FILTER_STRATEGY. This essentially accomplishes the same thing as Solr's post filters (I think?) but with a far simpler/better/less code approach. E.g., I believe ElasticSearch uses this API when it applies costly filters. Longish term, I think Query/Filter ought to know itself that it's expensive, and cases where such a Query/Filter is MUST'd onto a BooleanQuery (e.g. ConstantScoreQuery), or the Filter is a clause in BooleanFilter, or it's passed to IndexSearcher.search, we should also be smart here and not call .advance on such clauses. But that'd be a biggish change ... so for today the workaround is the user must carefully construct the FilteredQuery themselves. In the mean time, as another workaround, I want to fix DrillSideways so that when you drill down on such filters it doesn't use .advance; this should give a good speedup for the normal path API usage with a costly filter. I'm iterating on the lucene server branch (LUCENE-5376) but once it's working I plan to merge this back to trunk / 4.7. 
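The query-first strategy described above is selected when constructing the FilteredQuery; a minimal sketch against the Lucene 4.x API (the helper class and method names are hypothetical, while FilteredQuery and QUERY_FIRST_FILTER_STRATEGY are the real API):

```java
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.FilteredQuery;
import org.apache.lucene.search.Query;

// Sketch: evaluate the query first and consult the (costly) filter only for
// documents that already matched, instead of letting the filter's .advance()
// drive the match loop.
public final class CostlyFilterWrapper {
    private CostlyFilterWrapper() {}

    public static Query queryFirst(Query query, Filter costlyFilter) {
        return new FilteredQuery(query, costlyFilter,
                FilteredQuery.QUERY_FIRST_FILTER_STRATEGY);
    }
}
```

The returned query can then be passed to IndexSearcher.search as usual; the strategy only changes how the filter is consulted, not which documents match.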
Re: Liberating DirectPostingFormat from Codec
What do we have for a benchmark framework that is used to justify/qualify speed-related things? One way forward would be to see what a quantified measurement shows from the idea I have in mind, and use that to facilitate deciding if this belongs in the tree. On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies bimargul...@gmail.com wrote: Keeping things in memory and not re-reading them from disk is what really sang the song for us. Even if the initial read-in was more costly due to decompression, the long-term amortized benefit of not re-reading would still be a big winner. On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir rcm...@gmail.com wrote: Well, the Directory layer likely isn't what makes DirectPF faster for you. It's probably the fact it does no compression at all... On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies bimargul...@gmail.com wrote: On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir rcm...@gmail.com wrote: That would be Directory :) Oh, how embarrassing. I could have written a custom directory to begin with. Would a Directory class for this purpose be an interesting patch, in that case? I'm not discontented about building a Directory into our application, but it seems like I might not be the only person to find this useful. On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies bimargul...@gmail.com wrote: I've had very gratifying results using the DirectPostingFormat to speed up queries when I had a read-only index with plenty of memory. The only downside was the need to specify it within the Codec, and thus write it into the index. Ever since, I've wondered if we could change things to introduce the same goodness without building it into the codec. Very roughly, I'm imagining an option in the IndexReader to provide an object that can surround the codec that is called for in the stored format. Is this an old question? Is it worth sketching a patch?
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Liberating DirectPostingFormat from Codec
Hi Benson, I use the code from luceneutil (https://code.google.com/a/apache-extras.org/p/luceneutil/ ), e.g. I run those scripts nightly for the nightly benchmarks: http://people.apache.org/~mikemccand/lucenebench But, that's the Wikipedia corpus, and has no real queries, and the scripts are quite challenging to get working ... if you have access to more realistic corpus + queries, even if you can't share it, those results are also interesting to share. I think it would be neat if an app could retroactively pick DirectPF at search time, or more generally pass search-time parameters when initializing codec components (I think there was a discussion about this at some point but I can't remember what the use case was). Today, any and all choices must be written into the index and cannot be changed at search time, which is somewhat silly/restrictive for DirectPF since it can wrap any other PF and act as simply a fast cache on top of the postings. Mike McCandless http://blog.mikemccandless.com On Mon, Jan 27, 2014 at 7:06 AM, Benson Margulies bimargul...@gmail.com wrote: What do we have for a benchmark framework that is used to justify/qualify speed-related things? One way forward would be to see what a quantified measurement shows from the idea I have in mind, and use that to facilitate deciding if this belongs in the tree. On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies bimargul...@gmail.com wrote: Keeping things in memory and not re-reading them from disk is what really sang the song for us. Even if the initial read-in was more costly due to decompression, the long-term amortized benefit of not re-reading would still be a big winner. On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir rcm...@gmail.com wrote: well the Directory layer likely isnt what probably makes DirectPF faster for you. Its probably the fact it does no compression at all... 
On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies bimargul...@gmail.com wrote: On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir rcm...@gmail.com wrote: That would be Directory :) Oh, how embarrassing. I could have written a custom directory to begin with. Would a Directory class for this purpose be an interesting patch, in that case? I'm not discontented about building a Directory into our application, but it seems like I might not be the only person to find this useful. On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies bimargul...@gmail.com wrote: I've had very gratifying results using the DirectPostingFormat to speed up queries when I had a read-only index with plenty of memory. The only downside was the need to specify it within the Codec, and thus write it into the index. Ever since, I've wondered if we could change things to introduce the same goodness without building it into the codec. Very roughly, I'm imagining an option in the IndexReader to provide an object that can surround the codec that is called for in the stored format. Is this an old question? Is it worth sketching a patch?
Re: Liberating DirectPostingFormat from Codec
On Mon, Jan 27, 2014 at 7:12 AM, Michael McCandless luc...@mikemccandless.com wrote: Hi Benson, I use the code from luceneutil (https://code.google.com/a/apache-extras.org/p/luceneutil/ ), e.g. I run those scripts nightly for the nightly benchmarks: http://people.apache.org/~mikemccand/lucenebench But, that's the Wikipedia corpus, and has no real queries, and the scripts are quite challenging to get working ... if you have access to more realistic corpus + queries, even if you can't share it, those results are also interesting to share. I think it would be neat if an app could retroactively pick DirectPF at search time, or more generally pass search-time parameters when initializing codec components (I think there was a discussion about this at some point but I can't remember what the use case was). Today, any and all choices must be written into the index and cannot be changed at search time, which is somewhat silly/restrictive for DirectPF since it can wrap any other PF and act as simply a fast cache on top of the postings. Well, that's where I thought I was starting: an API into the reader that allows DirectPF to be injected as a wrapper around others. I haven't had time to follow Rob's bread-crumb trail to see if this is straightforward by customizing Directory -- though it occurs to me that we have many directories, and it would be useful to be able to do this regardless. I may be able to share a data set, I'll check into that today. Mike McCandless http://blog.mikemccandless.com On Mon, Jan 27, 2014 at 7:06 AM, Benson Margulies bimargul...@gmail.com wrote: What do we have for a benchmark framework that is used to justify/qualify speed-related things? One way forward would be to see what a quantified measurement shows from the idea I have in mind, and use that to facilitate deciding if this belongs in the tree. 
On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies bimargul...@gmail.com wrote: Keeping things in memory and not re-reading them from disk is what really sang the song for us. Even if the initial read-in was more costly due to decompression, the long-term amortized benefit of not re-reading would still be a big winner. On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir rcm...@gmail.com wrote: well, the Directory layer likely isn't what makes DirectPF faster for you. It's probably the fact that it does no compression at all... On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies bimargul...@gmail.com wrote: On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir rcm...@gmail.com wrote: That would be Directory :) Oh, how embarrassing. I could have written a custom directory to begin with. Would a Directory class for this purpose be an interesting patch, in that case? I'm not discontented about building a Directory into our application, but it seems like I might not be the only person to find this useful. On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies bimargul...@gmail.com wrote: I've had very gratifying results using the DirectPostingFormat to speed up queries when I had a read-only index with plenty of memory. The only downside was the need to specify it within the Codec, and thus write it into the index. Ever since, I've wondered if we could change things to introduce the same goodness without building it into the codec. Very roughly, I'm imagining an option in the IndexReader to provide an object that can surround the codec that is called for in the stored format. Is this an old question? Is it worth sketching a patch? 
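[Editor's illustration] The idea discussed in this thread — DirectPF wrapping any other postings format and acting as a fast in-memory cache, chosen at search time rather than baked into the index — is essentially the decorator pattern. A minimal sketch of that pattern follows; the class and method names here are invented for illustration and are not Lucene APIs:

```python
# Hypothetical sketch of "DirectPF as a search-time cache" -- a decorator
# that wraps a slower postings source and keeps decoded postings in memory.

class PostingsSource:
    """Stand-in for a codec's postings format: maps a term to its doc IDs."""
    def __init__(self, index):
        self._index = index   # term -> list of doc IDs
        self.reads = 0        # counts "disk" reads, for the demo

    def postings(self, term):
        self.reads += 1
        return self._index.get(term, [])

class CachingPostingsSource:
    """Wraps any PostingsSource and caches decoded postings in memory,
    so the caching decision is made when opening the reader, not at index time."""
    def __init__(self, delegate):
        self._delegate = delegate
        self._cache = {}

    def postings(self, term):
        if term not in self._cache:
            self._cache[term] = self._delegate.postings(term)
        return self._cache[term]

base = PostingsSource({"lucene": [1, 3, 7]})
searcher_view = CachingPostingsSource(base)   # chosen at search time
first = searcher_view.postings("lucene")
second = searcher_view.postings("lucene")     # served from memory, no second read
```

The point of the sketch is only that the wrapper needs no cooperation from the wrapped format — which is why a search-time hook (whether via the reader or via Directory) would be enough to get the DirectPF speedup without writing the choice into the index.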
Re: Welcome Benson Margulies as Lucene/Solr committer!
Welcome Benson :) On Monday, January 27, 2014 at 10:57 AM, Jan Høydahl wrote: Welcome Benson! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 25 Jan 2014, at 22:40, Michael McCandless luc...@mikemccandless.com wrote: I'm pleased to announce that Benson Margulies has accepted to join our ranks as a committer. Benson has been involved in a number of Lucene/Solr issues over time (see http://jirasearch.mikemccandless.com/search.py?index=jira&chg=dd&sa1=allUsers&a2=Benson+Margulies ), most recently on debugging tricky analysis issues. Benson, it is tradition that you introduce yourself with a brief bio. I know you're heavily involved in other Apache projects already... Once your account is set up, you should then be able to add yourself to the who we are page on the website as well. Congratulations and welcome! Mike McCandless http://blog.mikemccandless.com
Re: Liberating DirectPostingFormat from Codec
On Mon, Jan 27, 2014 at 7:23 AM, Benson Margulies bimargul...@gmail.com wrote: On Mon, Jan 27, 2014 at 7:12 AM, Michael McCandless luc...@mikemccandless.com wrote: Hi Benson, I use the code from luceneutil (https://code.google.com/a/apache-extras.org/p/luceneutil/ ), e.g. I run those scripts nightly for the nightly benchmarks: http://people.apache.org/~mikemccand/lucenebench But, that's the Wikipedia corpus, and has no real queries, and the scripts are quite challenging to get working ... if you have access to more realistic corpus + queries, even if you can't share it, those results are also interesting to share. I think it would be neat if an app could retroactively pick DirectPF at search time, or more generally pass search-time parameters when initializing codec components (I think there was a discussion about this at some point but I can't remember what the use case was). Today, any and all choices must be written into the index and cannot be changed at search time, which is somewhat silly/restrictive for DirectPF since it can wrap any other PF and act as simply a fast cache on top of the postings. Well, that's where I thought I was starting: an API into the reader that allows DirectPF to be injected as a wrapper around others. I haven't had time to follow Rob's bread-crumb trail to see if this is straightforward by customizing Directory -- though it occurs to me that we have many directories, and it would be useful to be able to do this regardless. I'm not sure how a custom Directory applies here ... maybe Rob can clarify? I may be able to share a data set, I'll check into that today. Cool! Mike McCandless http://blog.mikemccandless.com
Re: Welcome Areek Zillur as Lucene/Solr committer!
Welcome Areek :) On Tuesday, January 21, 2014 at 8:26 PM, Robert Muir wrote: I'm pleased to announce that Areek Zillur has accepted to join our ranks as a committer. Areek has been improving suggester support in Lucene and Solr, including a revamped Solr component slated for the 4.7 release. [1] Areek, it is tradition that you introduce yourself with a brief bio. Once your account is set up, you should then be able to add yourself to the who we are page on the website as well. Congratulations and welcome! [1] https://issues.apache.org/jira/browse/SOLR-5378
[jira] [Updated] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-4787: - Attachment: SOLR-4787.patch Resolved a memory leak when the bjoin is used with cache autowarming. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. 
The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* greater than 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will set up a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then you can avoid hashset resizing, which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather than join. 
fq={!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list, will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin. <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib lib jars must be registered in the solrconfig.xml. <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> After issuing the ant dist command from inside the solr directory the joins contrib jar will appear in the solr/dist directory. Place the solr-joins-4.*-.jar in the WEB-INF/lib directory of the solr web application. This will
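[Editor's illustration] The join mechanics described above — collect the from-field values from the fromIndex results into a hash set, then keep only main-query documents whose to-field value appears in that set — can be sketched in a few lines. This is the general hash-set join idea, not Solr's implementation; the dict-based documents and field names are invented for the demo:

```python
# Illustrative hash-set join mirroring the hjoin description: one pass over
# the fromIndex side builds a set of join keys, then the main query results
# are filtered against that set.

def hash_set_join(from_docs, main_docs, from_field, to_field):
    keys = {d[from_field] for d in from_docs}            # keys from the fromIndex side
    return [d for d in main_docs if d[to_field] in keys] # keep matching main-query docs

# fromIndex (collection2) results for user:customer1
customers = [{"id_i": 5, "user": "customer1"}]
# main query results for group:5
orders = [{"id_i": 5, "group": 5}, {"id_i": 9, "group": 5}]

joined = hash_set_join(customers, orders, from_field="id_i", to_field="id_i")
```

This also shows why the *size* local parameter matters: pre-sizing the hash set to the expected number of fromIndex results avoids rehashing as keys are added (Python sets manage growth internally, but the resizing cost being avoided is the same).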
Jetty version should go in CHANGES.TXT
Hi, I'd argue that Jetty can be said to be a major component of Solr, so I suggest we add the Jetty version under the section Versions of Major Components in Solr's CHANGES.TXT? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Re: [VOTE] Release Lucene/Solr 4.6.1 RC4
Thanks everyone for voting. It’s been 72 hours, the vote has passed. - Mark http://about.me/markrmiller On Jan 23, 2014, at 9:57 PM, Mark Miller markrmil...@gmail.com wrote: Here we go, hopefully for the last time now…thanks everyone for bearing with us. Please vote to release the following artifacts: http://people.apache.org/~markrmiller/lucene_solr_4_6_1r1560866/ Here is my +1. SUCCESS! [0:56:37.409716] -- - Mark
[jira] [Updated] (SOLR-5669) queries containing \u return error: Truncated unicode escape sequence.
[ https://issues.apache.org/jira/browse/SOLR-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dorin Oltean updated SOLR-5669: --- Description: When I do the following query: /select?q=\ujb I get {quote} org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: j, {quote} To make it work I have to put another '\' in front of the query: {quote}\\ujb{quote} which in fact leads to a different query in Solr. I use the edismax qparser. was: When I do the following query: /select?q=\ujb I get org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: j, code 400 To make it work I have to put another '\' in front of the query: \\ujb which in fact leads to a different query in Solr. I use the edismax qparser. queries containing \u return error: Truncated unicode escape sequence. - Key: SOLR-5669 URL: https://issues.apache.org/jira/browse/SOLR-5669 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.4 Reporter: Dorin Oltean Priority: Minor When I do the following query: /select?q=\ujb I get {quote} org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: j, {quote} To make it work I have to put another '\' in front of the query: {quote}\\ujb{quote} which in fact leads to a different query in Solr. I use the edismax qparser. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (SOLR-5669) queries containing \u return error: Truncated unicode escape sequence.
Dorin Oltean created SOLR-5669: -- Summary: queries containing \u return error: Truncated unicode escape sequence. Key: SOLR-5669 URL: https://issues.apache.org/jira/browse/SOLR-5669 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.4 Reporter: Dorin Oltean Priority: Minor When I do the following query: /select?q=\ujb I get org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: j, code 400 To make it work I have to put another '\' in front of the query: \\ujb which in fact leads to a different query in Solr. I use the edismax qparser. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
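[Editor's illustration] The behavior reported here is a property of \uXXXX escape parsing in general, not only of Solr's query parser. Python's unicode_escape codec (used below purely as an analogy, not as Solr's actual code path) fails the same way when \u is followed by a non-hex character, and doubling the backslash makes it a literal — which is exactly why the workaround produces a different query:

```python
# \uXXXX escapes require four hex digits; \ujb fails because 'j' is not hex.
# Escaping the backslash (\\u) yields a literal backslash-u -- a different string.

def try_decode(s):
    """Decode \\uXXXX escapes; return the error reason if parsing fails."""
    try:
        return s.encode("latin-1").decode("unicode_escape")
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"

a = try_decode(r"\ujb")    # 'j' is not a hex digit -> parse error
b = try_decode(r"\\ujb")   # escaped: a literal \ujb, i.e. a different query
c = try_decode(r"\u0041")  # a well-formed escape
```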
[jira] [Created] (SOLR-5670) _version_ either indexed OR docvalue
Per Steffensen created SOLR-5670: Summary: _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen As far as I can see there is no good reason to require that the _version_ field has to be indexed if it has docValues. So I guess it will be OK with a rule saying _version_ has to be either indexed or docValues (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
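[Editor's illustration] The proposed rule relaxes "indexed is required" to "indexed or docValues is required (both allowed)". A sketch of that validation check, with a hypothetical helper name (this is not the patch itself):

```python
# Hypothetical validation helper illustrating the proposed _version_ rule:
# accept indexed, docValues, or both; reject a field that is neither.

def validate_version_field(indexed: bool, doc_values: bool) -> None:
    if not (indexed or doc_values):
        raise ValueError("_version_ field must be indexed and/or have docValues")

validate_version_field(indexed=True, doc_values=False)   # old-style config: OK
validate_version_field(indexed=False, doc_values=True)   # newly allowed: OK
validate_version_field(indexed=True, doc_values=True)    # both: OK

try:
    validate_version_field(indexed=False, doc_values=False)
    rejected = False
except ValueError:
    rejected = True  # neither indexed nor docValues is still an error
```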
[jira] [Updated] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Per Steffensen updated SOLR-5670: - Attachment: SOLR-5670.patch Simple patch attached. No tests added for it, but I have seen it working locally. _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Attachments: SOLR-5670.patch As far as I can see there is no good reason to require that the _version_ field has to be indexed if it has docValues. So I guess it will be OK with a rule saying _version_ has to be either indexed or docValues (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882811#comment-13882811 ] Per Steffensen edited comment on SOLR-5670 at 1/27/14 3:38 PM: --- Simple patch attached. No tests added for it, but I have seen it working locally. The 4.4.0 test-suite is green with this change. Do not know if the branch_4x test-suite is. was (Author: steff1193): Simple patch attached. No tests added for it, but I have seen it working locally. _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Attachments: SOLR-5670.patch, SOLR-5670.patch As far as I can see there is no good reason to require that the _version_ field has to be indexed if it has docValues. So I guess it will be OK with a rule saying _version_ has to be either indexed or docValues (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated SOLR-5670: --- Attachment: SOLR-5670.patch From a design perspective, I can't claim to know whether this is an acceptable patch or not. Consistency in configs across multiple users and multiple versions does have some value, which is a very minor argument against this change. Is there any benchmark data? If docValues provides better performance for _version_ than indexed when it is used for its intended purpose, it might be worth changing the example config ... but people should know that if they *do* change the config on this field, they will have to completely reindex. This patch is functionally identical to the previous one, it just modifies an error message. I didn't check to see what branch Per's patch was created on, but it did apply cleanly to branch_4x. This patch is against that branch. _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Attachments: SOLR-5670.patch, SOLR-5670.patch As far as I can see there is no good reason to require that _version_ field has to be indexed if it is docvalued. So I guess it will be ok with a rule saying _version_ has to be either indexed or docvalue (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882899#comment-13882899 ] Shawn Heisey edited comment on SOLR-5670 at 1/27/14 3:41 PM: - From a design perspective, I can't claim to know whether this is an acceptable patch or not. Consistency in configs across multiple users and multiple versions does have some value, which is a very minor argument against this change. Is there any benchmark data? If docValues provides better performance for \_version\_ than indexed when it is used for its intended purpose, it might be worth changing the example config ... but people should know that if they *do* change the config on this field, they will have to completely reindex. This patch is functionally identical to the previous one, it just modifies an error message. I didn't check to see what branch Per's patch was created on, but it did apply cleanly to branch_4x. This patch is against that branch. was (Author: elyograg): From a design perspective, I can't claim to know whether this is an acceptable patch or not. Consistency in configs across multiple users and multiple versions does have some value, which is a very minor argument against this change. Is there any benchmark data? If docValues provides better performance for _version_ than indexed when it is used for its intended purpose, it might be worth changing the example config ... but people should know that if they *do* change the config on this field, they will have to completely reindex. This patch is functionally identical to the previous one, it just modifies an error message. I didn't check to see what branch Per's patch was created on, but it did apply cleanly to branch_4x. This patch is against that branch. 
_version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Attachments: SOLR-5670.patch, SOLR-5670.patch As far as I can see there is no good reason to require that _version_ field has to be indexed if it is docvalued. So I guess it will be ok with a rule saying _version_ has to be either indexed or docvalue (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Updated] (SOLR-4787) Join Contrib
Is this also applicable to the hjoin? Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Mon, Jan 27, 2014 at 7:27 AM, Joel Bernstein (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] Joel Bernstein updated SOLR-4787: - Attachment: SOLR-4787.patch Resolved a memory leak when the bjoin is used with cache autowarming. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. 
The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* > 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will set up a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then you can avoid hashset resizing, which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather than join. 
fq={!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1 applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list, will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin. <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib lib jars must be registered in the solrconfig.xml. <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> After issuing the ant dist command from inside
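Pulling the parameters above together, a hedged sketch of a request that combines the threads, size, and nested-fq features described earlier (the second join against collection3, the field names, and the values are hypothetical, not from the patch):

```
q=*:*
&fq={!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 size=200000 fq=$inner}user:customer1
&inner={!hjoin fromIndex=collection3 from=acct_i to=acct_i}status:active
```

The outer hjoin pre-sizes its hashset and builds its filter with 6 threads; its nested fq dereferences $inner, which resolves to a second hjoin, giving a nested join.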
[jira] [Commented] (SOLR-5658) commitWithin does not reflect the new documents added
[ https://issues.apache.org/jira/browse/SOLR-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882849#comment-13882849 ] Erik Hatcher commented on SOLR-5658: [~markmil...@gmail.com] Is this ticket complete as of Solr 4.6.1? Just wondering if it can be closed. Thanks! commitWithin does not reflect the new documents added - Key: SOLR-5658 URL: https://issues.apache.org/jira/browse/SOLR-5658 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0 Reporter: Varun Thacker Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5658.patch, SOLR-5658.patch I start 4 nodes using the setup mentioned on - https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I added a document using - curl "http://localhost:8983/solr/update?commitWithin=1" -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>' In Solr 4.5.1 there is 1 soft commit with openSearcher=true and 1 hard commit with openSearcher=false In Solr 4.6.x there is only one hard commit with openSearcher=false So even after 10 seconds queries on none of the shards reflect the added document. 
This was also reported on the solr-user list ( http://lucene.472066.n3.nabble.com/Possible-regression-for-Solr-4-6-0-commitWithin-does-not-work-with-replicas-td4106102.html ) Here are the relevant logs Logs from Solr 4.5.1 Node 1: {code} 420021 [qtp619011445-12] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={commitWithin=1} {add=[testdoc]} 0 45 {code} Node 2: {code} 119896 [qtp1608701025-10] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:8983/solr/collection1/&update.distrib=TOLEADER&wt=javabin&version=2} {add=[testdoc (1458003295513608192)]} 0 348 129648 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 129679 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@e174f70 main 129680 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done. 
129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 134648 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onCommit: commits: num=2 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3,generation=3} commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – newest commit generation = 4 134660 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush {code} Node 3: Node 4: {code} 374545 [qtp1608701025-16] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:7574/solr/collection1/&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[testdoc (1458002133233172480)]} 0 20 384545 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 384552 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@36137e08 main 384553 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 384553
Re: [jira] [Updated] (SOLR-4787) Join Contrib
Kranti, The memory leak in the bjoin dealt with the multi-value field joins. Specifically how the new UninvertedIntField cache was used in the bjoin. In a quick review of the hjoin I'm not seeing the same issue but it would be good to confirm through testing. Joel Joel Bernstein Search Engineer at Heliosearch On Mon, Jan 27, 2014 at 10:06 AM, Kranti Parisa kranti.par...@gmail.com wrote: Does this also apply to the hjoin? Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa
[jira] [Commented] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected
[ https://issues.apache.org/jira/browse/SOLR-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882884#comment-13882884 ] ASF subversion and git services commented on SOLR-5671: --- Commit 1561711 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1561711 ] SOLR-5671: increase logging to try and track down test failure (merged trunk r1561709) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected --- Key: SOLR-5671 URL: https://issues.apache.org/jira/browse/SOLR-5671 Project: Solr Issue Type: Bug Affects Versions: 4.7 Reporter: Steve Rowe Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome Benson Margulies as Lucene/Solr committer!
Congratulations Benson! Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Mon, Jan 27, 2014 at 9:32 AM, Alan Woodward a...@flax.co.uk wrote: Congratulations and welcome, Benson! Alan Woodward www.flax.co.uk On 26 Jan 2014, at 17:43, Shawn Heisey wrote: On 1/25/2014 2:40 PM, Michael McCandless wrote: I'm pleased to announce that Benson Margulies has accepted to join our ranks as a committer. Benson has been involved in a number of Lucene/Solr issues over time (see http://jirasearch.mikemccandless.com/search.py?index=jira&chg=dd&sa1=allUsers&a2=Benson+Margulies ), most recently on debugging tricky analysis issues. Congratulations and welcome! One more to try and keep me in line.
[jira] [Commented] (LUCENE-5414) suggest module should not depend on expression module
[ https://issues.apache.org/jira/browse/LUCENE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882879#comment-13882879 ] ASF subversion and git services commented on LUCENE-5414: - Commit 1561708 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1561708 ] LUCENE-5414: intellij config (merged trunk r1561707) suggest module should not depend on expression module - Key: LUCENE-5414 URL: https://issues.apache.org/jira/browse/LUCENE-5414 Project: Lucene - Core Issue Type: Wish Affects Versions: 4.6, 5.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 5.0, 4.7 Attachments: LUCENE-5414.patch, LUCENE-5414.patch, LUCENE-5414.patch, LUCENE-5414.patch Currently our suggest module depends on the expression module just because the DocumentExpressionDictionary provides some util ctor to pass in an expression directly. That is a lot of dependency for little value IMO and pulls in lots of JARs. DocumentExpressionDictionary should only take a ValueSource instead.
Re: Welcome Benson Margulies as Lucene/Solr committer!
Congratulations and welcome, Benson! Alan Woodward www.flax.co.uk On 26 Jan 2014, at 17:43, Shawn Heisey wrote: On 1/25/2014 2:40 PM, Michael McCandless wrote: I'm pleased to announce that Benson Margulies has accepted to join our ranks as a committer. Benson has been involved in a number of Lucene/Solr issues over time (see http://jirasearch.mikemccandless.com/search.py?index=jira&chg=dd&sa1=allUsers&a2=Benson+Margulies ), most recently on debugging tricky analysis issues. Congratulations and welcome! One more to try and keep me in line.
[jira] [Commented] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected
[ https://issues.apache.org/jira/browse/SOLR-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882881#comment-13882881 ] ASF subversion and git services commented on SOLR-5671: --- Commit 1561709 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1561709 ] SOLR-5671: increase logging to try and track down test failure Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected --- Key: SOLR-5671 URL: https://issues.apache.org/jira/browse/SOLR-5671 Project: Solr Issue Type: Bug Affects Versions: 4.7 Reporter: Steve Rowe Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me.
Re: Jetty version should go in CHANGES.TXT
+1 koji -- http://soleami.com/blog/mahout-and-machine-learning-training-course-is-here.html (14/01/27 21:44), Jan Høydahl wrote: Hi, I'd argue that Jetty can be said to be a major component of Solr, so I suggest we add the Jetty version under the section Versions of Major Components in Solr's CHANGES.TXT? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-5652: - Description: Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * fails occurred in first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * fails were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) was: Twice now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * fails occurred in first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * fails were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * fails occurred in first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * fails were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments)
Re: [jira] [Updated] (SOLR-4787) Join Contrib
Thanks Joel. I shall look into that. Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Mon, Jan 27, 2014 at 10:19 AM, Joel Bernstein joels...@gmail.com wrote: Kranti, The memory leak in the bjoin dealt with the multi-value field joins. Specifically how the new UninvertedIntField cache was used in the bjoin. In a quick review of the hjoin I'm not seeing the same issue but it would be good to confirm through testing. Joel Joel Bernstein Search Engineer at Heliosearch
[jira] [Commented] (LUCENE-5416) Performance of a FixedBitSet variant that uses Long.numberOfTrailingZeros()
[ https://issues.apache.org/jira/browse/LUCENE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882919#comment-13882919 ] Paul Elschot commented on LUCENE-5416: -- The last benchmark output is here: https://github.com/PaulElschot/lucene-solr/commit/772b55ad3c3d94752b37aa81b2e96cb50b321cf6 , see from line 313 in this output; the comparisons and loads are given as log10 numbers. In short: - for advance() this is a factor of 1.7 to 4 times faster, and - for nextDoc() this is up to 2.5 times faster, but for load factors higher than about 0.25 it is up to about 5 times slower. Performance of a FixedBitSet variant that uses Long.numberOfTrailingZeros() --- Key: LUCENE-5416 URL: https://issues.apache.org/jira/browse/LUCENE-5416 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 5.0 Reporter: Paul Elschot Priority: Minor Fix For: 5.0 On my machine the current byte index used in OpenBitSetIterator is slower than Long.numberOfTrailingZeros() for advance(). The pull request contains the code for benchmarking this taken from an early stage of DocBlocksIterator. In case the benchmark shows improvements on more machines, well, we know what to do...
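For readers following along, a minimal, hedged sketch of the technique under discussion (not the benchmark code from the pull request): walking the set bits of a bitset's long[] words with Long.numberOfTrailingZeros() plus lowest-bit clearing, rather than a byte-index lookup table. Class and method names here are illustrative.

```java
// Illustrative only: iterate set bits of a word-packed bitset using
// Long.numberOfTrailingZeros(), the approach benchmarked in LUCENE-5416.
public class NtzBitsetWalk {

    // Collect the indices of all set bits, in ascending order.
    static int[] setBits(long[] words, int expectedCount) {
        int[] out = new int[expectedCount];
        int n = 0;
        for (int w = 0; w < words.length; w++) {
            long bits = words[w];
            while (bits != 0L) {
                // (w << 6) is this word's bit offset; ntz finds the lowest set bit.
                out[n++] = (w << 6) + Long.numberOfTrailingZeros(bits);
                bits &= bits - 1L; // clear the lowest set bit and continue
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] words = new long[2]; // 128 bits
        int[] docs = {0, 3, 63, 64, 100};
        for (int d : docs) words[d >> 6] |= 1L << (d & 63);
        int[] got = setBits(words, docs.length);
        for (int i = 0; i < docs.length; i++) {
            if (got[i] != docs[i]) throw new AssertionError("mismatch at " + i);
        }
        System.out.println("ok");
    }
}
```

At high load factors a dense word has many set bits, so the per-bit ntz loop runs longer per word, which is consistent with the nextDoc() slowdown Paul reports above a load factor of about 0.25.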
[jira] [Commented] (LUCENE-5414) suggest module should not depend on expression module
[ https://issues.apache.org/jira/browse/LUCENE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882873#comment-13882873 ] ASF subversion and git services commented on LUCENE-5414: - Commit 1561707 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1561707 ] LUCENE-5414: intellij config suggest module should not depend on expression module - Key: LUCENE-5414 URL: https://issues.apache.org/jira/browse/LUCENE-5414 Project: Lucene - Core Issue Type: Wish Affects Versions: 4.6, 5.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 5.0, 4.7 Attachments: LUCENE-5414.patch, LUCENE-5414.patch, LUCENE-5414.patch, LUCENE-5414.patch Currently our suggest module depends on the expression module just because the DocumentExpressionDictionary provides some util ctor to pass in an expression directly. That is a lot of dependency for little value IMO and pulls in lots of JARs. DocumentExpressionDictionary should only take a ValueSource instead.
[jira] [Created] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected
Steve Rowe created SOLR-5671: Summary: Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected Key: SOLR-5671 URL: https://issues.apache.org/jira/browse/SOLR-5671 Project: Solr Issue Type: Bug Affects Versions: 4.7 Reporter: Steve Rowe Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me.
[jira] [Commented] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected
[ https://issues.apache.org/jira/browse/SOLR-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882885#comment-13882885 ] Steve Rowe commented on SOLR-5671: -- I committed a change to DistribCursorPagingTest that will print the details of the indexed doc(s) not returned by deep paging. Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected --- Key: SOLR-5671 URL: https://issues.apache.org/jira/browse/SOLR-5671 Project: Solr Issue Type: Bug Affects Versions: 4.7 Reporter: Steve Rowe Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me.
[jira] [Commented] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882890#comment-13882890 ] Ted Sullivan commented on SOLR-2366: Right. I'm following [~shalinmangar]'s suggestion to split out your/Hoss's facet.range.spec / facet.sequence idea as a separate issue. I don't think of this as extending the gap parameter - I am just providing more explicit information in the response as to what gaps you actually get (as per your suggestion of Sept/2011) - similar to what you would get if you implemented this using facet.query. Looking at the current code, it is pretty easy to add the range information to the response (right now the response labels are just the gap starts). This may be user-unfriendly as you say, but I would argue that it is more friendly than what we have right now - it is certainly more developer-friendly because it provides better feedback. There is a lot of interest in this feature (it has been advertised on the SimpleFacetsParameter Wiki for some time now) as evidenced by earlier comments in this thread. My original desire was just to make (the patch) usable for those that want to use it by upgrading Grant's original patch so that it would work with the new(?) modular class organization. The work required to spiff up the facet.range.gap response is not large. I haven't attempted the facet.range.spec/buckets approach, but that would seem to require more effort. Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 4.7 Attachments: SOLR-2366.patch, SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced.
For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different-sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+). We should be able to quantize the results into arbitrarily sized buckets. (Original syntax proposal removed, see discussion for concrete syntax)
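The quantization being asked for can be sketched outside Solr. A minimal illustration in plain Java (not Solr code; bucket boundaries taken from the distance example above — this is effectively what one gets today with three facet.query parameters):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: counting values into arbitrarily sized buckets,
// as the issue proposes for range faceting.
public class ArbitraryBuckets {
    // Three uneven buckets: walking, driving, everything else.
    static String bucketFor(double distanceKm) {
        if (distanceKm < 5) return "0-5KM";
        if (distanceKm < 150) return "5KM-150KM";
        return "150KM+";
    }

    public static void main(String[] args) {
        double[] distances = {1.2, 42.0, 3.9, 500.0, 149.9};
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (double d : distances) {
            counts.merge(bucketFor(d), 1, Integer::sum);
        }
        System.out.println(counts); // {0-5KM=2, 5KM-150KM=2, 150KM+=1}
    }
}
```

The same counts could be produced in Solr today with facet.query=dist:[0 TO 5], facet.query=dist:[5 TO 150], facet.query=dist:[150 TO *]; the proposal is to make the range-faceting machinery do this directly.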
[jira] [Commented] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882933#comment-13882933 ] Per Steffensen commented on SOLR-5670: -- bq. Is there any benchmark data? If docValues provides better performance for _version_ than indexed I do not think it will in most cases. * Indexed: When you want to get the _version_ for a particular doc-no (found by id), you can make a lookup in the FieldCache holding the reversed term-index - this is in memory and constant time. If you have a very rapidly changing data-set (so that FieldCache entries will be invalidated often due to merging) you might get better performance (response-time) with doc-values - but not in general, I think. * DocValues: You will read the _version_ from doc-values, which is not necessarily in memory. We are prepared to take a small performance hit to avoid having all that data in the FieldCache. In general we do not allow putting anything in the FieldCache, because we have so many documents that it always creates issues with too much memory usage. The problem with the FieldCache is that it is all or nothing - for good reasons! - we just cannot live with it. We haven't made the change on _version_ (going from indexed to doc-value) in production yet. We will do some performance testing on it first, and depending on how much we decide to do, I can get back with some numbers. bq. when it is used for its intended purpose, it might be worth changing the example config I do not think you should do that. Using the FieldCache is probably the best default. But writing something somewhere about the option of using doc-values instead of indexed, and when that is a good idea, would be nice. bq. ... but people should know that if they do change the config on this field, they will have to completely reindex. Or just start using it from now on in new collections. We create a new collection every month and keep a history of data by keeping the latest 24 collections.
One of many reasons for doing this is that it gives us the option of changing indexing strategy etc. every month. For us re-indexing is completely out of the question - we have billions and billions of records in Solr and re-indexing them all in a fairly short service-window is not possible. Therefore we built this new-collection-every-month thingy in order to have some flexibility. bq. This patch is functionally identical to the previous one, it just modifies an error message. Nicely spotted. bq. I didn't check to see what branch Per's patch was created on, but it did apply cleanly to branch_4x. It was branch_4x. _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Attachments: SOLR-5670.patch, SOLR-5670.patch As far as I can see there is no good reason to require that the _version_ field has to be indexed if it is docvalued. So I guess it will be ok with a rule saying _version_ has to be either indexed or docvalued (allowed to be both).
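Under the rule this issue proposes, a schema could then declare _version_ as docValues-only, keeping it out of the FieldCache. A hypothetical schema fragment (attribute values illustrative, not taken from the patch):

```xml
<!-- Hypothetical: _version_ readable via doc values only, so version
     lookups read doc values instead of populating the FieldCache.
     Today Solr requires indexed="true" on this field. -->
<field name="_version_" type="long" indexed="false" stored="true" docValues="true"/>
```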
Welcome Benson Margulies as Lucene/Solr committer!
Welcome! On Jan 25, 2014 1:41 PM, Michael McCandless <luc...@mikemccandless.com> wrote: I'm pleased to announce that Benson Margulies has accepted to join our ranks as a committer. Benson has been involved in a number of Lucene/Solr issues over time (see http://jirasearch.mikemccandless.com/search.py?index=jirachg=ddsa1=allUsersa2=Benson+Margulies ), most recently on debugging tricky analysis issues. Benson, it is tradition that you introduce yourself with a brief bio. I know you're heavily involved in other Apache projects already... Once your account is set up, you should then be able to add yourself to the who we are page on the website as well. Congratulations and welcome! Mike McCandless http://blog.mikemccandless.com
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882832#comment-13882832 ] Per Steffensen commented on SOLR-4470: -- bq. We are currently using SOLR 4.5.1 in our production environment and we tried to setup security on a SOLR cloud configuration. Container-managed authentication and authorization, I presume? bq. I have read all the 4470 issue activity and it will be very useful for us to be able to download the SOLR-4470_branch_4x_r1452629.patch already compiled from some place, until the 4.7 version is released. Guess you are looking at Fix Version/s: 4.7 on this issue, and expect that this means that the fix will be in 4.7. I do not believe it will - unfortunately. So if you want the feature, you need to change the patch yourself to fit the version of Solr you are using, or you can download code for Solr 4.4 plus numerous improvements (including SOLR-4470) here: https://github.com/steff1193/lucene-solr. You will have to build a Solr distribution yourself - and maven artifacts if you need those.
* Building distribution from source
{code}
checkout
cd solr
ant -Dversion=4.4.0.myversion clean create-package
{code}
* Building and deploying artifacts is a little more complicated. Let me know if you need that.
*Please note* that https://github.com/steff1193/lucene-solr is only a place where we keep our version of Lucene/Solr, including the changes we have made which have not yet been committed to Apache Solr. You are free to use it, but there is no guarantee that there will ever be a version based on an Apache Solr version higher than 4.4. It is very likely that there will be, but no guarantee, and you never know when it will happen. Of course it is all open source, so if you really want you can fork it yourself.
Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.7 Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch We want to protect any HTTP-resource (url). We want to require credentials no matter what kind of HTTP-request you make to a Solr-node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes also make internal requests to other Solr-nodes, and for it to work credentials need to be provided here also. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers, e.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered from outside requests (e.g. shard creation/deletion/etc. based on calls to the Collection API) * that do not in any way have a relation to an outside super-request (e.g. replica syncing stuff) We would like to aim at a solution where original credentials are forwarded when a request directly/synchronously triggers a subrequest, with a fallback to configured internal credentials for the asynchronous/non-rooted requests. In our solution we would aim at only supporting basic http auth, but we would like to make a framework around it, so that not too much refactoring is needed if you later want to add support for other kinds of auth (e.g. digest). We will work on a solution but created this JIRA issue early in order to get input/comments from the community as early as possible.
[jira] [Resolved] (SOLR-5658) commitWithin does not reflect the new documents added
[ https://issues.apache.org/jira/browse/SOLR-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-5658. --- Resolution: Fixed commitWithin does not reflect the new documents added - Key: SOLR-5658 URL: https://issues.apache.org/jira/browse/SOLR-5658 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0 Reporter: Varun Thacker Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5658.patch, SOLR-5658.patch I start 4 nodes using the setup mentioned on - https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I added a document using - curl 'http://localhost:8983/solr/update?commitWithin=1' -H 'Content-Type: text/xml' --data-binary '<add><doc><field name="id">testdoc</field></doc></add>' In Solr 4.5.1 there is 1 soft commit with openSearcher=true and 1 hard commit with openSearcher=false. In Solr 4.6.x there is only one hard commit with openSearcher=false. So even after 10 seconds queries on none of the shards reflect the added document.
This was also reported on the solr-user list ( http://lucene.472066.n3.nabble.com/Possible-regression-for-Solr-4-6-0-commitWithin-does-not-work-with-replicas-td4106102.html ) Here are the relevant logs Logs from Solr 4.5.1 Node 1: {code} 420021 [qtp619011445-12] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={commitWithin=1} {add=[testdoc]} 0 45 {code} Node 2: {code} 119896 [qtp1608701025-10] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:8983/solr/collection1/update.distrib=TOLEADERwt=javabinversion=2} {add=[testdoc (1458003295513608192)]} 0 348 129648 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 129679 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@e174f70 main 129680 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done. 
129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 134648 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onCommit: commits: num=2 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3,generation=3} commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – newest commit generation = 4 134660 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush {code} Node 3: Node 4: {code} 374545 [qtp1608701025-16] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:7574/solr/collection1/update.distrib=FROMLEADERwt=javabinversion=2} {add=[testdoc (1458002133233172480)]} 0 20 384545 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 384552 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@36137e08 main 384553 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 384553 [searcherExecutor-5-thread-1] INFO 
org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@36137e08
[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-5652: - Description: Twice now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de and sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * failures occurred in first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * failures were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) was: Twice now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it.
Summary of things noticed so far (in 3 failures): * So far only seen on http://jenkins.thetaphi.de sarowe's mac * So far only seen on MacOSX * So far only seen on branch 4x * So far seen on both Java6 and Java7 * fails occured in first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * fails were both when doing a desc sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** sort on same field asc has always worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) Updated summary Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Twice now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
[jira] [Commented] (SOLR-797) Construct EmbeddedSolrServer response without serializing/parsing
[ https://issues.apache.org/jira/browse/SOLR-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882944#comment-13882944 ] Gregg Donovan commented on SOLR-797: I'm interested in this as well. We had a custom API that was similar to the attached patch. When we switched to EmbeddedSolrServer we noticed an increase in time spent deserializing the Solr response, memory allocated, and GC spikiness. One issue with the current EmbeddedSolrServer code is that it starts with a ByteArrayOutputStream of 32 bytes and repeatedly resizes it to fit the results. We have large responses and we notice the GC hit. We experimented with a ThreadLocal<ByteBuffer>, but avoiding serializing and parsing altogether for EmbeddedSolrServer seems like an even better idea. If there's interest, we'd be happy to revive/update/test this patch. Construct EmbeddedSolrServer response without serializing/parsing - Key: SOLR-797 URL: https://issues.apache.org/jira/browse/SOLR-797 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 1.3 Reporter: Jonathan Lee Priority: Minor Fix For: 4.7 Attachments: SOLR-797.patch, SOLR-797.patch Currently, the EmbeddedSolrServer serializes the response and reparses it in order to create the final NamedList response. From the comment in EmbeddedSolrServer.java, the goal is to: * convert the response directly into a named list
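The growth behavior Gregg describes can be illustrated with plain JDK code. A sketch (not Solr code) modeling the standard java.io.ByteArrayOutputStream, whose default backing array starts at 32 bytes and roughly doubles on each resize, copying the old contents each time:

```java
import java.io.ByteArrayOutputStream;

// Illustration: how many capacity doublings a default (32-byte) buffer
// needs to hold n bytes, and how presizing avoids the intermediate
// allocations that all become garbage.
public class BaosGrowth {
    static int resizesNeeded(int n) {
        int capacity = 32, resizes = 0;
        while (capacity < n) {
            capacity <<= 1;  // model: capacity doubles on each resize
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        int payload = 1 << 20; // a 1 MB serialized response
        System.out.println(resizesNeeded(payload)); // 15 doublings: 32 -> 1 MB

        // Presizing the stream means the write lands in one allocation
        // instead of ~15 progressively larger arrays.
        ByteArrayOutputStream presized = new ByteArrayOutputStream(payload);
        presized.write(new byte[payload], 0, payload);
        System.out.println(presized.size()); // 1048576
    }
}
```

This is why skipping the serialize/parse round-trip entirely (the point of this issue) beats both presizing and a thread-local buffer: no intermediate byte array is needed at all.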
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882958#comment-13882958 ] Steve Rowe commented on SOLR-5652: -- bq. It looks to me like there are two problems here: 1) the same doc is showing up on different pages when deep paging; and 2) missing docvalue docs are sorted incorrectly. I think I understand problem #2: non-multi-valued numeric and string fields are created (by TrieField's and StrField's createFields() methods) as NumericDocValuesField-s and SortedDocValuesField-s, respectively, and these require each doc to have a value, which apparently defaults to zero for NumericDocValuesField-s and the empty string for SortedDocValuesField-s. Here are the declarations for the field types that have this problem in DistribCursorPagingTest (from schema-sorts.xml):
{code:xml}
<fieldtype name="str_dv_last" class="solr.StrField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="str_dv_first" class="solr.StrField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>
<fieldtype name="int_dv_last" class="solr.TrieIntField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="int_dv_first" class="solr.TrieIntField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>
<fieldtype name="long_dv_last" class="solr.TrieLongField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="long_dv_first" class="solr.TrieLongField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>
<fieldtype name="float_dv_last" class="solr.TrieFloatField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="float_dv_first" class="solr.TrieFloatField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>
<fieldtype name="double_dv_last" class="solr.TrieDoubleField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="double_dv_first" class="solr.TrieDoubleField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>
{code}
I think that the above declarations should be disallowed by Solr, because they contain docValues=true + sortMissingLast|First=true; the user is asking for a particular sorting behavior for missing values, when there never will be missing values. Also, the Solr Ref Guide [says|https://cwiki.apache.org/confluence/display/solr/DocValues] about docvalue fields "If this type is used, the field must be either required or have a default value, meaning every document must have a value for this field." However, neither the above field types nor the fields using them are required or have a default specified. Maybe this should be enforced by schema parsing? Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it.
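The "enforced by schema parsing" check Steve suggests could look something like this minimal sketch (hypothetical method, not actual Solr schema-parsing code; the rule it encodes is the one proposed in the comment):

```java
// Hypothetical sketch: reject a field type that combines docValues with
// sortMissingFirst/sortMissingLast, since on such a field every document
// gets a (default) value, so the requested missing-value sort behavior
// can never actually apply.
public class SchemaCheck {
    static void validate(boolean docValues, boolean sortMissingFirst, boolean sortMissingLast) {
        if (docValues && (sortMissingFirst || sortMissingLast)) {
            throw new IllegalArgumentException(
                "sortMissingFirst/sortMissingLast have no effect when docValues=true: "
                + "every document will have a value for this field");
        }
    }

    public static void main(String[] args) {
        validate(false, true, false); // fine: indexed field, missing-sort meaningful
        try {
            validate(true, true, false); // the combination from schema-sorts.xml
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

(As Yonik notes in a later comment, LUCENE-5178 changed the "every document has a value" premise, so whether this check is still warranted depends on that.)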
[jira] [Commented] (SOLR-5666) Using the hdfs write cache can result in appearance of corrupted index.
[ https://issues.apache.org/jira/browse/SOLR-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882963#comment-13882963 ] ASF subversion and git services commented on SOLR-5666: --- Commit 1561751 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1561751 ] SOLR-5666: Using the hdfs write cache can result in appearance of corrupted index. Using the hdfs write cache can result in appearance of corrupted index. --- Key: SOLR-5666 URL: https://issues.apache.org/jira/browse/SOLR-5666 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.0, 4.7
[jira] [Commented] (SOLR-5666) Using the hdfs write cache can result in appearance of corrupted index.
[ https://issues.apache.org/jira/browse/SOLR-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882964#comment-13882964 ] ASF subversion and git services commented on SOLR-5666: --- Commit 1561752 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1561752 ] SOLR-5666: Using the hdfs write cache can result in appearance of corrupted index.
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882976#comment-13882976 ] Yonik Seeley commented on SOLR-5652: bq. NumericDocValuesField-s and SortedDocValuesField-s, respectively, and these require each doc to have a value, Although that used to be true, it should no longer be the case: LUCENE-5178 Now one thing that does look a little fishy to me that might cause a problem is how things like IntComparator deal with missing values... it simply substitutes in MAX_INT or MIN_INT when the value is missing. If the tests here are generating random values, you might try taking out MAX_<numeric_type>, MIN_<numeric_type> and see if it makes a difference?
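Yonik's point about sentinel substitution can be shown in isolation. A sketch in plain Java (mimicking the substitution he describes, not the actual Lucene IntComparator): a doc with no value becomes indistinguishable, in sort order, from a doc whose real value is the sentinel, so ordering between the two is left entirely to tie-breaking.

```java
// Illustration: substituting Integer.MAX_VALUE for a missing value makes
// a missing doc tie with a doc whose real value is Integer.MAX_VALUE.
public class MissingValueSentinel {
    // Ascending comparison with null (missing) replaced by the sentinel.
    static int compareAsc(Integer a, Integer b) {
        int x = (a == null) ? Integer.MAX_VALUE : a;
        int y = (b == null) ? Integer.MAX_VALUE : b;
        return Integer.compare(x, y);
    }

    public static void main(String[] args) {
        // Missing vs. a real MAX_VALUE doc: the comparator sees a tie.
        System.out.println(compareAsc(null, Integer.MAX_VALUE)); // 0
        // Missing vs. an ordinary value: missing sorts last, as intended.
        System.out.println(compareAsc(null, 42)); // 1
    }
}
```

If the test's random values include the extremes, such a tie is exactly the kind of ambiguity that could make a cursor walk revisit a doc.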
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883010#comment-13883010 ] David Webster commented on SOLR-4470: - I have to admit I find the content of this issue to be disturbing coming from such a major Open Source project as Solr. I came here looking for a viable security solution that did not involve segmenting off the system or otherwise using IPsec and other IP-address-centric forms of security. Products simply must address security internally to ever be considered truly Enterprise-worthy solutions. This product does not, and even worse, the core Dev team seems intent on NEVER doing so! As the lead Java architect for Distributed Systems Engineering at a Fortune 100 company, security is my single most important concern. I don't care how fast a product is, or how many slick features it has; if it isn't secure at the core, it is worthless as an Enterprise solution (at least for any Enterprise that gives a whit about REAL security). Solr is doomed to use as a lab experiment for any serious Enterprise implementation where security is more than an afterthought. I like Solr. I like what it does and how it does it. However, its lack of internal security hooks is a complete show-stopper for use at my firm. So my choices are to internalize the code, using this patch as our starting point, and have our own Solr-like engine, or move on to something like ElasticSearch which actually cares about real security at the node-to-node level. Also, Mavenize the damned thing! Modern projects still use Ant?
I haven't opened a build.xml script in half a decade or more. Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.7 Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch We want to protect any HTTP-resource (url). We want to require credentials no matter what kind of HTTP-request you make to a Solr-node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes also make internal requests to other Solr-nodes, and for it to work credentials need to be provided here also. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers, e.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered from outside requests (e.g. shard creation/deletion/etc based on calls to the Collection API) * that do not in any way have a relation to an outside super-request (e.g. replica syncing stuff) We would like to aim at a solution where original credentials are forwarded when a request directly/synchronously triggers a subrequest, with a fallback to configured internal credentials for the asynchronous/non-rooted requests. In our solution we would aim at only supporting basic http auth, but we would like to make a framework around it, so that not too much refactoring is needed if you later want to add support for other kinds of auth (e.g. digest). We will work on a solution but created this JIRA issue early in order to get input/comments from the community as early as possible.
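The credential-forwarding scheme described above ultimately comes down to attaching an HTTP Basic Authorization header to each internal sub-request. A minimal sketch of building that header (the function name is illustrative, not a Solr API):

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

# A synchronously-triggered sub-request would forward the caller's header
# verbatim; async/non-rooted requests would instead fall back to configured
# internal credentials, as proposed in the issue description.
header = basic_auth_header("user", "pass")
assert header == "Basic dXNlcjpwYXNz"
```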
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883023#comment-13883023 ] Hoss Man commented on SOLR-5652: bq. Although that used to be true, it should no longer be the case: LUCENE-5178 Right, see also: SOLR-5165 SOLR-5222 On IRC, i drew sarowe's attention to these issues and DocValuesMissingTest and he pointed out that DocValuesMissingTest uses the following... bq. @SuppressCodecs({Lucene40, Lucene41, Lucene42}) // old formats cannot represent missing values ...so this may be the smoking gun to explain what's going wrong here, since we don't do anything like this in the cursor tests. (yet ... i'm going to fix that now) Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
[jira] [Updated] (LUCENE-4072) CharFilter that Unicode-normalizes input
[ https://issues.apache.org/jira/browse/LUCENE-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Goldfarb updated LUCENE-4072: --- Attachment: 4072.patch Attaching a new patch - testCuriousString still fails. You're right about readInputToBuffer. I think we also have to stop only on normalization boundaries. I see two options: use normalizer.hasBoundaryAfter(tmpBuffer\[len-1\]) (straightforward) or use normalizer.hasBoundaryBefore(tmpBuffer\[len-1\]) with mark() and reset().
{noformat}
private int readInputToBuffer() throws IOException {
  final int len = input.read(tmpBuffer);
  if (len == -1) {
    inputFinished = true;
    return 0;
  }
  inputBuffer.append(tmpBuffer, 0, len);
  if (len >= 2 && normalizer.hasBoundaryAfter(tmpBuffer[len-1]) && !Character.isHighSurrogate(tmpBuffer[len-1])) {
    return len;
  } else {
    return len + readInputToBuffer();
  }
}
{noformat}
CharFilter that Unicode-normalizes input Key: LUCENE-4072 URL: https://issues.apache.org/jira/browse/LUCENE-4072 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: Ippei UKAI Attachments: 4072.patch, DebugCode.txt, LUCENE-4072.patch, LUCENE-4072.patch, LUCENE-4072.patch, LUCENE-4072.patch, LUCENE-4072.patch, LUCENE-4072.patch, ippeiukai-ICUNormalizer2CharFilter-4752cad.zip I'd like to contribute a CharFilter that Unicode-normalizes input with ICU4J. The benefit of having this process as a CharFilter is that the tokenizer can work on normalised text while offset correction ensures that the fast vector highlighter and other offset-dependent features do not break. The implementation is available at the following repository: https://github.com/ippeiukai/ICUNormalizer2CharFilter Unfortunately this is my unpaid side-project and I cannot spend much time merging my work into Lucene to make an appropriate patch. I'd appreciate it if anyone could give it a go. I'm happy to relicense it to whatever meets your needs.
[jira] [Comment Edited] (LUCENE-4072) CharFilter that Unicode-normalizes input
[ https://issues.apache.org/jira/browse/LUCENE-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883032#comment-13883032 ] David Goldfarb edited comment on LUCENE-4072 at 1/27/14 6:10 PM: - Attaching a new patch - testCuriousString still fails. You're right about readInputToBuffer. I think we also have to stop only on normalization boundaries. I see two options: use normalizer.hasBoundaryAfter(tmpBuffer\[len-1\]) (straightforward) or use normalizer.hasBoundaryBefore(tmpBuffer\[len-1\]) with mark() and reset().
{noformat}
private int readInputToBuffer() throws IOException {
  final int len = input.read(tmpBuffer);
  if (len == -1) {
    inputFinished = true;
    return 0;
  }
  inputBuffer.append(tmpBuffer, 0, len);
  if (len >= 2 && normalizer.hasBoundaryAfter(tmpBuffer[len-1]) && !Character.isHighSurrogate(tmpBuffer[len-1])) {
    return len;
  } else {
    return len + readInputToBuffer();
  }
}
{noformat}
\[edit\] And the len >= 2 clause wasn't meant to be part of the patch, ignore that.
{noformat}
if (normalizer.hasBoundaryAfter(tmpBuffer[len-1]) && !Character.isHighSurrogate(tmpBuffer[len-1])) {
  return len;
} else {
  return len + readInputToBuffer();
}
{noformat}
was (Author: dgoldfarb): Attaching a new patch - testCuriousString still fails. You're right about readInputToBuffer. I think we also have to stop only on normalization boundaries. I see two options: use normalizer.hasBoundaryAfter(tmpBuffer\[len-1\]) (straightforward) or use normalizer.hasBoundaryBefore(tmpBuffer\[len-1\]) with mark() and reset().
{noformat}
private int readInputToBuffer() throws IOException {
  final int len = input.read(tmpBuffer);
  if (len == -1) {
    inputFinished = true;
    return 0;
  }
  inputBuffer.append(tmpBuffer, 0, len);
  if (len >= 2 && normalizer.hasBoundaryAfter(tmpBuffer[len-1]) && !Character.isHighSurrogate(tmpBuffer[len-1])) {
    return len;
  } else {
    return len + readInputToBuffer();
  }
}
{noformat}
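The reason the code above keeps reading until it hits a normalization boundary is that a combining mark arriving in the next buffer can change the normalized form of characters already read. The same hazard can be shown with Python's stdlib unicodedata (standing in here for ICU's Normalizer2 — an illustration, not the patch's code):

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT normalizes (NFC) to a single "é".
decomposed = "e\u0301"
assert unicodedata.normalize("NFC", decomposed) == "\u00e9"

# If a read boundary splits the pair, normalizing each chunk independently
# gives the wrong result -- exactly the hazard readInputToBuffer() avoids by
# reading on until it sees a normalization boundary:
chunk1, chunk2 = decomposed[0], decomposed[1]
wrong = unicodedata.normalize("NFC", chunk1) + unicodedata.normalize("NFC", chunk2)
assert wrong != unicodedata.normalize("NFC", decomposed)
```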
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883044#comment-13883044 ] Hoss Man commented on SOLR-5652: To clarify one thing: steve mentioned that it seems like there are two problems... bq. It looks to me like there are two problems here: 1) the same doc is showing up on different pages when deep paging; and 2) missing docvalue docs are sorted incorrectly. As far as #2 goes, now that we log every doc on every page, i can confirm that when i try some of these failed seeds (for example steve's #129 log), i also see the incorrect ordering even though the test passes for me -- so #2 is almost certainly the codec issue. That still leaves the question about #1, and why it isn't completely reproducible -- but that may just be an artifact of #2 (ie: if these codecs have non-deterministic behavior when trying to access missing values, there could be arbitrary data in a reused bytebuffer)
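For context, the sortMissingLast/sortMissingFirst semantics the failing fields declare place documents with no docValue at a fixed end of the results regardless of sort direction. A rough sketch of the expected (correct) behaviour, using hypothetical doc dicts rather than Solr code:

```python
# Emulate Solr's sortMissingLast for a field sort: docs missing a value for
# the sort field go to the end whether the sort is ascending or descending.
# None marks a missing docValue; the docs and ids are invented for the example.
docs = [{"id": 1, "v": "b"}, {"id": 2, "v": None}, {"id": 3, "v": "a"}]

def sort_missing_last(docs, asc=True):
    present = sorted((d for d in docs if d["v"] is not None),
                     key=lambda d: d["v"], reverse=not asc)
    missing = [d for d in docs if d["v"] is None]
    return present + missing          # sortMissingFirst would prepend instead

assert [d["id"] for d in sort_missing_last(docs)] == [3, 1, 2]
assert [d["id"] for d in sort_missing_last(docs, asc=False)] == [1, 3, 2]
```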
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883048#comment-13883048 ] Hoss Man commented on SOLR-5652: bq. Also, the Solr Ref Guide says about docvalue fields... fixed.
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883053#comment-13883053 ] Shawn Heisey commented on SOLR-4470: bq. This product does not, and even worse, the core Dev team seems intent on NEVER doing so! I don't know that we *never* intend on adding security. We face a major problem with doing so at this time, though: We have absolutely no idea what servlet container the user is going to use for running the solr war. The example includes jetty, but aside from a few small edits in the stock config file, it is unmodified. Solr has no control over the server-side HTTP layer right now, so anything we try to do will almost certainly be wrong as soon as the user changes containers or decides to modify their container config. Solr 5.0 will not ship as a .war file. The work hasn't yet been done that will turn it into an actual application, but it will be done before 5.0 gets released. Once Solr is a real application that owns and fully controls the HTTP layer, security will not be such a nightmare. You mention ElasticSearch and its ability to deal with security. ES is already a standalone application, which means they can do a lot of things that Solr currently can't. It's a legitimate complaint with Solr, one that we are trying to rectify. bq. Also, Mavenize the damned thing! Modern projects still use Ant? I haven't opened a build.xml script in half a decade or more I can't say anything about maven vs. ant. I don't have enough experience with either. 
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883085#comment-13883085 ] David Webster commented on SOLR-4470: - Thanks for the update, Shawn. The move to a stand-alone implementation should be a good one, with hope that a robust security implementation will be at the very top of the priority list. Not sure what the timeline for that is, but I've got a fairly short one for laying down the foundation of our Enterprise Search by 3rd Qtr. That will have to pass IA muster (mainstream Solr does not), which still leaves me in a bit of a quandary as to how to proceed. I don't want the added TCO of maintaining our own search engine, but cannot wait around very long for viable solutions to surface, either. I'm either going to have to implement this patch branch, or move on to other engine choices... I know JBoss, JBPM specifically, used to be Ant-based but they've gone full Maven now. This is the first big Open Source project I've run across in some time that still uses Ant. Not many devs on our staff can still read a build.xml file anymore...and those that can would rather not...
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883114#comment-13883114 ] Hoss Man commented on SOLR-5463: bq. Some further thoughts: ... Yonik: no disagreement from me, but since what we've got so far has already been committed and backported to 4x, i think it would make sense to track your enhancement ideas in new issues for tracking purposes (unless you think you can help bang these out before 4.7). Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0, 4.7 Attachments: SOLR-5463-randomized-faceting-test.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man__MissingStringLastComparatorSource.patch I'd like to revisit a solution to the problem of deep paging in Solr, leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagination over large sets of results (notably the twitter API and its since_id param) except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending.
SOLR-1726 laid some initial groundwork here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode {panel:title=Basic Usage} * send a request with {{sort=X&start=0&rows=N&cursorMark=*}} ** sort can be anything, but must include the uniqueKey field (as a tie breaker) ** N can be any number you want per page ** start must be 0 ** \* denotes you want to use a cursor starting at the beginning mark * parse the response body and extract the (String) {{nextCursorMark}} value * Replace the \* value in your initial request params with the {{nextCursorMark}} value from the response in the subsequent request * repeat until the {{nextCursorMark}} value stops changing, or you have collected as many docs as you need {panel}
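The Basic Usage loop above can be sketched against an in-memory result set (no HTTP involved; only the paging contract matters). The mark here encodes the sort values of the last document on the previous page, mirroring IndexSearcher.searchAfter; names and data are illustrative, not Solr APIs:

```python
# Docs pre-sorted by (score, id) -- the uniqueKey acts as the tie breaker,
# as the usage notes above require. All data here is invented.
DOCS = sorted([{"id": i, "score": i % 5} for i in range(10)],
              key=lambda d: (d["score"], d["id"]))

def search(cursor_mark, rows):
    """One 'page' request: '*' means start at the beginning; otherwise the
    mark is the (score, id) of the last doc on the previous page."""
    if cursor_mark == "*":
        page = DOCS[:rows]
    else:
        page = [d for d in DOCS if (d["score"], d["id"]) > cursor_mark][:rows]
    next_mark = (page[-1]["score"], page[-1]["id"]) if page else cursor_mark
    return page, next_mark

# Client loop from "Basic Usage": repeat until the mark stops changing.
collected, mark = [], "*"
while True:
    page, next_mark = search(mark, rows=3)
    collected.extend(page)
    if next_mark == mark:
        break
    mark = next_mark

assert [d["id"] for d in collected] == [d["id"] for d in DOCS]
```

Because the cursor is the last seen sort values rather than a numeric offset, a page never re-walks documents already returned, which is what makes deep paging cheap.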
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883117#comment-13883117 ] Steven Bower commented on SOLR-5488: I finally got a linux box at home to repro this issue (well, at least a similar one).. I think the issue is in how it identifies individual components of a query so that they are not duplicated throughout the query execution.. i think it's just associating the wrong stats collectors with query components.. i've narrowed it down to that but am not quite sure exactly where this is or why it is so ephemeral.. Fix up test failures for Analytics Component Key: SOLR-5488 URL: https://issues.apache.org/jira/browse/SOLR-5488 Project: Solr Issue Type: Bug Affects Versions: 5.0, 4.7 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, eoe.errors The analytics component has a few test failures, perhaps environment-dependent. This is just to collect the test fixes in one place for convenience when we merge back into 4.x
lucene-solr pull request: Adds logParamsList parameter to support reduced l...
GitHub user cpoerschke opened a pull request: https://github.com/apache/lucene-solr/pull/23 Adds logParamsList parameter to support reduced logging. For https://issues.apache.org/jira/browse/SOLR-5672 add logParamsList parameter to support reduced logging. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bloomberg/lucene-solr branch_4x-fewer-params-logged Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/23.patch commit e6f82c935d5f8ee6b225be41b5a6615833fc3029 Author: Christine Poerschke cpoersc...@bloomberg.net Date: 2014-01-24T13:17:44Z Adds logParamsList parameter to support reduced logging.
[jira] [Created] (SOLR-5672) add logParamsList parameter to support reduced logging
Christine Poerschke created SOLR-5672: - Summary: add logParamsList parameter to support reduced logging Key: SOLR-5672 URL: https://issues.apache.org/jira/browse/SOLR-5672 Project: Solr Issue Type: Improvement Reporter: Christine Poerschke The use case we have is that logging full requests in each shard is just 'too much' but at the same time we wish to be able to tie together requests across shards. In certain circumstances we also wish to fully log some requests.
[jira] [Commented] (SOLR-5672) add logParamsList parameter to support reduced logging
[ https://issues.apache.org/jira/browse/SOLR-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883146#comment-13883146 ] Christine Poerschke commented on SOLR-5672: --- The change https://github.com/apache/lucene-solr/pull/23 adds a new parameter. If it is missing then behaviour will be as it is now. If it is supplied the following use cases are possible:
{code}
...logParamsList=     # don't log any parameters
...logParamsList=q,fq # log only the q and fq parameters
{code}
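A hypothetical sketch of the filtering behaviour described above (not the actual patch code): parameter absent means unchanged logging, empty means log no parameters, otherwise log only the listed ones:

```python
def params_to_log(params, log_params_list=None):
    """Filter request params for logging per a logParamsList-style value.

    None  -> parameter missing: log everything (current behaviour).
    ""    -> log no parameters.
    "a,b" -> log only parameters a and b.
    """
    if log_params_list is None:
        return dict(params)
    allowed = {p for p in log_params_list.split(",") if p}
    return {k: v for k, v in params.items() if k in allowed}

req = {"q": "solr", "fq": "type:doc", "rows": "10"}
assert params_to_log(req) == req
assert params_to_log(req, "") == {}
assert params_to_log(req, "q,fq") == {"q": "solr", "fq": "type:doc"}
```

The shard-correlation use case from the issue description works because an identifying parameter can be kept in the list while the bulky ones are dropped.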
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883171#comment-13883171 ] Shalin Shekhar Mangar commented on SOLR-5473: - Some comments on the latest patch: # AbstractFullDistribZkTestBase has a useExternalCollection() which is hard coded to false. Why? Can we randomize using external collections in the base test to have better test coverage? # ClusterState.getCollections has a todo which says “fix later JUnit is failing”. Which test is failing? # What is _stateVer_ used for? I guess it is for SOLR-5474 and not this issue? # This patch has only whitespace related changes to CloudSolrServer. # There is wrong formatting and incorrect spacing in the new code such as Overseer.createCollection, new methods in ClusterState etc. You should re-format all the new/modified code blocks # There was one forbidden-api check failure where new String(byte[]) constructor is used in a log message. Run ant check-forbidden-apis from inside the solr directory. # There are three javadoc errors (run ant precommit): {code} [ecj-lint] 1. ERROR in /Users/shalinmangar/work/oss/solr-trunk/solr/solrj/src/java/org/apache/solr/common/cloud/ClusterState.java (at line 199) [ecj-lint] /** @deprecated [ecj-lint] ^^ [ecj-lint] Javadoc: Description expected after @deprecated [ecj-lint] -- [ecj-lint] 2. ERROR in /Users/shalinmangar/work/oss/solr-trunk/solr/solrj/src/java/org/apache/solr/common/cloud/ClusterState.java (at line 297) [ecj-lint] * @deprecated [ecj-lint]^^ [ecj-lint] Javadoc: Description expected after @deprecated [ecj-lint] -- [ecj-lint] -- [ecj-lint] 3. 
ERROR in /Users/shalinmangar/work/oss/solr-trunk/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java (at line 759) [ecj-lint] * @param coll [ecj-lint] [ecj-lint] Javadoc: Description expected after this reference [ecj-lint] -- [ecj-lint] 3 problems (3 errors) {code} Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5672) add logParamsList parameter to support reduced logging
[ https://issues.apache.org/jira/browse/SOLR-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883169#comment-13883169 ] Mark Miller commented on SOLR-5672: --- +1 add logParamsList parameter to support reduced logging -- Key: SOLR-5672 URL: https://issues.apache.org/jira/browse/SOLR-5672 Project: Solr Issue Type: Improvement Reporter: Christine Poerschke The use case we have is that logging full requests in each shard is just 'too much' but at the same time we wish to be able to tie together requests across shards. In certain circumstances we also wish to fully log some requests. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883195#comment-13883195 ] Mark Miller commented on SOLR-5473: --- I'm fairly busy in the short term - going out of town for a few days. But I intend to review this as well. Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
REMINDER: Call For Papers: ApacheCon North America 2014 -- ends Feb 1st
(Note: cross posted, please keep any replies to general@lucene) Quick reminder that the CFP for ApacheCon (Denver) ends on Saturday... http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp Ladies and Gentlemen, start writing your proposals. The Call For Papers for ApacheCon North America 2014 is now open, and is open until February 1st, 2014. Note that we are on a very short timeline this year, so don't assume that we'll extend the CFP, just because we've done so every time before. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883288#comment-13883288 ] Shawn Heisey commented on SOLR-5670: Reducing heap requirements by not requiring data to go into the FieldCache is a major win for huge indexes. GC can be a major source of performance issues even if you've got garbage collection superbly tuned, and I doubt that my tuning parameters are perfect. _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Attachments: SOLR-5670.patch, SOLR-5670.patch As far as I can see there is no good reason to require that _version_ field has to be indexed if it is docvalued. So I guess it will be ok with a rule saying _version_ has to be either indexed or docvalue (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883290#comment-13883290 ] Per Steffensen commented on SOLR-4470: -- bq. This product does not, and even worse, the core Dev team seems intent on NEVER doing so! At least most of them, yes. It is really a shame. bq. As the lead Java architect for Distributed Systems Engineering at a fortune 100 company, security is my single most important concern As the tech lead on the largest REAL SolrCloud installation on the planet, I agree :-) I believe I can say that we have the largest installation in the world for two reasons * Upgrading from one version of SolrCloud to the next is not something that seems to be very important in this product. At least it is hard to do, and there seems to be no testing of it when a new release 4.y comes out - no testing that you can actually upgrade to it from 4.x. This makes me believe that no one, or at least only a few, has an installation so big that just installing 4.y and storing/indexing all data from the old 4.x installation from scratch is not an option. If others actually had to do upgrades where this is not possible, lots of complaints would pop up - and they don't * Our biggest system stores and indexes 1-2 billion documents per day, and has 2 years of history. That is about 1000 billion documents in Solr at any time with 1-2 billion going in every day (and 30-60 billion going out every month). To be able to run such a system we needed to do numerous optimizations, and in general without optimizations you will never get such a big system working. I do not see much talk around here about optimizations of that kind - probably because people have not run into the problems yet. bq. I like Solr. I like what it does and how it does it. Me too. On that part it actually has numerous advantages over e.g. ElasticSearch. 
We used ES to begin with, and we liked it, but for political reasons we were not allowed to keep using it, and we turned to find an alternative. At that point in time SolrCloud (4.x) was only in its startup phase (a year before 4.0 was released), but we believed so much in the idea behind it that we decided to go for it. bq. However, its lack of internal security hooks is a complete show stopper for use at my firm For us, too. That is why we made our own fix to it - provided as a patch here and also available at https://github.com/steff1193/lucene-solr bq. Using this patch as our starting point I am happy to hear that. Please feel free to contact me if you have any problems making it work or understanding what it does. I might also be able to provide a few tips on making it extra secure :-) bq. and have our own Solr-like engine We made the same decision years ago. We have had our own version of Solr in our own VCS for years. Just recently I put the code on https://github.com/steff1193/lucene-solr. No releases (incl maven artifacts) yet. But that will come soon. Until then you will have to build it yourself from source. bq. Also, Mavenize the damned thing! Modern projects still use Ant? I haven't opened a build.xml script in half a decade or more Already done. {code} ant [-Dversion=$VERSION] get-maven-poms {code} This will build the maven structure in the folder maven-build. E.g. if you use Eclipse {code} ant eclipse {code} In Eclipse right-click the root-folder, choose Import... and Existing Maven Project. Import all Maven pom.xmls from the maven-build folder bq. We have absolutely no idea what servlet container the user is going to use for running the solr war. It isn't important for this issue. Protecting the HTTP endpoints with authentication and authorization is standardized in the servlet-spec. All web-containers have to live up to that standard (to be certified). 
The only place where the standardization is not very clear is how to install a realm (the thingy knowing about user-credentials and roles), but all containers have plenty of documentation on how to do it. It is very important to understand that this issue, and the patch I provided, will work for any web-container. This issue is not about enforcing the protection - let the web-container do that. This issue and the patch are ONLY about enabling Solr to send credentials in its Solr-node-to-Solr-node requests, so that things will keep working, if/when you make the obvious security decision and make use of the security-features provided to you for free by the container. bq. Solr has no control over the server-side HTTP layer right now, so anything we try to do will almost certainly be wrong as soon as the user changes containers or decides to modify their container config. NO! bq. Solr 5.0 will not ship as a .war file Bad idea. This is one of the points where Solr made a better decision than ES bq. Once Solr is a real application that owns and fully controls the HTTP layer,
[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5652: --- Attachment: SOLR-5652.codec.skip.dv.patch Rather than just using SuppressCodecs in this test, here's a patch that checks to see if the codec supports docvalues with sort missing, and if not then it skips those fields -- but the other fields are still checked. You can see it working by comparing the log messages (showing the fields tested) between things like... {noformat} ant test -Dtestcase=DistribCursorPagingTest -Dtests.codec=Lucene40 vs ant test -Dtestcase=DistribCursorPagingTest -Dtests.codec=Lucene45 {noformat} Before I commit this though, I really want to add explicit sanity checking that the docs are in the expected order so we can see a definitive and consistent fail from the problem this tries to prevent ... I'm going to work on that this afternoon. (I also want to add docValues fields to the test schema that don't use either sortMissingLast _or_ sortMissingFirst, and just rely on the default behavior ... not sure why I didn't think to include that in the first place) Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de and sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * failures occurred in the first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * failures were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
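[Editor's note] The suspect fields all use sortMissingLast or sortMissingFirst. In plain-Java terms (this is an illustration of the intended ordering semantics, not Solr's actual sorting code), the two options behave like null-handling comparators, with null standing in for a document that has no value in the sort field:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MissingSortSketch {
    public static void main(String[] args) {
        // null stands in for a document with no value in the sort field
        List<Integer> docs = new ArrayList<>(Arrays.asList(3, null, 1, 2));

        // sortMissingLast=true: docs without a value sort after all others
        List<Integer> last = new ArrayList<>(docs);
        last.sort(Comparator.nullsLast(Comparator.<Integer>naturalOrder()));
        System.out.println(last);   // [1, 2, 3, null]

        // sortMissingFirst=true: docs without a value sort before all others
        List<Integer> first = new ArrayList<>(docs);
        first.sort(Comparator.nullsFirst(Comparator.<Integer>naturalOrder()));
        System.out.println(first);  // [null, 1, 2, 3]
    }
}
```

Old docvalues formats that cannot represent missing values break this contract, which is consistent with the failures clustering on those fields.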
[jira] [Updated] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected
[ https://issues.apache.org/jira/browse/SOLR-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-5671: - Description: Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me. All the failures were using either Lucene41 or Lucene42 codec was: Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me. Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected --- Key: SOLR-5671 URL: https://issues.apache.org/jira/browse/SOLR-5671 Project: Solr Issue Type: Bug Affects Versions: 4.7 Reporter: Steve Rowe Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me. 
All the failures were using either Lucene41 or Lucene42 codec -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883317#comment-13883317 ] Robert Muir commented on SOLR-5652: --- {quote} On IRC, i drew sarowe's attention to these issues and DocValuesMissingTest and he pointed out that DocValuesMissingTest uses the following... @SuppressCodecs({Lucene40, Lucene41, Lucene42}) // old formats cannot represent missing values ...so this may be the smoking gun to explain what's going wrong here, since we don't do anything like this in the cursor tests. (yet ... i'm going to fix that now) {quote} Dammit, I feel pretty terrible. You guys have been debugging this thing for a long time, and I've been trying to stay up to date on the issue, but not once did I even think about this... Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de and sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * failures occurred in the first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * failures were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-5670. Resolution: Fixed Fix Version/s: 4.7 5.0 Committed. Thanks! _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Fix For: 5.0, 4.7 Attachments: SOLR-5670.patch, SOLR-5670.patch As far as I can see there is no good reason to require that _version_ field has to be indexed if it is docvalued. So I guess it will be ok with a rule saying _version_ has to be either indexed or docvalue (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
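[Editor's note] The rule the committed change relaxes amounts to a one-line predicate; a hypothetical sketch of the check (not the actual Solr schema-validation code):

```java
public class VersionFieldCheck {
    /**
     * Old rule: _version_ had to be indexed.
     * Relaxed rule per SOLR-5670: it must be indexed OR have docValues
     * (being both is still allowed).
     */
    static boolean versionFieldOk(boolean indexed, boolean docValues) {
        return indexed || docValues;
    }

    public static void main(String[] args) {
        System.out.println(versionFieldOk(true, false));   // indexed only: ok
        System.out.println(versionFieldOk(false, true));   // docValues only: now ok
        System.out.println(versionFieldOk(false, false));  // neither: rejected
    }
}
```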
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883402#comment-13883402 ] Steve Rowe commented on SOLR-5652: -- bq. rather than just using SuppressCodecs in this test, here's a patch that checks to see if the codec supports docvalues with sort missing, and if not then it skips those fields – but the other fields are still checked. +1, looks good, though on trunk Lucene3x and Appending can be removed from the blacklist in LTC.defaultCodecSupportsMissingDocValues(). I see these elsewhere on trunk (Solr tests only), though, so maybe they're not just vestiges? Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de and sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * failures occurred in the first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * failures were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883426#comment-13883426 ] Mark Miller commented on SOLR-4470: --- The bulk of this patch was not that contentious. The rest seemed to mostly be hashed out. The missing piece has been a committer with the skill and time to put it in, take responsibility for it, and support it. Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.7 Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch We want to protect any HTTP-resource (url). We want to require credentials no matter what kind of HTTP-request you make to a Solr-node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes also make internal requests to other Solr-nodes, and for it to work credentials need to be provided there also. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers. E.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered from outside requests (e.g. shard creation/deletion/etc based on calls to the Collection API) * that do not in any way have a relation to an outside super-request (e.g. replica synching stuff) We would like to aim at a solution where original credentials are forwarded when a request directly/synchronously triggers a subrequest, and fall back to configured internal credentials for the asynchronous/non-rooted requests. 
In our solution we would aim at only supporting basic http auth, but we would like to make a framework around it, so that not too much refactoring is needed if you later want to add support for other kinds of auth (e.g. digest). We will work on a solution but create this JIRA issue early in order to get input/comments from the community as early as possible. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
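[Editor's note] The credential-forwarding described in the issue boils down to attaching a Basic auth header to each internal sub-request. A self-contained plain-Java sketch of building that header (this is an illustration, not the patch's actual code; the user/password values are made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthSketch {
    /** Builds the value of an HTTP "Authorization: Basic ..." header. */
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // A node forwarding the original caller's credentials would set this
        // header on its synchronous sub-requests; a configured fallback
        // "internal" user would be used for asynchronous, non-rooted requests.
        String header = basicAuthHeader("solr-internal", "secret");
        System.out.println(header);
    }
}
```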
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883452#comment-13883452 ] Arcadius Ahouansou commented on LUCENE-5376: Hello. I have checked out this branch and ran an ant clean package-zip in the lucene directory. The build was successful and many artefacts were created including: - lucene-xml-query-demo.war - lucene-demo-5.0-SNAPSHOT.jar - lucene-server-5.0-SNAPSHOT.jar I dropped the war into a fresh jetty 9 install and jetty was not happy (see stacktrace below). My questions are: - How do the demo and the new server package fit together? - How do I run the demo? Thanks. {code} at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.eclipse.jetty.start.Main.invokeMain(Main.java:297) at org.eclipse.jetty.start.Main.start(Main.java:724) at org.eclipse.jetty.start.Main.main(Main.java:103) 2014-01-27 22:21:36.288:WARN:lucene-xml-query-demo:main: unavailable javax.servlet.UnavailableException: org.apache.lucene.xmlparser.webdemo.FormBasedXmlQueryDemo at org.eclipse.jetty.servlet.BaseHolder.doStart(BaseHolder.java:102) at org.eclipse.jetty.servlet.ServletHolder.doStart(ServletHolder.java:294) {code} Add a demo search server Key: LUCENE-5376 URL: https://issues.apache.org/jira/browse/LUCENE-5376 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: lucene-demo-server.tgz I think it'd be useful to have a demo search server for Lucene. Rather than being fully featured, like Solr, it would be minimal, just wrapping the existing Lucene modules to show how you can make use of these features in a server setting. 
The purpose is to demonstrate how one can build a minimal search server on top of APIs like SearcherManager, SearcherLifetimeManager, etc. This is also useful for finding rough edges / issues in Lucene's APIs that make building a server unnecessarily hard. I don't think it should have back compatibility promises (except Lucene's index back compatibility), so it's free to improve as Lucene's APIs change. As a starting point, I'll post what I built for the eating your own dog food search app for Lucene's/Solr's jira issues http://jirasearch.mikemccandless.com (blog: http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It uses Netty to expose basic indexing/searching APIs via JSON, but it's very rough (lots of nocommits). -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
maven build issues with non-numeric custom version
From: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/dev-tools/maven/README.maven It says we can get a custom build number using: ant -Dversion=my-special-version get-maven-poms but this fails with: BUILD FAILED /Users/ryan/workspace/apache/lucene_4x/build.xml:141: The following error occurred while executing this line: /Users/ryan/workspace/apache/lucene_4x/lucene/common-build.xml:1578: The following error occurred while executing this line: /Users/ryan/workspace/apache/lucene_4x/lucene/tools/custom-tasks.xml:122: Malformed module dependency from 'lucene-analyzers-phonetic.internal.test.dependencies': 'lucene/build/analysis/common/lucene-analyzers-common-my-special-version.jar' Using a numeric version number things work OK. Any ideas? ryan
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883459#comment-13883459 ] Jan Høydahl commented on SOLR-4470: --- I started the port to trunk along with some other changes last summer, but did not get to finalize it within the time available at that time. I also realized I need some help moving along as I'm quite a novice at servlet security. Implementing this patch for 5.0 and 4.x would still be worth the effort, should we choose to replace the container with Netty or something else, since most of the internal inter-node communication will stay the same - is that correct? When I dived into this last time around the intent was to commit a working impl to trunk first, let it bake for a few weeks (perhaps with the test framework randomizing security on/off) and then backport. This is best practice for big changes, and this patch is HUGE. So here is one committer willing to contribute, but I need some help from someone willing to take a look at https://github.com/cominvent/lucene-solr/tree/SOLR-4470 and finding out what 1% is missing for it to work, and then get it up to date with current trunk... Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.7 Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch We want to protect any HTTP-resource (url). We want to require credentials no matter what kind of HTTP-request you make to a Solr-node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. 
The problem is that Solr-nodes also make internal requests to other Solr-nodes, and for these to work, credentials need to be provided as well. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers, e.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered by outside requests (e.g. shard creation/deletion/etc. based on calls to the Collection API) * that do not in any way relate to an outside super-request (e.g. replica syncing) We would like to aim at a solution where original credentials are forwarded when a request directly/synchronously triggers a sub-request, with a fallback to configured internal credentials for the asynchronous/non-rooted requests. In our solution we would aim at only supporting basic http auth, but we would like to build a framework around it, so that not too much refactoring is needed if you later want to add support for other kinds of auth (e.g. digest). We will work on a solution, but are creating this JIRA issue early in order to get input/comments from the community as early as possible. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
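The forward-or-fall-back scheme described above can be sketched in miniature. This is a hedged illustration only: the class, field, and method names below are hypothetical and are not Solr's actual API; the internal credentials are assumed to come from configuration.

```java
import java.util.Base64;

// Hypothetical sketch (not Solr code) of the credential-forwarding idea:
// forward the original request's Basic auth credentials when the sub-request
// is triggered synchronously, otherwise fall back to configured internal
// credentials for asynchronous/non-rooted requests.
public class InternalAuthSketch {
    static final String INTERNAL_USER = "solr"; // assumed config value
    static final String INTERNAL_PASS = "pass"; // assumed config value

    // Build the Authorization header value for an internal sub-request.
    static String authHeaderFor(String forwardedUser, String forwardedPass) {
        String user = forwardedUser != null ? forwardedUser : INTERNAL_USER;
        String pass = forwardedPass != null ? forwardedPass : INTERNAL_PASS;
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + pass).getBytes());
        return "Basic " + token;
    }

    public static void main(String[] args) {
        // Synchronously triggered sub-request: forward the caller's credentials.
        System.out.println(authHeaderFor("alice", "secret"));
        // Non-rooted internal request: fall back to internal credentials.
        System.out.println(authHeaderFor(null, null));
    }
}
```

The point of the sketch is only the decision rule (forward when present, fall back when not); how the header is attached to the outgoing request would depend on the HTTP client used.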
lucene-solr pull request: Lucene 5092 pull 1
GitHub user PaulElschot opened a pull request: https://github.com/apache/lucene-solr/pull/24 Lucene 5092 pull 1 DocBlocksIterator extends DocIdSetIterator. FixedBitSetDBI and EliasFanoDocIdSet implement DocBlocksIterator. The join module ToParent/ToChild queries use DocBlocksIterator instead of FixedBitSet. You can merge this pull request into a Git repository by running: $ git pull https://github.com/PaulElschot/lucene-solr LUCENE-5092-pull-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/24.patch commit 0b4c85b1b30426f34f65a03c32bb2618e1d03f99 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-19T19:31:14Z Ignore *.*~ and *.jar files commit 9a3c80013219b986340cd5a470fb30d20d35504a Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-19T20:35:54Z Add first version of DocBlockIterator commit 77341eed771facde8cf89bc85c99fe0ccd6bd257 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-19T20:53:00Z OpenBitSetIterator extends DocBlockIterator, advanceToJustBefore() not yet implemented. 
commit d920b8e6f2fbf39da42a5eff19301c4ca92647c6 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-19T21:46:48Z Initial implementation of OpenBitSetIterator.advanceToJustBefore() commit ebff7763d31518989882909da56e0b9be22a4f89 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-19T21:57:38Z The OpenBitSetIterator constructor not using an OpenBitSet can not easily be deleted commit 4166b0e4fa44b10f7c25158a811ff8593d540957 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-19T22:16:30Z More detailed plan commit 807f98db323ee78454d6bb7d76a9d40d89e8126b Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-20T19:11:17Z Rename to DocBlocksIterator commit 7ea28b0443e62d4e02458943a06cd97a9c8ad843 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-20T19:17:09Z Rename to class DocBlocksIterator commit 42e4bbc18769f7f91a6dfd730cc5d7d51582cb6c Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-20T19:52:21Z Adapted ToParentBlockJoinQuery to use DocBlocksIterator directly from FBS, tests pass commit 3d7819bc9e3b8754e6f882e60a0920800ba09954 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-20T21:19:53Z Remove some commented code commit 4b2a7a4a529810dbf742958463c3f9327444f3b1 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-20T22:26:27Z Getting closer with ToChildBJQ commit 24032392ede9b8b2997152f4f6aec3af03a6e550 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-21T15:16:21Z Merge branch 'trunk' into docblocksiter commit 8fde265979ba8913045a3f9cd87a15482739cc43 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-21T16:49:48Z Always set OpenBitSet attribute in OpenBitSetIterator commit b7627dd4f41aff421af6d9a0781fcc13fe668995 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-21T16:51:06Z Added a test for advanceToJustBefore in BaseDocIdSetTestCase, TestFixedBitSet fails commit f1966ae5b4f375c7451ff083288e409a0b41b9ef Author: Paul Elschot paul.j.elsc...@gmail.com Date: 
2014-01-21T21:14:11Z Previous test seed passes, next one fails commit c198cd8b6b06187c65477f088dad918974721099 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-22T23:49:52Z Added OpenBitSetDocBlocksIterator commit c29094ceba3bec8773e51c17fe3c80abab5ae526 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-22T23:53:00Z Merge branch 'trunk' of https://github.com/apache/lucene-solr into docblocksiter commit 7f7d8901bb396b82a0e874ca1f3c4264806fcd8e Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T20:37:49Z Improve ignoring lib directories commit e8abc6f30060ac10de886b6fcc225d561e4758b5 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T21:20:13Z Added FixedBitSetDBI, tests pass. FixedBitSet.java from trunk, made some private things protected. commit f78dca9bdf2b79fe3fbb7b80898fb88420891418 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T21:30:05Z Remove some unused imports commit 273a7e80767252f9748878878b0e9d742d2df669 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T21:33:17Z Remove commented println lines commit 3f93aa8d76422844d141fc2070a236e780e577f8 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T23:24:09Z Add TestDocIdSetBenchMark.java. Note: no APL 2.0 commit 3ca778ffee79cc9bd549e4b0dd37e00f16ba6320 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T23:26:06Z Add assert message commit 50f0175fda3637b88e982f285021921c69fe4dff Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T23:26:22Z Correct comment commit d07201d00dada7d3c4bde33471dac3accdb9b1e8 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T23:26:52Z Remove
[jira] [Commented] (LUCENE-5092) join: don't expect all filters to be FixedBitSet instances
[ https://issues.apache.org/jira/browse/LUCENE-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883518#comment-13883518 ] Paul Elschot commented on LUCENE-5092: -- I have opened this pull request: https://github.com/apache/lucene-solr/pull/24 In case a patch is preferred, please let me know. In the pull request: DocBlocksIterator extends DocIdSetIterator. FixedBitSetDBI and EliasFanoDocIdSet implement DocBlocksIterator, so EliasFanoDocIdSet could also be used for joins. The join module ToParent/ToChild queries use DocBlocksIterator instead of FixedBitSet. In the join module, FixedBitSetCachingWrapperFilter.java is replaced by DocBlocksCachingWrapperFilter which uses FixedBitSetDBI for now. LUCENE-5416 is open for FixedBitSetDBI. join: don't expect all filters to be FixedBitSet instances -- Key: LUCENE-5092 URL: https://issues.apache.org/jira/browse/LUCENE-5092 Project: Lucene - Core Issue Type: Improvement Components: modules/join Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-5092.patch The join module throws exceptions when the parents filter isn't a FixedBitSet. The reason is that the join module relies on prevSetBit to find the first child document given a parent ID. As suggested by Uwe and Paul Elschot on LUCENE-5081, we could fix it by exposing methods in the iterators to iterate backwards. When the join module gets an iterator which isn't able to iterate backwards, it would just need to dump its content into another DocIdSet that supports backward iteration, FixedBitSet for example.
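The prevSetBit-based child lookup this issue relies on can be sketched with `java.util.BitSet`, which already supports backward scans via `previousSetBit`. This is an illustrative sketch, not the join module's actual code; the helper name is hypothetical. In the block-join index layout, child documents immediately precede their parent, so the first child of a parent is the document just after the previous parent bit.

```java
import java.util.BitSet;

// Hypothetical sketch of how a block-join collector locates the first child
// of a parent document via backward iteration (the prevSetBit idea).
public class BlockJoinSketch {
    // Children of parentDoc occupy the doc IDs strictly between the
    // previous parent and parentDoc.
    static int firstChildOf(BitSet parentBits, int parentDoc) {
        int prevParent = parentBits.previousSetBit(parentDoc - 1);
        return prevParent + 1;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(3); // docs 0-2 are children of parent 3
        parents.set(7); // docs 4-6 are children of parent 7
        System.out.println(firstChildOf(parents, 7)); // prints 4
        System.out.println(firstChildOf(parents, 3)); // prints 0
    }
}
```

A DocIdSet that cannot answer such a backward query would, per the issue description, first be dumped into a structure that can (e.g. a FixedBitSet).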
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883528#comment-13883528 ] Vassil Velichkov commented on SOLR-2242: I really hope that this issue will be resolved in SOLR 4.7... Fingers crossed :-) Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0-ALPHA Reporter: Bill Bell Priority: Minor Fix For: 4.7 Attachments: SOLR-2242-3x.patch, SOLR-2242-3x_5_tests.patch, SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: Parameters: facet.numTerms or f.<field>.facet.numTerms = true (default is false) - turn on distinct counting of terms; facet.field - the field to count the terms in. It creates a new section in the facet section... http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numTerms=true&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numTerms=false&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numTerms=true&facet.limit=-1&facet.field=price This currently only works on facet.field. 
{code}
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">...</lst>
  <lst name="facet_numTerms">
    <lst name="localhost:8983/solr/">
      <int name="price">14</int>
    </lst>
    <lst name="localhost:8080/solr/">
      <int name="price">14</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

OR with no sharding:

<lst name="facet_numTerms">
  <int name="price">14</int>
</lst>
{code} Several people use this to get the group.field count (the # of groups).
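For illustration only, the count that facet.numTerms reports per field is simply the number of distinct values, i.e. the number of rows a facet.limit=-1&facet.mincount=1 listing would contain. A minimal sketch of that semantics (hypothetical class and method names, not Solr code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Toy illustration of the namedistinct semantics: count unique values
// in a field rather than listing every value with its count.
public class DistinctTermsSketch {
    static int numTerms(List<String> fieldValues) {
        return new HashSet<>(fieldValues).size();
    }

    public static void main(String[] args) {
        List<String> prices = Arrays.asList("9.99", "19.99", "9.99", "4.50");
        System.out.println(numTerms(prices)); // prints 3
    }
}
```

This mirrors why the feature is useful for grouping: the distinct count of a group.field is the number of groups, without transferring the full value list.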
[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5652: --- Attachment: SOLR-5652.nocommit.patch Ok, this new patch has the following... * new {{\*_dv}} fields in the schema for all the various types w/o using any of the sort missing options * tweaked the simple testing in both the single node and distrib test so that: ** one doc is missing an int value ** we randomly pick either int or int_dv as a field to use in explicit sorts *** currently a nocommit in place to force this to be int_dv ** we explicitly sort on all 3 missing sub-variants (, _first, _last) and check the doc order exactly matches our expectations * includes everything from SOLR-5652.codec.skip.dv.patch... ** ...but there is a nocommit bypassing the codec check so docvalues are always used. With this patch, and these nocommits, it's pretty trivial to reliably reproduce failing seeds that pop up when running... {code} ant test -Dtests.class=\*Cursor\* -Dtests.codec=Lucene40 {code} ...and likewise, my limited testing so far hasn't seen any failures when running this patch with the Lucene45 codec... {code} ant test -Dtests.class=\*Cursor\* -Dtests.codec=Lucene45 {code} Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, SOLR-5652.nocommit.patch, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar non-reproducible-seed failure on his machine). Using this as a tracking issue to try and make sense of it. 
Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de and sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * failures occurred in the first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * failures were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on the same field asc has worked fine just before this (fields are in arbitrary order, but asc is always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild are recorded in comments)
[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5652: --- Attachment: SOLR-5652.patch Patch I think is committable - same as SOLR-5652.nocommit.patch but with the nocommits removed, and the (in hindsight) obvious change needed to my new int sort field randomization so that when the codec's docvalues support can't handle missing values, we use the non-docvalues version of that field for the explicit checks of \*_last and \*_first sorting. I'm currently bash-loop hammering on this patch -- would appreciate it if others could try the same. Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, SOLR-5652.nocommit.patch, SOLR-5652.patch, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883656#comment-13883656 ] David Webster commented on SOLR-4470: - Again, appreciate the input; looks like the issue is at least alive. We are meeting Friday on this issue to plot our strategy. I am getting familiar with the specifics of the issue, and am coming to realize the type of HTTP container is largely irrelevant, so long as it is a spec-compliant servlet container (as Tomcat and Jetty are). I do not particularly agree with the need for a container, however. We are gradually moving away from pre-packaged containers ourselves, instead moving towards framework tools like Spring Web and Grizzly2. We write all our own JAAS LoginModules today and have a deep bench when it comes to managing service-side security, be it servlet (RESTful/HTTP), JMS, or anything else. There are pluses and minuses in whether to use standard containers or roll your own Servlet implementation. Another discussion for another day. We have had the same issue present in Solr in our RESTful service implementations in making them secure. We have a maturing RESTful/HTTP security standard, and that requires our REST client code to do very specific things when making downstream requests to secure services that expect a very specific secured request. For instance, I can add a valve to Tomcat to have it check for a user's SiteMinder cookie and then validate it with a call to a Policy server. I could also implement a secret key (Kerberos-type thing). I can implement that capability on the service side via a JAAS LoginModule and Tomcat Valve configuration without digging into Tomcat core code. But on the client side I have to write actual core code to place the SiteMinder token, secret-key encryption, etc. in a cookie or header and send it downstream. I imagine the same must be true in SolrCloud. 
I can lock down the receiver side via configuration and standard container plugins, but it's the sender side that we can do nothing about without some core code modification that would allow us to send whatever security artifacts downstream we deem appropriate. My main fear is performance within the cloud during the sharding processes. Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.7 Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch
[jira] [Updated] (LUCENE-5409) ToParentBlockJoinCollector.getTopGroups returns empty Groups
[ https://issues.apache.org/jira/browse/LUCENE-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Cheng updated LUCENE-5409: --- Attachment: local_history.patch patch file ToParentBlockJoinCollector.getTopGroups returns empty Groups Key: LUCENE-5409 URL: https://issues.apache.org/jira/browse/LUCENE-5409 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.6 Environment: Ubuntu 12.04 Reporter: Peng Cheng Assignee: Michael McCandless Priority: Critical Fix For: 4.7 Attachments: local_history.patch Original Estimate: 168h Remaining Estimate: 168h A bug is observed to cause unstable results returned by the getTopGroups function of class ToParentBlockJoinCollector. In the scorer generation stage, the ToParentBlockJoinCollector will automatically rewrite all the associated ToParentBlockJoinQuery instances (and their subqueries), and save them into its in-memory look-up table, namely joinQueryID (see the enroll() method for details). Unfortunately, in the getTopGroups method, the new ToParentBlockJoinQuery parameter is not rewritten (at least users are not expected to do so). When the new one is searched in the old lookup table (considering the impact of rewrite() on hashCode()), the lookup will largely fail and eventually end up with a topGroups collection consisting of only empty groups (their hitCounts are guaranteed to be zero). An easy fix would be to rewrite the original BlockJoinQuery before invoking the getTopGroups method. However, the computational cost of this is not optimal. A better but slightly more complex solution would be to save the unrewritten queries into the lookup table.
[jira] [Commented] (LUCENE-5409) ToParentBlockJoinCollector.getTopGroups returns empty Groups
[ https://issues.apache.org/jira/browse/LUCENE-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883693#comment-13883693 ] Peng Cheng commented on LUCENE-5409: Finally got your test case: it only appears at larger scale, which is really excruciating as I'm not a software architect. To run the failed test case, please apply the attached patch or manually copy the unit test function into testBlockJoin.java ToParentBlockJoinCollector.getTopGroups returns empty Groups Key: LUCENE-5409 URL: https://issues.apache.org/jira/browse/LUCENE-5409 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.6 Environment: Ubuntu 12.04 Reporter: Peng Cheng Assignee: Michael McCandless Priority: Critical Fix For: 4.7 Attachments: local_history.patch Original Estimate: 168h Remaining Estimate: 168h
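The lookup failure described in this issue can be modeled in miniature: if rewrite() changes equals()/hashCode(), a map keyed by the rewritten query misses when probed with the unrewritten one. A toy sketch (hypothetical names, not Lucene code; plain strings stand in for Query objects):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the bug: enroll() keys the joinQueryID table by the
// *rewritten* query, so a lookup with the original, unrewritten query
// misses and the collector returns only empty groups.
public class RewriteLookupSketch {
    // Stand-in for Query.rewrite(): yields a different (rewritten) key.
    static String rewrite(String query) {
        return query + "#rewritten";
    }

    public static void main(String[] args) {
        Map<String, Integer> joinQueryID = new HashMap<>();
        joinQueryID.put(rewrite("child:foo"), 0);          // what enroll() stores
        System.out.println(joinQueryID.get("child:foo"));  // prints null -> empty groups
        // The "easy fix" from the description: rewrite before the lookup.
        System.out.println(joinQueryID.get(rewrite("child:foo"))); // prints 0
    }
}
```

The alternative fix proposed in the description (keying the table by the unrewritten query) removes the miss without paying for an extra rewrite at getTopGroups time.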
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.7.0_51) - Build # 9162 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9162/ Java: 64bit/jdk1.7.0_51 -XX:-UseCompressedOops -XX:+UseG1GC 1 tests failed. FAILED: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:49255 within 3 ms Stack Trace: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:49255 within 3 ms at __randomizedtesting.SeedInfo.seed([563CD1B6D5724F09:D7DA5FAEA22D2F35]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:147) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:98) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:93) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:84) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:198) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:80) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at