[jira] [Created] (SOLR-6100) BlendedInfixSuggester and AnalyzingInfixSuggester are never closed on core shutdown (unremovable files on Windows)
Dawid Weiss created SOLR-6100: - Summary: BlendedInfixSuggester and AnalyzingInfixSuggester are never closed on core shutdown (unremovable files on Windows) Key: SOLR-6100 URL: https://issues.apache.org/jira/browse/SOLR-6100 Project: Solr Issue Type: Bug Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 4.9, 5.0 In essence these classes are Closeable but neither SolrSuggester nor Suggester close them at core shutdown time. I'm also not sure what the difference is between SolrSuggester and Suggester and whether both of them are needed. They seem awfully similar... I've fixed the problem with the attached patch on LUCENE-5650, but I'd appreciate if somebody with a deeper knowledge of Solr could chime in and confirm the patch is all right. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
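The leak described above has a generic shape: a component owns Closeable resources but nothing closes them at shutdown. A minimal sketch of the fix's idea (illustrative names, not Solr's actual SolrSuggester API): track the Closeables and close them all when the owning component shuts down, so index files can be deleted on Windows.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Solr's actual API): a registry that closes
// Closeable lookup structures on shutdown, mirroring the idea of closing
// suggesters when the core shuts down.
class ClosingRegistry implements Closeable {
    private final List<Closeable> resources = new ArrayList<>();

    <T extends Closeable> T register(T resource) {
        resources.add(resource);
        return resource;
    }

    @Override
    public void close() {
        IOException first = null;
        for (Closeable c : resources) {
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) first = e; // keep closing the rest
            }
        }
        if (first != null) throw new UncheckedIOException(first);
    }
}
```

A core would register its suggesters at creation time and invoke `close()` once from its shutdown path.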
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004436#comment-14004436 ] ASF subversion and git services commented on LUCENE-5650: - Commit 1596497 from [~dawidweiss] in branch 'dev/branches/lucene5650' [ https://svn.apache.org/r1596497 ] SOLR-6100, LUCENE-5650: fix an uncloseable file leak in solr suggesters. createTempDir and associated functions no longer create java.io.tmpdir -- Key: LUCENE-5650 URL: https://issues.apache.org/jira/browse/LUCENE-5650 Project: Lucene - Core Issue Type: Improvement Components: general/test Reporter: Ryan Ernst Assignee: Dawid Weiss Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, dih.patch The recent refactoring to all the create temp file/dir functions (which is great!) has a minor regression from what existed before. With the old {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist. So, if you set {{java.io.tmpdir}} to {{./temp}}, then it would create that dir within the per-JVM working dir. However, {{getBaseTempDirForClass()}} now does asserts that check the dir exists, is a dir, and is writable. Lucene uses {{.}} as {{java.io.tmpdir}}. Then in the test security manager, the per-JVM cwd has read/write/execute permissions. However, this allows tests to write to their cwd, which I'm trying to protect against (by setting cwd to read/execute in my test security manager).
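The pre-refactoring behavior the issue asks to restore can be sketched in a few lines (an illustrative helper, not Lucene's actual LuceneTestCase code): resolve `java.io.tmpdir` and create the directory when it is missing, instead of asserting that it already exists.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the old behavior: create the base temp dir if absent rather
// than failing an assertion when java.io.tmpdir points at a missing path.
class TempDirs {
    static Path baseTempDir() {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"));
        try {
            Files.createDirectories(dir); // no-op if it already exists
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dir;
    }
}
```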
[jira] [Commented] (SOLR-6100) BlendedInfixSuggester and AnalyzingInfixSuggester are never closed on core shutdown (unremovable files on Windows)
[ https://issues.apache.org/jira/browse/SOLR-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004435#comment-14004435 ] ASF subversion and git services commented on SOLR-6100: --- Commit 1596497 from [~dawidweiss] in branch 'dev/branches/lucene5650' [ https://svn.apache.org/r1596497 ] SOLR-6100, LUCENE-5650: fix an uncloseable file leak in solr suggesters.
[jira] [Updated] (SOLR-6100) BlendedInfixSuggester and AnalyzingInfixSuggester are never closed on core shutdown (unremovable files on Windows)
[ https://issues.apache.org/jira/browse/SOLR-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated SOLR-6100: -- Attachment: SOLR-6100.patch
[jira] [Updated] (SOLR-6098) SOLR console displaying JSON does not escape text properly
[ https://issues.apache.org/jira/browse/SOLR-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-6098: Affects Version/s: 4.4 SOLR console displaying JSON does not escape text properly -- Key: SOLR-6098 URL: https://issues.apache.org/jira/browse/SOLR-6098 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.4 Reporter: Kingston Duffie Priority: Minor Fix For: 4.5 In the SOLR admin web console, when displaying the JSON response for a query, the text is not being HTML-escaped, so any text that happens to match HTML markup is processed as HTML. For example, enter <strike>hello</strike> in the q textbox and the responseHeader will contain: q: body:hello, where the hello portion is shown using strikeout. This seems benign, but it can be extremely confusing when viewing results, because if your fields happen to contain, for example, f...@bar.com, this will be completely missing (because the browser treats this as an invalid tag).
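The fix for this class of bug is a standard escaping step before rendering. A minimal sketch of the transformation (a hypothetical helper shown in Java; the admin UI itself is JavaScript) that would make the reported input display literally instead of being interpreted as markup:

```java
// Escape the characters HTML treats as markup so user-controlled text like
// <strike>hello</strike> is displayed literally rather than rendered.
class HtmlEscape {
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```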
[jira] [Resolved] (SOLR-6098) SOLR console displaying JSON does not escape text properly
[ https://issues.apache.org/jira/browse/SOLR-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) resolved SOLR-6098. - Resolution: Duplicate Fix Version/s: 4.5 Assignee: Stefan Matheis (steffkes)
[jira] [Commented] (SOLR-5309) Investigate ShardSplitTest failures
[ https://issues.apache.org/jira/browse/SOLR-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004472#comment-14004472 ] Shalin Shekhar Mangar commented on SOLR-5309: - I am looking at these failures again today. Yeah, it's been that busy around here :( I implemented a RateLimitedDirectoryFactory for Solr with a very small limit and forced ShardSplitTest to always use it. This helped reproduce the issue for me. I have finally managed to track down the root cause. It always perplexed me that the difference between expected and actual doc counts was almost always 1. Whenever we add/delete documents during shard splitting, we synchronously forward the request to the appropriate sub-shard. For add requests, a single sub-shard is selected, but for delete-by-ids we weren't selecting a single sub-shard. Instead we were forwarding the delete-by-id to all sub-shards. This works out fine and doesn't cause any damage in practice because the id exists only on one shard. However, when one sub-shard (the right one) accepts the delete and the other rejects it (maybe because it became active in the meantime), the client (ShardSplitTest) gets an error back and assumes that the delete did not succeed, whereas it actually succeeded on the right sub-shard. We always advise our users to retry update operations upon failure, and they would be fine if they followed this advice during shard splitting too. ShardSplitTest unfortunately doesn't follow that advice and just counts successes/failures and ends up with an inconsistent state. I'll start by fixing delete-by-id to route requests to the correct (single) sub-shard and enabling this test again. Investigate ShardSplitTest failures --- Key: SOLR-5309 URL: https://issues.apache.org/jira/browse/SOLR-5309 Project: Solr Issue Type: Task Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Blocker Investigate why ShardSplitTest is failing sporadically.
Some recent failures: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3328/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7760/ http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/861/
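The routing fix described in the comment above can be sketched generically (illustrative ranges and hash function, not Solr's actual CompositeIdRouter): a delete-by-id is sent only to the single sub-shard whose hash range covers the id's hash, instead of being broadcast to all sub-shards.

```java
// Hypothetical sketch of routing a delete-by-id to exactly one sub-shard.
class HashRouter {
    static final class Range {
        final int min, max; // inclusive hash bounds owned by a sub-shard
        Range(int min, int max) { this.min = min; this.max = max; }
        boolean includes(int hash) { return hash >= min && hash <= max; }
    }

    // Returns the index of the single sub-shard that owns this id's hash.
    static int route(String id, Range[] subShards) {
        int hash = id.hashCode(); // stand-in for Solr's real hash function
        for (int i = 0; i < subShards.length; i++) {
            if (subShards[i].includes(hash)) return i;
        }
        throw new IllegalStateException("no sub-shard covers hash " + hash);
    }
}
```

With disjoint ranges that cover the full hash space, exactly one sub-shard matches, so a rejection from the wrong sub-shard can no longer mask a successful delete on the right one.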
[jira] [Commented] (SOLR-5309) Investigate ShardSplitTest failures
[ https://issues.apache.org/jira/browse/SOLR-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004490#comment-14004490 ] ASF subversion and git services commented on SOLR-5309: --- Commit 1596510 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1596510 ] SOLR-5309: Fix DUP.processDelete to route delete-by-id to one sub-shard only. Enable ShardSplitTest again.
[jira] [Created] (SOLR-6101) Shard splitting doesn't work in legacyCloud=false mode
Shalin Shekhar Mangar created SOLR-6101: --- Summary: Shard splitting doesn't work in legacyCloud=false mode Key: SOLR-6101 URL: https://issues.apache.org/jira/browse/SOLR-6101 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.8 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.9, 5.0 When we invoke the splitshard Collection API against a cluster with legacyCloud=false, we get the following errors: {code} 2014-05-15 21:07:58,986 [Overseer-163819091268403216-ec2-x.compute-1.amazonaws.com:8986_solr-n_51] ERROR solr.cloud.OverseerCollectionProcessor - Collection splitshard of splitshard failed:org.apache.solr.common.SolrException: Could not find coreNodeName at org.apache.solr.cloud.OverseerCollectionProcessor.waitForCoreNodeName(OverseerCollectionProcessor.java:1504) at org.apache.solr.cloud.OverseerCollectionProcessor.splitShard(OverseerCollectionProcessor.java:1255) at org.apache.solr.cloud.OverseerCollectionProcessor.processMessage(OverseerCollectionProcessor.java:472) at org.apache.solr.cloud.OverseerCollectionProcessor.run(OverseerCollectionProcessor.java:248) at java.lang.Thread.run(Thread.java:745) 2014-05-15 21:07:59,003 [Overseer-163819091268403216-ec2-xxx.compute-1.amazonaws.com:8986_solr-n_51] INFO solr.cloud.OverseerCollectionProcessor - Overseer Collection Processor: Message id:/overseer/collection-queue-work/qn-18 complete, response:{success={null={responseHeader={status=0,QTime=1}},null={responseHeader={status=0,QTime=1}}},split117278106116750={responseHeader={status=0,QTime=0},STATUS=failed,Response=Error CREATEing SolrCore '3M_shard1_1_replica1': non legacy mode coreNodeName missing shard=shard1_1&name=3M_shard1_1_replica1&action=CREATE&collection=3M&wt=javabin&qt=/admin/cores&async=split117278106116750&version=2},Operation splitshard caused exception:=org.apache.solr.common.SolrException: Could not find coreNodeName,exception={msg=Could not find coreNodeName,rspCode=500}} {code} The sub-shard replica (leader)
creation fails due to: {code} { responseHeader: { status: 0, QTime: 0 }, STATUS: failed, Response: Error CREATEing SolrCore '3M_shard1_0_replica1': non legacy mode coreNodeName missing shard=shard1_0&name=3M_shard1_0_replica1&action=CREATE&collection=3M&wt=javabin&qt=/admin/cores&async=split117278099904930&version=2 } {code}
[jira] [Commented] (LUCENE-5675) ID postings format
[ https://issues.apache.org/jira/browse/LUCENE-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004504#comment-14004504 ] ASF subversion and git services commented on LUCENE-5675: - Commit 1596512 from [~mikemccand] in branch 'dev/branches/lucene5675' [ https://svn.apache.org/r1596512 ] LUCENE-5675: fix nocommits ID postings format Key: LUCENE-5675 URL: https://issues.apache.org/jira/browse/LUCENE-5675 Project: Lucene - Core Issue Type: New Feature Reporter: Robert Muir Today the primary key lookup in lucene is not that great for systems like solr and elasticsearch that have versioning in front of IndexWriter. To some extent BlockTree can sometimes help avoid seeks by telling you the term does not exist for a segment. But this technique (based on FST prefix) is fragile. The only other choice today is bloom filters, which use up huge amounts of memory. I don't think we are using everything we know: particularly the version semantics. Instead, if the FST for the terms index used an algebra that represents the max version for any subtree, we might be able to answer very efficiently that there is no term T with version V in that segment. Also ID fields don't need postings lists, and they don't need stats like docfreq/totaltermfreq, etc; this stuff is all implicit. As far as API, I think for users to provide IDs with versions to such a PF, a start would be to set a payload or whatever on the term field to get it through IndexWriter to the codec. And a consumer of the codec can just cast the Terms to a subclass that exposes the FST to do this version check efficiently.
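The segment-skipping idea in the description can be illustrated with a toy in-memory structure (purely a sketch, not the proposed FST-based design): if a segment records the maximum version of any id it contains, a versioned lookup can reject the whole segment with a single comparison before touching any per-term data.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of version-aware segment skipping for id lookups.
class VersionedSegment {
    private final Map<String, Long> idToVersion = new HashMap<>();
    private long maxVersion = Long.MIN_VALUE;

    void add(String id, long version) {
        idToVersion.put(id, version);
        if (version > maxVersion) maxVersion = version;
    }

    // Returns the id's version only if this segment could contain a
    // version >= minVersion; otherwise null, without any per-id lookup.
    Long lookup(String id, long minVersion) {
        if (maxVersion < minVersion) return null; // cheap segment-level skip
        return idToVersion.get(id);
    }
}
```

The real proposal pushes the same max-version bound down into FST subtrees, so whole prefixes (not just whole segments) can be skipped.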
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004515#comment-14004515 ] Dawid Weiss commented on LUCENE-5650: - All tests passed for me with the current state of the branch (including nightlies).
[jira] [Commented] (LUCENE-5689) FieldInfo.setDocValuesGen should not be public.
[ https://issues.apache.org/jira/browse/LUCENE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004557#comment-14004557 ] Shai Erera commented on LUCENE-5689: ReaderAndUpdates already clones all FIs and then updates the dvGen of the ones that are updated now. So cloning again is silly ... perhaps we can get rid of it some day, but I agree, let's remove the public first. And yes, if you modify the dvGen on an AtomicReader, you might hit weird exceptions like FNFE when the reader tries to look up the field's dv-gen'd file. FieldInfo.setDocValuesGen should not be public. --- Key: LUCENE-5689 URL: https://issues.apache.org/jira/browse/LUCENE-5689 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5689.patch It's currently public and users can modify it. We made this class mostly immutable long ago: remember it's returned by the AtomicReader API!
[jira] [Commented] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
[ https://issues.apache.org/jira/browse/LUCENE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004560#comment-14004560 ] Shai Erera commented on LUCENE-5679: I'm not sure how critical it is, Uwe. Yes, it means users need to recompile their app's code, but this is minor? It's not like they need to change the code, only recompile it. I am still waiting for someone to say that he upgrades his search app to a newer Lucene version by simply dropping in the new jar. 4.9 already includes changes to runtime behavior and some back-compat changes. Consolidate IndexWriter.deleteDocuments() - Key: LUCENE-5679 URL: https://issues.apache.org/jira/browse/LUCENE-5679 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.9, 5.0 Attachments: LUCENE-5679.patch Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments().
[jira] [Commented] (LUCENE-5688) NumericDocValues fields with sparse data can be compressed better
[ https://issues.apache.org/jira/browse/LUCENE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004562#comment-14004562 ] Adrien Grand commented on LUCENE-5688: -- +1 to using binary search on an in-memory {{MonotonicBlockPackedReader}} to implement sparse doc values. NumericDocValues fields with sparse data can be compressed better -- Key: LUCENE-5688 URL: https://issues.apache.org/jira/browse/LUCENE-5688 Project: Lucene - Core Issue Type: Improvement Reporter: Varun Thacker Priority: Minor Attachments: LUCENE-5688.patch I ran into this problem where I had a dynamic field in Solr and indexed data into lots of fields. For each field only a few documents had actual values, and for the remaining documents the default value (0) got indexed. Now when I merge segments, the index size jumps up. For example I have 10 segments, each with 1 DV field. When I merge segments into 1, that segment will contain all 10 DV fields with lots of 0s. This was the motivation behind trying to come up with a compression for a use case like this.
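The sparse encoding under discussion can be sketched with plain arrays (a toy stand-in for an in-memory {{MonotonicBlockPackedReader}}): store only the doc ids that actually have a value, in sorted order, and binary-search the doc id at read time; all other docs implicitly read as the default 0.

```java
import java.util.Arrays;

// Toy sparse numeric doc values: sorted doc ids plus parallel values.
class SparseNumericValues {
    private final int[] docs;    // sorted ids of docs that have a value
    private final long[] values; // values[i] belongs to docs[i]

    SparseNumericValues(int[] docs, long[] values) {
        this.docs = docs;
        this.values = values;
    }

    long get(int docId) {
        int idx = Arrays.binarySearch(docs, docId);
        return idx >= 0 ? values[idx] : 0L; // 0 is the implicit default
    }
}
```

Space is proportional to the number of documents that have a value rather than to the total document count, which is exactly what helps the merged-segment case described above.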
[jira] [Created] (SOLR-6102) The 'addreplica' Collection API does not support property params
Shalin Shekhar Mangar created SOLR-6102: --- Summary: The 'addreplica' Collection API does not support property params Key: SOLR-6102 URL: https://issues.apache.org/jira/browse/SOLR-6102 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.8.1, 4.8 Reporter: Shalin Shekhar Mangar Fix For: 4.9, 5.0 All Collection APIs except 'addreplica' support passing core properties in the property.XXX format. Such property params are passed directly to the core admin APIs invoked by these collection APIs. Not supporting these params is a bug and we should fix it.
[jira] [Commented] (LUCENE-5689) FieldInfo.setDocValuesGen should not be public.
[ https://issues.apache.org/jira/browse/LUCENE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004614#comment-14004614 ] ASF subversion and git services commented on LUCENE-5689: - Commit 1596553 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1596553 ] LUCENE-5689: FieldInfo.setDocValuesGen should not be public
[jira] [Commented] (LUCENE-5689) FieldInfo.setDocValuesGen should not be public.
[ https://issues.apache.org/jira/browse/LUCENE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004617#comment-14004617 ] ASF subversion and git services commented on LUCENE-5689: - Commit 1596555 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596555 ] LUCENE-5689: FieldInfo.setDocValuesGen should not be public
[jira] [Resolved] (LUCENE-5689) FieldInfo.setDocValuesGen should not be public.
[ https://issues.apache.org/jira/browse/LUCENE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5689. - Resolution: Fixed Fix Version/s: 5.0 4.9
[jira] [Commented] (LUCENE-5636) SegmentCommitInfo continues to list unneeded gen'd files
[ https://issues.apache.org/jira/browse/LUCENE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004644#comment-14004644 ] ASF subversion and git services commented on LUCENE-5636: - Commit 1596570 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1596570 ] LUCENE-5618, LUCENE-5636: write each DocValues update in a separate file; stop referencing old fieldInfos files SegmentCommitInfo continues to list unneeded gen'd files Key: LUCENE-5636 URL: https://issues.apache.org/jira/browse/LUCENE-5636 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5636.patch I thought I handled it in LUCENE-5246, but it turns out I didn't handle it fully. I'll upload a patch which improves the test to expose the bug. I know where it is, but I'm not sure how to fix it without breaking index back-compat. Can we do that on experimental features? The problem is that if you update different fields in different gens, the FieldInfos files of older gens remain referenced (still!!). I open a new issue since LUCENE-5246 is already resolved and released, so I don't want to mess up our JIRA... The severity of the bug is that unneeded files are still referenced in the index. Everything still works correctly; it's just that .fnm files are still there. But as I wrote, I'm still not sure how to solve it without requiring apps that use dv updates to reindex.
[jira] [Commented] (LUCENE-5618) DocValues updates send wrong fieldinfos to codec producers
[ https://issues.apache.org/jira/browse/LUCENE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004643#comment-14004643 ] ASF subversion and git services commented on LUCENE-5618: - Commit 1596570 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1596570 ] LUCENE-5618, LUCENE-5636: write each DocValues update in a separate file; stop referencing old fieldInfos files DocValues updates send wrong fieldinfos to codec producers -- Key: LUCENE-5618 URL: https://issues.apache.org/jira/browse/LUCENE-5618 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Shai Erera Priority: Blocker Fix For: 4.9 Attachments: LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch Spinoff from LUCENE-5616. See the example there: docvalues readers get a FieldInfos, but it doesn't contain the correct ones, so they have invalid field numbers at read time. This should really be fixed. Maybe a simple solution is to not write batches of fields in updates but just have only one field per gen? This removes many-many relationships and would make things easy to understand.
[jira] [Commented] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
[ https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004671#comment-14004671 ] Noble Paul commented on SOLR-6091: -- [~mewmewball] I implemented this and I see the race condition happening in my cluster. Race condition in prioritizeOverseerNodes can trigger extra QUIT operations --- Key: SOLR-6091 URL: https://issues.apache.org/jira/browse/SOLR-6091 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7, 4.8 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.9, 5.0 Attachments: SOLR-6091.patch When using the overseer roles feature, there is a possibility of more than one thread executing the prioritizeOverseerNodes method and extra QUIT commands being inserted into the overseer queue. At a minimum, prioritizeOverseerNodes should be synchronized to avoid the race condition.
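The minimal guard the issue proposes can be sketched as follows (names are illustrative, not Solr's actual OverseerCollectionProcessor code): serialize the prioritization step and re-check shared state inside the lock, so two threads racing into it cannot both enqueue a QUIT.

```java
// Sketch of a synchronized, re-checked prioritization step that makes
// the QUIT hand-off idempotent under concurrent callers.
class OverseerPrioritizer {
    private boolean quitSent = false;
    private int quitCount = 0;

    synchronized void prioritize() {
        if (quitSent) return; // a QUIT is already on its way
        quitSent = true;
        quitCount++; // stands in for enqueueing the QUIT operation
    }

    synchronized int quitCount() { return quitCount; }
}
```

The check-then-act pair lives entirely inside the synchronized method, which is what the unsynchronized original was missing.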
[jira] [Created] (SOLR-6103) Add DateRangeField
David Smiley created SOLR-6103: -- Summary: Add DateRangeField Key: SOLR-6103 URL: https://issues.apache.org/jira/browse/SOLR-6103 Project: Solr Issue Type: New Feature Components: spatial Reporter: David Smiley Assignee: David Smiley LUCENE-5648 introduced a date range index search capability in the spatial module. This issue is for a corresponding Solr FieldType to be named DateRangeField. LUCENE-5648 includes a parseCalendar(String) method that parses a superset of Solr's strict date format. It also parses partial dates (e.g. 2014-10 has month specificity), the trailing 'Z' is optional, a leading +/- may be present (minus indicates BC era), and * means all-time. The proposed field type would use it to parse a string and also both ends of a range query, but furthermore it will also allow an arbitrary range query of the form {{calspec TO calspec}} such as: {noformat}2000 TO 2014-05-21T10{noformat} which parses as the year 2000 through 2014 May 21st 10am (GMT). I suggest this syntax because it is aligned with Lucene's range query syntax.
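The partial-date semantics described above can be sketched with java.time (a toy covering only year, month, and day specificity; the real parser in LUCENE-5648 handles more, including times, the optional 'Z', and BC eras): each partial date expands to the half-open range of instants it covers.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.Year;
import java.time.YearMonth;

// A partial date like "2014-10" expands to [2014-10-01T00:00, 2014-11-01T00:00).
class PartialDateRange {
    final LocalDateTime start, end;

    PartialDateRange(LocalDateTime start, LocalDateTime end) {
        this.start = start;
        this.end = end;
    }

    static PartialDateRange parse(String spec) {
        if (spec.length() == 4) { // "2014" -> whole year
            Year y = Year.parse(spec);
            return new PartialDateRange(
                y.atDay(1).atStartOfDay(),
                y.plusYears(1).atDay(1).atStartOfDay());
        }
        if (spec.length() == 7) { // "2014-10" -> whole month
            YearMonth ym = YearMonth.parse(spec);
            return new PartialDateRange(
                ym.atDay(1).atStartOfDay(),
                ym.plusMonths(1).atDay(1).atStartOfDay());
        }
        LocalDate d = LocalDate.parse(spec); // "2014-10-21" -> whole day
        return new PartialDateRange(d.atStartOfDay(), d.plusDays(1).atStartOfDay());
    }
}
```

A `{{calspec TO calspec}}` query would then span from the start of the left range to the end of the right one, which is why exclusive boundaries are mostly a non-issue: the rounding is built into the specificity.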
[jira] [Commented] (SOLR-6103) Add DateRangeField
[ https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004709#comment-14004709 ] David Smiley commented on SOLR-6103: It just occurred to me that {noformat}* TO 2014{noformat} ought to be supported but it doesn't work -- I'll fix that in LUCENE-5648. Perhaps the range syntax should include matching '[' and ']'? It's only pertinent for indexing ranges; at query time you might as well use the normal range query syntax. One aspect I haven't considered is exclusive boundaries, but I think it's generally a non-issue because of the rounding this field supports. Note that LUCENE-5648 is still only v5/trunk for the moment.
[jira] [Commented] (LUCENE-5618) DocValues updates send wrong fieldinfos to codec producers
[ https://issues.apache.org/jira/browse/LUCENE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004714#comment-14004714 ] ASF subversion and git services commented on LUCENE-5618: - Commit 1596582 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596582 ] LUCENE-5618, LUCENE-5636: write each DocValues update in a separate file; stop referencing old fieldInfos files DocValues updates send wrong fieldinfos to codec producers -- Key: LUCENE-5618 URL: https://issues.apache.org/jira/browse/LUCENE-5618 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Shai Erera Priority: Blocker Fix For: 4.9 Attachments: LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch Spinoff from LUCENE-5616. See the example there, docvalues readers get a fieldinfos, but it doesn't contain the correct ones, so they have invalid field numbers at read time. This should really be fixed. Maybe a simple solution is to not write batches of fields in updates but just have only one field per gen? This removes many-many relationships and would make things easy to understand. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5636) SegmentCommitInfo continues to list unneeded gen'd files
[ https://issues.apache.org/jira/browse/LUCENE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004715#comment-14004715 ] ASF subversion and git services commented on LUCENE-5636: - Commit 1596582 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596582 ] LUCENE-5618, LUCENE-5636: write each DocValues update in a separate file; stop referencing old fieldInfos files SegmentCommitInfo continues to list unneeded gen'd files Key: LUCENE-5636 URL: https://issues.apache.org/jira/browse/LUCENE-5636 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5636.patch I thought I handled it in LUCENE-5246, but turns out I didn't handle it fully. I'll upload a patch which improves the test to expose the bug. I know where it is, but I'm not sure how to fix it without breaking index back-compat. Can we do that on experimental features? The problem is that if you update different fields in different gens, the FieldInfos files of older gens remain referenced (still!!). I open a new issue since LUCENE-5246 is already resolved and released, so don't want to mess up our JIRA... The severity of the bug is that unneeded files are still referenced in the index. Everything still works correctly, it's just that .fnm files are still there. But as I wrote, I'm still not sure how to solve it without requiring apps that use dv updates to reindex. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5618) DocValues updates send wrong fieldinfos to codec producers
[ https://issues.apache.org/jira/browse/LUCENE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5618. Resolution: Fixed Lucene Fields: New,Patch Available (was: New) Committed to trunk and 4x. DocValues updates send wrong fieldinfos to codec producers -- Key: LUCENE-5618 URL: https://issues.apache.org/jira/browse/LUCENE-5618 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Shai Erera Priority: Blocker Fix For: 4.9 Attachments: LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch, LUCENE-5618.patch Spinoff from LUCENE-5616. See the example there, docvalues readers get a fieldinfos, but it doesn't contain the correct ones, so they have invalid field numbers at read time. This should really be fixed. Maybe a simple solution is to not write batches of fields in updates but just have only one field per gen? This removes many-many relationships and would make things easy to understand. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5636) SegmentCommitInfo continues to list unneeded gen'd files
[ https://issues.apache.org/jira/browse/LUCENE-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5636. Resolution: Fixed Fix Version/s: 5.0 4.9 Fixed in LUCENE-5618 SegmentCommitInfo continues to list unneeded gen'd files Key: LUCENE-5636 URL: https://issues.apache.org/jira/browse/LUCENE-5636 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.9, 5.0 Attachments: LUCENE-5636.patch I thought I handled it in LUCENE-5246, but turns out I didn't handle it fully. I'll upload a patch which improves the test to expose the bug. I know where it is, but I'm not sure how to fix it without breaking index back-compat. Can we do that on experimental features? The problem is that if you update different fields in different gens, the FieldInfos files of older gens remain referenced (still!!). I open a new issue since LUCENE-5246 is already resolved and released, so don't want to mess up our JIRA... The severity of the bug is that unneeded files are still referenced in the index. Everything still works correctly, it's just that .fnm files are still there. But as I wrote, I'm still not sure how to solve it without requiring apps that use dv updates to reindex. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6104) The 'addreplica' Collection API does not support async parameter
Shalin Shekhar Mangar created SOLR-6104: --- Summary: The 'addreplica' Collection API does not support async parameter Key: SOLR-6104 URL: https://issues.apache.org/jira/browse/SOLR-6104 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.8.1, 4.8 Reporter: Shalin Shekhar Mangar Fix For: 4.9, 5.0 The 'addreplica' API does not support an 'async' parameter which was added by SOLR-5477. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6102) The 'addreplica' Collection API does not support property params
[ https://issues.apache.org/jira/browse/SOLR-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-6102. - Resolution: Invalid Oops, looks like I was too quick in opening this issue. The 'addreplica' API does support setting core properties but it is not documented in the Solr reference guide. The 'addreplica' Collection API does not support property params Key: SOLR-6102 URL: https://issues.apache.org/jira/browse/SOLR-6102 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.8, 4.8.1 Reporter: Shalin Shekhar Mangar Fix For: 4.9, 5.0 All Collection APIs except 'addreplica' support passing core properties in the property.XXX format. Such property params are passed directly to the core admin APIs invoked by these collection APIs. Not supporting these params is a bug and we should fix it. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5683) Improve SegmentReader.getXXXDocValues
[ https://issues.apache.org/jira/browse/LUCENE-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004742#comment-14004742 ] Shai Erera commented on LUCENE-5683: I implemented it, and many tests fail in CheckIndex on ClassCastException. So this is the current code: {code} FieldInfo fi = getDVField(field, DocValuesType.BINARY); if (fi == null) { return null; } Map<String,Object> dvFields = docValuesLocal.get(); BinaryDocValues dvs = (BinaryDocValues) dvFields.get(field); if (dvs == null) { // initialize ... } {code} And I changed it so that the FieldInfo part is inside the {{if}} (lazily initialize). The reason for the ClassCastException is that if you previously asked for a NUMERIC field w/ the same name, it got into the map, therefore the code happily tries to cast it to a NumericDocValues or BinaryDocValues and hits the exception. So I'm not sure this optimization is right, nor that it's worth complicating the code w/ e.g. instanceof checks? Improve SegmentReader.getXXXDocValues - Key: LUCENE-5683 URL: https://issues.apache.org/jira/browse/LUCENE-5683 Project: Lucene - Core Issue Type: Improvement Reporter: Shai Erera Assignee: Shai Erera Today we do two hash lookups, where in most cases a single one is enough. E.g. SR.getNumericDocValues initializes the FieldInfo (first lookup in FieldInfos); however, if that field was already initialized, we can simply check dvFields.get(). This can be improved in all getXXXDocValues as well as getDocsWithField. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
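The ClassCastException scenario above reduces to a tiny model: once any DocValues instance for a field name is in the shared map, a caller asking for a different type casts it blindly. The types below are simplified stand-ins, not the real SegmentReader code; NumericDocValues/BinaryDocValues are stub classes here:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the failure mode: the cache is keyed by field name
// only, so a NUMERIC entry poisons a later BINARY lookup for the same name.
public class DvCacheSketch {
    static class NumericDocValues {}
    static class BinaryDocValues {}

    private final Map<String, Object> dvFields = new HashMap<>();

    void initNumeric(String field) {
        dvFields.put(field, new NumericDocValues());
    }

    /** Naive version: blind cast, throws ClassCastException on a type mismatch. */
    BinaryDocValues getBinaryNaive(String field) {
        return (BinaryDocValues) dvFields.get(field);
    }

    /** Guarded version: an instanceof check trades the CCE for a null. */
    BinaryDocValues getBinaryGuarded(String field) {
        Object dv = dvFields.get(field);
        return dv instanceof BinaryDocValues ? (BinaryDocValues) dv : null;
    }

    public static void main(String[] args) {
        DvCacheSketch sr = new DvCacheSketch();
        sr.initNumeric("price");                          // field cached as NUMERIC
        System.out.println(sr.getBinaryGuarded("price")); // null, no exception
        try {
            sr.getBinaryNaive("price");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
    }
}
```

This is the trade-off the comment raises: skipping the FieldInfo lookup saves a hash probe, but then the map entry alone no longer proves the field has the requested DocValues type.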
[jira] [Closed] (SOLR-5648) SolrCore#getStatistics() should nest open searchers' stats
[ https://issues.apache.org/jira/browse/SOLR-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shikhar Bhushan closed SOLR-5648. - Resolution: Invalid bq. 1) I'm not sure I really understand what this adds – isn't every registered searcher (which should include every open searcher if there are more than one) already listed in the infoRegistry (so its stats are surfaced in /admin/mbeans and via JMX)? You're right! That's much better. SolrCore#getStatistics() should nest open searchers' stats -- Key: SOLR-5648 URL: https://issues.apache.org/jira/browse/SOLR-5648 Project: Solr Issue Type: Task Reporter: Shikhar Bhushan Priority: Minor Fix For: 4.9, 5.0 Attachments: SOLR-5648.patch, oldestSearcherStaleness.gif, openSearchers.gif {{SolrIndexSearcher}} leaks are a notable cause of garbage collection issues in codebases with custom components. So it is useful to be able to access monitoring information about what searchers are currently open, and in turn access their stats e.g. {{openedAt}}. This can be nested via {{SolrCore#getStatistics()}} which has a {{_searchers}} collection of all open searchers. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5675) ID postings format
[ https://issues.apache.org/jira/browse/LUCENE-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004774#comment-14004774 ] ASF subversion and git services commented on LUCENE-5675: - Commit 1596599 from [~mikemccand] in branch 'dev/branches/lucene5675' [ https://svn.apache.org/r1596599 ] LUCENE-5675: go back to sending deleted docs to PostingsFormat on flush; move 'skip deleted docs' into IDVPF ID postings format Key: LUCENE-5675 URL: https://issues.apache.org/jira/browse/LUCENE-5675 Project: Lucene - Core Issue Type: New Feature Reporter: Robert Muir Today the primary key lookup in lucene is not that great for systems like solr and elasticsearch that have versioning in front of IndexWriter. To some extent BlockTree can sometimes help avoid seeks by telling you the term does not exist for a segment. But this technique (based on FST prefix) is fragile. The only other choice today is bloom filters, which use up huge amounts of memory. I don't think we are using everything we know: particularly the version semantics. Instead, if the FST for the terms index used an algebra that represents the max version for any subtree, we might be able to answer that there is no term T with version V in that segment very efficiently. Also, ID fields don't need postings lists, and they don't need stats like docfreq/totaltermfreq, etc.; this stuff is all implicit. As far as the API, I think for users to provide IDs with versions to such a PF, a start would be to set a payload or whatever on the term field to get it through IndexWriter to the codec. And a consumer of the codec can just cast the Terms to a subclass that exposes the FST to do this version check efficiently. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5675) ID postings format
[ https://issues.apache.org/jira/browse/LUCENE-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004789#comment-14004789 ] ASF subversion and git services commented on LUCENE-5675: - Commit 1596602 from [~mikemccand] in branch 'dev/branches/lucene5675' [ https://svn.apache.org/r1596602 ] LUCENE-5675: finish reverting 'do not send deleted docs to PostingsFormat on flush' ID postings format Key: LUCENE-5675 URL: https://issues.apache.org/jira/browse/LUCENE-5675 Project: Lucene - Core Issue Type: New Feature Reporter: Robert Muir Today the primary key lookup in lucene is not that great for systems like solr and elasticsearch that have versioning in front of IndexWriter. To some extent BlockTree can sometimes help avoid seeks by telling you the term does not exist for a segment. But this technique (based on FST prefix) is fragile. The only other choice today is bloom filters, which use up huge amounts of memory. I don't think we are using everything we know: particularly the version semantics. Instead, if the FST for the terms index used an algebra that represents the max version for any subtree, we might be able to answer that there is no term T with version V in that segment very efficiently. Also, ID fields don't need postings lists, and they don't need stats like docfreq/totaltermfreq, etc.; this stuff is all implicit. As far as the API, I think for users to provide IDs with versions to such a PF, a start would be to set a payload or whatever on the term field to get it through IndexWriter to the codec. And a consumer of the codec can just cast the Terms to a subclass that exposes the FST to do this version check efficiently. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6105) DebugComponent NPE when single-pass distributed search is used
[ https://issues.apache.org/jira/browse/SOLR-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004796#comment-14004796 ] Shikhar Bhushan commented on SOLR-6105: --- paging [~shalinmangar] in case you have any idea what might be going on DebugComponent NPE when single-pass distributed search is used -- Key: SOLR-6105 URL: https://issues.apache.org/jira/browse/SOLR-6105 Project: Solr Issue Type: Bug Reporter: Shikhar Bhushan Priority: Minor I'm seeing NPEs in {{DebugComponent}} with debugQuery=true when just ID and score are requested, which enables the single-pass distributed search optimization from SOLR-1880. The NPE originates on this line in DebugComponent.finishStage(): {noformat} int idx = sdoc.positionInResponse; {noformat} indicating an ID that is in the explain but missing from the resultIds. I'm afraid I haven't been able to reproduce this in {{DistributedQueryComponentOptimizationTest}}, but wanted to open this ticket in any case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6105) DebugComponent NPE when single-pass distributed search is used
Shikhar Bhushan created SOLR-6105: - Summary: DebugComponent NPE when single-pass distributed search is used Key: SOLR-6105 URL: https://issues.apache.org/jira/browse/SOLR-6105 Project: Solr Issue Type: Bug Reporter: Shikhar Bhushan Priority: Minor I'm seeing NPEs in {{DebugComponent}} with debugQuery=true when just ID and score are requested, which enables the single-pass distributed search optimization from SOLR-1880. The NPE originates on this line in DebugComponent.finishStage(): {noformat} int idx = sdoc.positionInResponse; {noformat} indicating an ID that is in the explain but missing from the resultIds. I'm afraid I haven't been able to reproduce this in {{DistributedQueryComponentOptimizationTest}}, but wanted to open this ticket in any case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5693) don't write deleted documents on flush
Michael McCandless created LUCENE-5693: -- Summary: don't write deleted documents on flush Key: LUCENE-5693 URL: https://issues.apache.org/jira/browse/LUCENE-5693 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless When we flush a new segment, sometimes some documents are born deleted, e.g. if the app did a IW.deleteDocuments that matched some not-yet-flushed documents. We already compute the liveDocs on flush, but then we continue (wastefully) to send those known-deleted documents to all Codec parts. I started to implement this on LUCENE-5675 but it was too controversial. Also, I expect typically the number of deleted docs is 0, or small, so not writing born deleted docs won't be much of a win for most apps. Still it seems silly to write them, consuming IO/CPU in the process, only to consume more IO/CPU later for merging to re-delete them. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6105) DebugComponent NPE when single-pass distributed search is used
[ https://issues.apache.org/jira/browse/SOLR-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004799#comment-14004799 ] Shikhar Bhushan commented on SOLR-6105: --- also paging [~vzhovtiuk] - presumably you're using this feature in your app. does debugQuery=true work ok for you? DebugComponent NPE when single-pass distributed search is used -- Key: SOLR-6105 URL: https://issues.apache.org/jira/browse/SOLR-6105 Project: Solr Issue Type: Bug Reporter: Shikhar Bhushan Priority: Minor I'm seeing NPEs in {{DebugComponent}} with debugQuery=true when just ID and score are requested, which enables the single-pass distributed search optimization from SOLR-1880. The NPE originates on this line in DebugComponent.finishStage(): {noformat} int idx = sdoc.positionInResponse; {noformat} indicating an ID that is in the explain but missing from the resultIds. I'm afraid I haven't been able to reproduce this in {{DistributedQueryComponentOptimizationTest}}, but wanted to open this ticket in any case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
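A stripped-down model of the failing line makes the bug mechanical: finishStage looks up each explained ID in resultIds and dereferences the result without a null check. The types below are simplified stand-ins, not the actual DebugComponent code, and the defensive skip shown is just one possible workaround:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the DebugComponent.finishStage() lookup: an ID
// present in a shard's explain but absent from resultIds yields a null
// ShardDoc, and dereferencing "sdoc.positionInResponse" then NPEs.
public class DebugMergeSketch {
    static class ShardDoc {
        int positionInResponse;
        ShardDoc(int pos) { positionInResponse = pos; }
    }

    /** Returns positions for explained IDs, skipping IDs missing from resultIds. */
    static List<Integer> positions(List<String> explainedIds, Map<String, ShardDoc> resultIds) {
        List<Integer> out = new ArrayList<>();
        for (String id : explainedIds) {
            ShardDoc sdoc = resultIds.get(id);
            if (sdoc == null) continue;        // defensive: avoids the NPE
            out.add(sdoc.positionInResponse);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, ShardDoc> resultIds = new HashMap<>();
        resultIds.put("doc1", new ShardDoc(0));
        // "doc2" was explained by a shard but never made it into resultIds
        System.out.println(positions(List.of("doc1", "doc2"), resultIds));
    }
}
```

Skipping silently hides the mismatch, of course; the real question in the ticket is why the single-pass optimization lets an explained ID miss the merged resultIds at all.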
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004826#comment-14004826 ] ASF subversion and git services commented on LUCENE-4236: - Commit 1596606 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1596606 ] LUCENE-4236: add a new test for crazy corner cases of coord() handling clean up booleanquery conjunction optimizations a bit - Key: LUCENE-4236 URL: https://issues.apache.org/jira/browse/LUCENE-4236 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch After LUCENE-3505, I want to do a slight cleanup: * compute the term conjunctions optimization in scorer(), so its applied even if we have optional and prohibited clauses that dont exist in the segment (e.g. return null) * use the term conjunctions optimization when optional.size() == minShouldMatch, as that means they are all mandatory, too. * don't return booleanscorer1 when optional.size() == minShouldMatch, because it means we have required clauses and in general BS2 should do a much better job (e.g. use advance). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5693) don't write deleted documents on flush
[ https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004821#comment-14004821 ] Robert Muir commented on LUCENE-5693: - This only makes sense for postings though. How can we avoid writing deleted documents in: * stored fields and term vectors (which we arent flushing) * docvalues (we would need to remap ordinals) By writing them some places and not writing them other places, we open the possibility of extremely confusing corner cases and bugs. don't write deleted documents on flush -- Key: LUCENE-5693 URL: https://issues.apache.org/jira/browse/LUCENE-5693 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless When we flush a new segment, sometimes some documents are born deleted, e.g. if the app did a IW.deleteDocuments that matched some not-yet-flushed documents. We already compute the liveDocs on flush, but then we continue (wastefully) to send those known-deleted documents to all Codec parts. I started to implement this on LUCENE-5675 but it was too controversial. Also, I expect typically the number of deleted docs is 0, or small, so not writing born deleted docs won't be much of a win for most apps. Still it seems silly to write them, consuming IO/CPU in the process, only to consume more IO/CPU later for merging to re-delete them. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004828#comment-14004828 ] ASF subversion and git services commented on LUCENE-4236: - Commit 1596607 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596607 ] LUCENE-4236: add a new test for crazy corner cases of coord() handling clean up booleanquery conjunction optimizations a bit - Key: LUCENE-4236 URL: https://issues.apache.org/jira/browse/LUCENE-4236 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch After LUCENE-3505, I want to do a slight cleanup: * compute the term conjunctions optimization in scorer(), so its applied even if we have optional and prohibited clauses that dont exist in the segment (e.g. return null) * use the term conjunctions optimization when optional.size() == minShouldMatch, as that means they are all mandatory, too. * don't return booleanscorer1 when optional.size() == minShouldMatch, because it means we have required clauses and in general BS2 should do a much better job (e.g. use advance). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5693) don't write deleted documents on flush
[ https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004835#comment-14004835 ] Shai Erera commented on LUCENE-5693: Today we apply the deletes (update the bitset) when a Reader is being requested. At that point, we have a SegmentReader at hand and we can resolve the delete-by-Term/Query to the actual doc IDs ... how would we do that while the segment is flushed? How do we know which documents were associated with {{Term t}}, while it was sent as a delete? When I worked on LUCENE-5189 (NumericDocValues update), I had the same thought -- why flush the original numeric value when the document has already been updated? But I had the same issue - which documents were affected by the update Term. don't write deleted documents on flush -- Key: LUCENE-5693 URL: https://issues.apache.org/jira/browse/LUCENE-5693 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless When we flush a new segment, sometimes some documents are born deleted, e.g. if the app did a IW.deleteDocuments that matched some not-yet-flushed documents. We already compute the liveDocs on flush, but then we continue (wastefully) to send those known-deleted documents to all Codec parts. I started to implement this on LUCENE-5675 but it was too controversial. Also, I expect typically the number of deleted docs is 0, or small, so not writing born deleted docs won't be much of a win for most apps. Still it seems silly to write them, consuming IO/CPU in the process, only to consume more IO/CPU later for merging to re-delete them. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
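The waste under discussion can be put in a toy model: liveDocs already exist at flush time, yet every document — whether live or born deleted — is handed to the codec. This is a hypothetical sketch, not Lucene's actual flush path:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model of a flush with "born deleted" documents: liveDocs is already
// computed, but the current flush path streams all maxDoc documents to the
// codec anyway; the proposed change would skip the cleared bits.
public class FlushSketch {
    static List<Integer> flush(int maxDoc, BitSet liveDocs, boolean skipBornDeleted) {
        List<Integer> sentToCodec = new ArrayList<>();
        for (int docID = 0; docID < maxDoc; docID++) {
            if (skipBornDeleted && !liveDocs.get(docID)) {
                continue; // born deleted: never reaches the codec
            }
            sentToCodec.add(docID);
        }
        return sentToCodec;
    }

    public static void main(String[] args) {
        BitSet liveDocs = new BitSet(5);
        liveDocs.set(0, 5);
        liveDocs.clear(2); // doc 2 matched an IW.deleteDocuments before flush
        System.out.println(flush(5, liveDocs, false)); // today: all 5 written
        System.out.println(flush(5, liveDocs, true));  // proposed: 4 written
    }
}
```

As the comments above note, the hard part is not this loop: skipping docs renumbers nothing for postings, but doc values would need ordinal remapping, and the liveDocs themselves are only resolvable once the delete-by-Term/Query can be matched against actual doc IDs.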
[jira] [Updated] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4236: Attachment: LUCENE-4236.patch Here's the patch. I think its ready. I committed the new test already to trunk/4.x. clean up booleanquery conjunction optimizations a bit - Key: LUCENE-4236 URL: https://issues.apache.org/jira/browse/LUCENE-4236 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch After LUCENE-3505, I want to do a slight cleanup: * compute the term conjunctions optimization in scorer(), so its applied even if we have optional and prohibited clauses that dont exist in the segment (e.g. return null) * use the term conjunctions optimization when optional.size() == minShouldMatch, as that means they are all mandatory, too. * don't return booleanscorer1 when optional.size() == minShouldMatch, because it means we have required clauses and in general BS2 should do a much better job (e.g. use advance). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
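The clause-count reasoning in the patch description boils down to simple arithmetic: if optional.size() == minShouldMatch, every optional clause must match, so the optional clauses are effectively required. A hedged sketch of that decision with invented method names — not the actual BooleanWeight code:

```java
// Sketch of the scorer-selection reasoning from the patch description:
// when every optional clause must match (optional.size() == minShouldMatch),
// the optional clauses are effectively required, so the term-conjunction
// optimization applies and BooleanScorer1 should not be chosen.
public class BooleanScorerChoice {
    /** True when all optional clauses are forced to match. */
    static boolean optionalAreMandatory(int optionalCount, int minShouldMatch) {
        return optionalCount > 0 && optionalCount == minShouldMatch;
    }

    /** Conjunction handling applies when every clause is (effectively) required. */
    static boolean allClausesRequired(int optionalCount, int minShouldMatch) {
        return optionalCount == 0 || optionalAreMandatory(optionalCount, minShouldMatch);
    }

    public static void main(String[] args) {
        // 2 optional clauses with minShouldMatch=2: both are mandatory in practice
        System.out.println(optionalAreMandatory(2, 2)); // true
        // minShouldMatch=1 leaves real alternatives, so they stay optional
        System.out.println(optionalAreMandatory(2, 1)); // false
    }
}
```

This also explains the third bullet: once the optional clauses are known to be mandatory, a document-at-a-time scorer that can advance() (BS2) beats BooleanScorer1.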
[jira] [Commented] (SOLR-5285) Solr response format should support child Docs
[ https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004850#comment-14004850 ] Arcadius Ahouansou commented on SOLR-5285: -- Thanks [~varunthacker] and all for the great work. [~hossman] Any chance this will get into 4.9? Thanks. Solr response format should support child Docs -- Key: SOLR-5285 URL: https://issues.apache.org/jira/browse/SOLR-5285 Project: Solr Issue Type: New Feature Reporter: Varun Thacker Fix For: 4.9, 5.0 Attachments: SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, javabin_backcompat_child_docs.bin Solr has added support for taking childDocs as input ( only XML till now ). It's currently used for BlockJoinQuery. I feel that if a user indexes a document with child docs, even if he isn't using the BJQ features and is just searching which results in a hit on the parentDoc, it's childDocs should be returned in the response format. [~hossman_luc...@fucit.org] on IRC suggested that the DocTransformers would be the place to add childDocs to the response. Now given a docId one needs to find out all the childDoc id's. A couple of approaches which I could think of are 1. Maintain the relation between a parentDoc and it's childDocs during indexing time in maybe a separate index? 2. Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - Given a parentDoc it finds out all the childDocs but this requires a childScorer. Am I missing something obvious on how to find the relation between a parentDoc and it's childDocs because none of the above solutions for this look right. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6086) Replica active during Warming
[ https://issues.apache.org/jira/browse/SOLR-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ludovic Boutros updated SOLR-6086: -- Attachment: SOLR-6086.patch I checked the differences in the logs and in the code. The problem occurs when: - a node is restarted - Peer Sync failed (no /get handler for instance; should it become mandatory?) - the node is already synced (nothing to replicate) or: - a node is restarted and this is the leader (I do not know if it only happens with a lonely leader...) - the node is already synced (nothing to replicate) For the first case, I think this is a side effect of the modification in SOLR-4965. If Peer Sync is successful, an explicit commit is called in the code. And there's a comment which says: {code:title=RecoveryStrategy.java|borderStyle=solid} // force open a new searcher core.getUpdateHandler().commit(new CommitUpdateCommand(req, false)); {code} This is not the case if Peer Sync failed. Just adding this line is enough to correct this issue. Here is a patch with a test which reproduces the problem and the correction (to be applied to the branch 4x). I am working on the second case. Replica active during Warming - Key: SOLR-6086 URL: https://issues.apache.org/jira/browse/SOLR-6086 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6.1, 4.8.1 Reporter: ludovic Boutros Attachments: SOLR-6086.patch At least with Solr 4.6.1, replicas are considered active during the warming process. This means that if you restart a replica or create a new one, queries will be sent to this replica and will hang until the end of the warming process (if cold searchers are not used). You cannot add or restart a node silently anymore. I think that the fact that the replica is active is not a bad thing. But the HttpShardHandler and the CloudSolrServer class should take the warming process into account. 
Currently, I have developed a new very simple component which checks that a searcher is registered. I am also developing custom HttpShardHandler and CloudSolrServer classes which will check the warming process in addition to the ACTIVE status in the cluster state. This seems to be more a workaround than a solution, but that's all I can do in this version. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6086) Replica active during Warming
[ https://issues.apache.org/jira/browse/SOLR-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ludovic Boutros updated SOLR-6086: -- Affects Version/s: 4.8.1 Replica active during Warming - Key: SOLR-6086 URL: https://issues.apache.org/jira/browse/SOLR-6086 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6.1, 4.8.1 Reporter: ludovic Boutros Attachments: SOLR-6086.patch At least with Solr 4.6.1, replicas are considered active during the warming process. This means that if you restart a replica or create a new one, queries will be sent to this replica and will hang until the end of the warming process (if cold searchers are not used). You cannot add or restart a node silently anymore. I think that the fact that the replica is active is not a bad thing. But the HttpShardHandler and the CloudSolrServer class should take the warming process into account. Currently, I have developed a new very simple component which checks that a searcher is registered. I am also developing custom HttpShardHandler and CloudSolrServer classes which will check the warming process in addition to the ACTIVE status in the cluster state. This seems to be more a workaround than a solution, but that's all I can do in this version. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6086) Replica active during Warming
[ https://issues.apache.org/jira/browse/SOLR-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004856#comment-14004856 ] ludovic Boutros edited comment on SOLR-6086 at 5/21/14 4:23 PM: I checked the differences in the logs and in the code. The problem occurs when: - a node is restarted - Peer Sync failed (no /get handler, for instance; should it become mandatory?) - the node is already synced (nothing to replicate) or: - a node is restarted and it is the leader (I do not know if it only happens with a lonely leader...) - the node is already synced (nothing to replicate) For the first case, I think this is a side effect of the modification in SOLR-4965. If Peer Sync is successful, an explicit commit is called in the code, with a comment which says: {code:title=RecoveryStrategy.java|borderStyle=solid} // force open a new searcher core.getUpdateHandler().commit(new CommitUpdateCommand(req, false)); {code} This is not the case if Peer Sync failed. Just adding this line is enough to correct this issue. Here is a patch with a test which reproduces the problem and the correction (to be applied to branch 4x). I am working on the second case.
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #1146: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1146/ 1 tests failed. FAILED: org.apache.solr.cloud.HttpPartitionTest.testDistribSearch Error Message: No registered leader was found after waiting for 6ms , collection: c8n_1x3_lf slice: shard1 Stack Trace: org.apache.solr.common.SolrException: No registered leader was found after waiting for 6ms , collection: c8n_1x3_lf slice: shard1 at __randomizedtesting.SeedInfo.seed([7A06522654ACE583:FBE0DC3E23F385BF]:0) at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:567) at org.apache.solr.cloud.HttpPartitionTest.testRf3WithLeaderFailover(HttpPartitionTest.java:348) at org.apache.solr.cloud.HttpPartitionTest.doTest(HttpPartitionTest.java:148) Build Log: [...truncated 54769 lines...] BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/build.xml:490: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/build.xml:182: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/extra-targets.xml:77: Java returned: 1 Total time: 191 minutes 17 seconds Build step 'Invoke Ant' marked build as failure Recording test results Email was triggered for: Failure Sending email for trigger: Failure
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004889#comment-14004889 ] Michael McCandless commented on LUCENE-4236: +1, this is a great cleanup: more understandable than what we have today. Maybe we should leave FilterScorer package-private until there's a need for public? clean up booleanquery conjunction optimizations a bit - Key: LUCENE-4236 URL: https://issues.apache.org/jira/browse/LUCENE-4236 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.9, 5.0 Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch After LUCENE-3505, I want to do a slight cleanup: * compute the term conjunctions optimization in scorer(), so it's applied even if we have optional and prohibited clauses that don't exist in the segment (e.g. return null) * use the term conjunctions optimization when optional.size() == minShouldMatch, as that means they are all mandatory, too. * don't return booleanscorer1 when optional.size() == minShouldMatch, because it means we have required clauses and in general BS2 should do a much better job (e.g. use advance).
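The observation in the second bullet above — when optional.size() == minShouldMatch, every optional clause must match, so the optional clauses are effectively mandatory — can be illustrated with a toy clause-promotion function. This is a sketch of the rule only, not Lucene's actual BooleanWeight/scorer code:

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionOpt {
    // Promotes optional clauses to required when all of them must match
    // (optional.size() == minShouldMatch). A conjunction over the combined
    // set can then use advance()-based scoring (BS2) instead of BooleanScorer.
    static List<String> effectiveRequired(List<String> required,
                                          List<String> optional,
                                          int minShouldMatch) {
        List<String> result = new ArrayList<>(required);
        if (!optional.isEmpty() && optional.size() == minShouldMatch) {
            result.addAll(optional); // all optional clauses are mandatory too
        }
        return result;
    }

    public static void main(String[] args) {
        // Two optional clauses, minShouldMatch=2: both are effectively required.
        System.out.println(effectiveRequired(
                List.of("foo"), List.of("bar", "baz"), 2)); // [foo, bar, baz]
        // minShouldMatch=1: optional clauses stay optional.
        System.out.println(effectiveRequired(
                List.of("foo"), List.of("bar", "baz"), 1)); // [foo]
    }
}
```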
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004890#comment-14004890 ] Robert Muir commented on LUCENE-4236: - Good idea. If we have a need somewhere else we can open it up.
[jira] [Commented] (LUCENE-5693) don't write deleted documents on flush
[ https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004902#comment-14004902 ] Michael McCandless commented on LUCENE-5693: bq. how would we do that while the segment is flushed? We do it in FreqProxTermsWriter.applyDeletes; since we know the terms to be deleted, and we have the BytesRefHash, it's easy. don't write deleted documents on flush -- Key: LUCENE-5693 URL: https://issues.apache.org/jira/browse/LUCENE-5693 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless When we flush a new segment, sometimes some documents are born deleted, e.g. if the app did an IW.deleteDocuments that matched some not-yet-flushed documents. We already compute the liveDocs on flush, but then we continue (wastefully) to send those known-deleted documents to all Codec parts. I started to implement this on LUCENE-5675 but it was too controversial. Also, I expect typically the number of deleted docs is 0, or small, so not writing born-deleted docs won't be much of a win for most apps. Still, it seems silly to write them, consuming IO/CPU in the process, only to consume more IO/CPU later for merging to re-delete them.
[jira] [Commented] (LUCENE-5693) don't write deleted documents on flush
[ https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004906#comment-14004906 ] Michael McCandless commented on LUCENE-5693: bq. This only makes sense for postings though. Right, postings is much easier than doc values. But postings are also the most costly to merge. bq. By writing them some places and not writing them other places, we open the possibility of extremely confusing corner cases and bugs. I disagree: I think we discover places that are relying on deleted docs behavior, i.e. test bugs. When I did this on LUCENE-5675 there were only a few places that relied on deleted docs.
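The core idea of LUCENE-5693 — consult the liveDocs computed at flush and simply never hand born-deleted documents to the codec — can be modeled with a toy flush routine. Names here are illustrative; this is not Lucene's FreqProxFields/codec API:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class FlushSkipDeleted {
    // Simulates writing postings at flush time: documents whose bit is clear
    // in liveDocs are "born deleted" and are skipped, instead of being written
    // now and re-deleted during a later merge.
    static List<Integer> writePostings(int maxDoc, BitSet liveDocs) {
        List<Integer> written = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (liveDocs.get(doc)) { // only live docs reach the codec
                written.add(doc);
            }
        }
        return written;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet();
        live.set(0, 5);  // 5 buffered docs, all live...
        live.clear(2);   // ...except doc 2, deleted before flush
        System.out.println(writePostings(5, live)); // [0, 1, 3, 4]
    }
}
```

The saving is the IO/CPU for doc 2's postings at flush time, plus the merge work that would later throw them away.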
[jira] [Updated] (SOLR-6101) Shard splitting doesn't work in legacyCloud=false mode
[ https://issues.apache.org/jira/browse/SOLR-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-6101: Attachment: SOLR-6101.patch Changes: # ShardSplitTest switches to using legacyCloud=false randomly # Shard splitting uses addReplica API to create replicas instead of using core admin create API directly. I had to introduce a wait loop for sub-shard to be created by overseer before we can call addReplica. Shard splitting doesn't work in legacyCloud=false mode -- Key: SOLR-6101 URL: https://issues.apache.org/jira/browse/SOLR-6101 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.8 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.9, 5.0 Attachments: SOLR-6101.patch When we invoke splitshard Collection API against a cluster with legacyCloud=false, we get the following errors: {code} 2014-05-15 21:07:58,986 [Overseer-163819091268403216-ec2-x.compute-1.amazonaws.com:8986_solr-n_51] ERROR solr.cloud.OverseerCollectionProcessor - Collection splitshard of splitshard failed:org.apache.solr.common.SolrException: Could not find coreNodeName at org.apache.solr.cloud.OverseerCollectionProcessor.waitForCoreNodeName(OverseerCollectionProcessor.java:1504) at org.apache.solr.cloud.OverseerCollectionProcessor.splitShard(OverseerCollectionProcessor.java:1255) at org.apache.solr.cloud.OverseerCollectionProcessor.processMessage(OverseerCollectionProcessor.java:472) at org.apache.solr.cloud.OverseerCollectionProcessor.run(OverseerCollectionProcessor.java:248) at java.lang.Thread.run(Thread.java:745) 2014-05-15 21:07:59,003 [Overseer-163819091268403216-ec2-xxx.compute-1.amazonaws.com:8986_solr-n_51] INFO solr.cloud.OverseerCollectionProcessor - Overseer Collection Processor: Message id:/overseer/collection-queue-work/qn-18 complete, 
response:{success={null={responseHeader={status=0,QTime=1}},null={responseHeader={status=0,QTime=1}}},split117278106116750={responseHeader={status=0,QTime=0},STATUS=failed,Response=Error CREATEing SolrCore '3M_shard1_1_replica1': non legacy mode coreNodeName missing shard=shard1_1&name=3M_shard1_1_replica1&action=CREATE&collection=3M&wt=javabin&qt=/admin/cores&async=split117278106116750&version=2},Operation splitshard caused exception:=org.apache.solr.common.SolrException: Could not find coreNodeName,exception={msg=Could not find coreNodeName,rspCode=500}} {code} The sub-shard replica (leader) creation fails due to: {code} { responseHeader: { status: 0, QTime: 0 }, STATUS: failed, Response: Error CREATEing SolrCore '3M_shard1_0_replica1': non legacy mode coreNodeName missing shard=shard1_0&name=3M_shard1_0_replica1&action=CREATE&collection=3M&wt=javabin&qt=/admin/cores&async=split117278099904930&version=2 } {code}
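The "wait loop for sub-shard to be created by overseer" mentioned in the patch notes boils down to a generic poll-until-timeout helper. The sketch below assumes nothing about Solr's internals; the condition would, in practice, be a cluster-state check for the new sub-shard:

```java
import java.util.function.BooleanSupplier;

public class WaitUntil {
    // Polls a condition until it holds or the timeout elapses; returns
    // whether the condition became true in time.
    static boolean waitUntil(BooleanSupplier cond, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (cond.getAsBoolean()) return true;
            Thread.sleep(pollMs);
        }
        return cond.getAsBoolean(); // one final check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Stand-in for "overseer has created the sub-shard": becomes true
        // after ~50ms, well inside the 1s timeout.
        boolean ok = waitUntil(
                () -> System.currentTimeMillis() - start > 50, 1000, 10);
        System.out.println(ok); // true
    }
}
```

Only once the wait succeeds would the splitshard code go on to call the addReplica API for the new sub-shard.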
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004911#comment-14004911 ] Ryan Ernst commented on LUCENE-5650: +1, everything looks good to me (and tests pass for me as well). createTempDir and associated functions no longer create java.io.tmpdir -- Key: LUCENE-5650 URL: https://issues.apache.org/jira/browse/LUCENE-5650 Project: Lucene - Core Issue Type: Improvement Components: general/test Reporter: Ryan Ernst Assignee: Dawid Weiss Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, dih.patch The recent refactoring to all the create temp file/dir functions (which is great!) has a minor regression from what existed before. With the old {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist. So, if you set {{java.io.tmpdir}} to {{./temp}}, then it would create that dir within the per-jvm working dir. However, {{getBaseTempDirForClass()}} now does asserts that check the dir exists, is a dir, and is writeable. Lucene uses {{.}} as {{java.io.tmpdir}}. Then in the test security manager, the per-jvm cwd has read/write/execute permissions. However, this allows tests to write to their cwd, which I'm trying to protect against (by setting cwd to read/execute in my test security manager).
[jira] [Updated] (LUCENE-5693) don't write deleted documents on flush
[ https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5693: --- Attachment: LUCENE-5693.patch Patch, decoupled from LUCENE-5675. Tests pass. The trickiest one was the new TestFieldCacheVsDocValues: it heavily relies on being able to read deleted docs from postings, which I think is invalid. I also had to fix CheckIndex to not verify term vectors for deleted docs; I think that's fair. The core fix is easy: FreqProxFields (passed to the PostingsWriter at flush) just skips the deleted docs. Also, this uncovered a bug in ToParentBJQ.explain's handling of deleted docs.
[jira] [Commented] (SOLR-5495) Recovery strategy for leader partitioned from replica case.
[ https://issues.apache.org/jira/browse/SOLR-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004999#comment-14004999 ] ASF subversion and git services commented on SOLR-5495: --- Commit 1596636 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1596636 ] SOLR-5495: Print cluster state in assertion failure messages if a leader cannot be found to determine root cause of HttpPartitionTest failures in Jenkins. Recovery strategy for leader partitioned from replica case. --- Key: SOLR-5495 URL: https://issues.apache.org/jira/browse/SOLR-5495 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Timothy Potter Attachments: SOLR-5495.patch, SOLR-5495.patch, SOLR-5495.patch We need to work out a strategy for the case of: Leader and replicas can still talk to ZooKeeper, Leader cannot talk to replica. We punted on this in the initial design, but I'd like to get something in.
[jira] [Commented] (SOLR-5495) Recovery strategy for leader partitioned from replica case.
[ https://issues.apache.org/jira/browse/SOLR-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005009#comment-14005009 ] ASF subversion and git services commented on SOLR-5495: --- Commit 1596637 from [~thelabdude] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596637 ] SOLR-5495: Print cluster state in assertion failure messages if a leader cannot be found to determine root cause of HttpPartitionTest failures in Jenkins
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005037#comment-14005037 ] ASF subversion and git services commented on LUCENE-4236: - Commit 1596640 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1596640 ] LUCENE-4236: cleanup/optimize BooleanScorer in-order creation
[jira] [Updated] (SOLR-5285) Solr response format should support child Docs
[ https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5285: --- Attachment: SOLR-5285.patch Hey Varun, I didn't get very far digging into your patch, because I started by looking at your new randomized test in SolrExampleTests and encountered some problems... 1) The first time I tried running your new randomized test, I got an NPE -- it didn't reproduce reliably though, because your test called new Random() instead of leveraging the test-framework (ant precommit will warn you about stuff like this). 2) Side note: there's no need to randomize which response parser is used when you add test methods to SolrExampleTests -- every method there gets picked up automatically by the subclasses, which ensure they are all run with every writer/parser. 3) When I started looking into fixing the use of random() in your test, I realized that the assertions in the test weren't very strong. What I was referring to in my earlier comment was having a test that attempted to use the transformer on a result set that included docs with children and docs w/o children, and asserting that every child returned really was a descendant of the specified doc by comparing with what we _know_ for a fact we indexed -- your test wasn't really doing any of that. In the attached patch, I've overhauled {{SolrExampleTests.testChildDoctransformer()}} along the lines of what I was describing, but this has exposed a ClassCastException in the transformer. I haven't had a chance to dig into what's happening, but for some odd reason it only seems to manifest itself when the XML Response Writer is used... {noformat} hossman@frisbee:~/lucene/dev/solr/solrj$ ant test -Dtests.method=testChildDoctransformer -Dtests.seed=720251997BEC4F70 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Pacific/Fiji -Dtests.file.encoding=UTF-8 ...
[junit4] 2 11768 T20 C1 oasc.SolrException.log ERROR null:java.lang.ClassCastException: org.apache.lucene.document.Field cannot be cast to java.lang.String [junit4] 2at org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformerFactory.java:142) [junit4] 2at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:254) [junit4] 2at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:172) [junit4] 2at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:111) [junit4] 2at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:40) [junit4] 2at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:760) [junit4] 2at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:428) [junit4] 2at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208) [junit4] 2at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) [junit4] 2at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:136) [junit4] 2at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) [junit4] 2at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) [junit4] 2at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229) [junit4] 2at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) [junit4] 2at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301) [junit4] 2at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077) [junit4] 2at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) [junit4] 2at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) [junit4] 2at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) [junit4] 2at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) [junit4] 2at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) [junit4] 2at org.eclipse.jetty.server.Server.handle(Server.java:368) [junit4] 2at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) [junit4] 2at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) [junit4] 2at
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005070#comment-14005070 ] ASF subversion and git services commented on LUCENE-4236: - Commit 1596646 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596646 ] LUCENE-4236: cleanup/optimize BooleanScorer in-order creation
[jira] [Resolved] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4236. - Resolution: Fixed
[jira] [Updated] (SOLR-6088) Add query re-ranking with the ReRankingQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-6088: - Attachment: SOLR-6088.patch New patch with all tests and precommit passing. Add query re-ranking with the ReRankingQParserPlugin Key: SOLR-6088 URL: https://issues.apache.org/jira/browse/SOLR-6088 Project: Solr Issue Type: New Feature Components: search Reporter: Joel Bernstein Attachments: SOLR-6088.patch, SOLR-6088.patch, SOLR-6088.patch This ticket introduces the ReRankingQParserPlugin, which adds query re-ranking/rescoring for Solr. It leverages the new RankQuery framework to plug in the new Lucene QueryRescorer. See ticket LUCENE-5489 for details on the use case. Sample syntax: {code} q={!rerank mainQuery=$qq reRankQuery=$rqq reRankDocs=200} {code} In the example above the mainQuery is executed, and 200 docs are collected and re-ranked based on the results of the reRankQuery.
[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit
[ https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005094#comment-14005094 ] Mikhail Khludnev commented on LUCENE-4236: -- [~rcmuir] great job! btw, I wonder if Solr is allowed to search with BooleanScorer (term-at-a-time)?
[jira] [Commented] (SOLR-5468) Option to enforce a majority quorum approach to accepting updates in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005099#comment-14005099 ] ASF subversion and git services commented on SOLR-5468: --- Commit 1596652 from [~thelabdude] in branch 'dev/trunk' [ https://svn.apache.org/r1596652 ] SOLR-5468: Improve reporting of cluster state when assertions fail; to help diagnose cause of Jenkins failures. Option to enforce a majority quorum approach to accepting updates in SolrCloud -- Key: SOLR-5468 URL: https://issues.apache.org/jira/browse/SOLR-5468 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.5 Environment: All Reporter: Timothy Potter Assignee: Timothy Potter Priority: Minor Attachments: SOLR-5468.patch, SOLR-5468.patch, SOLR-5468.patch I've been thinking about how SolrCloud deals with write-availability using in-sync replica sets, in which writes will continue to be accepted so long as there is at least one healthy node per shard. For a little background (and to verify my understanding of the process is correct): SolrCloud only considers active/healthy replicas when acknowledging a write. Specifically, when a shard leader accepts an update request, it forwards the request to all active/healthy replicas and only considers the write successful if all active/healthy replicas ack the write. Any down/gone replicas are not considered and will sync up with the leader when they come back online using peer sync or snapshot replication. For instance, if a shard has 3 nodes, A, B, and C, with A being the current leader, then writes to the shard will continue to succeed even if B and C are down. The issue is that if a shard leader continues to accept updates even if it loses all of its replicas, then we have acknowledged updates on only 1 node.
If that node, call it A, then fails and one of the previous replicas, call it B, comes back online before A does, then any writes that A accepted while the other replicas were offline are at risk of being lost. SolrCloud does provide a safeguard mechanism for this problem with the leaderVoteWait setting, which puts any replicas that come back online before node A into a temporary wait state. If A comes back online within the wait period, then all is well, as it will become the leader again and no writes will be lost. As a side note, sysadmins definitely need to be made more aware of this situation; when I first encountered it in my cluster, I had no idea what it meant. My question is whether we want to consider an approach where SolrCloud will not accept writes unless there is a majority of replicas available to accept the write. For my example, under this approach, we wouldn't accept writes if both B and C failed, but would if only C did, leaving A and B online. Admittedly, this lowers the write-availability of the system, so it may be something that should be tunable. From Mark M: Yeah, this is kind of like one of many little features that we have just not gotten to yet. I’ve always planned for a param that lets you say how many replicas an update must be verified on before responding success. Seems to make sense to fail that type of request early if you notice there are not enough replicas up to satisfy the param to begin with.
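The majority-quorum rule proposed above reduces to a tiny predicate. Here is a minimal, purely illustrative sketch (not Solr code; the class and method names are invented for this example):

```java
// Illustrative sketch of the proposed majority-quorum rule: a shard leader
// would reject an update unless a strict majority of the shard's replica set
// (leader included) is currently live. All names here are hypothetical.
public class QuorumCheck {
    // true if the live replicas form a strict majority of the full replica set
    public static boolean acceptWrite(int totalReplicas, int liveReplicas) {
        return liveReplicas > totalReplicas / 2;
    }

    public static void main(String[] args) {
        // Shard with nodes A, B, C (A is the leader):
        System.out.println(acceptWrite(3, 3)); // A, B, C all up -> true
        System.out.println(acceptWrite(3, 2)); // C down -> true
        System.out.println(acceptWrite(3, 1)); // B and C down -> false
    }
}
```

Mark's suggestion of a tunable param would generalize this: instead of a hard-coded majority, the client states how many replicas must verify the update before success is reported.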
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005092#comment-14005092 ] Dawid Weiss commented on LUCENE-5650: - Please commit it to trunk, Ryan! I'll be at work in ~9hours so should something pop up in jenkins runs I'll take care of these. createTempDir and associated functions no longer create java.io.tmpdir -- Key: LUCENE-5650 URL: https://issues.apache.org/jira/browse/LUCENE-5650 Project: Lucene - Core Issue Type: Improvement Components: general/test Reporter: Ryan Ernst Assignee: Dawid Weiss Priority: Minor Fix For: 4.9, 5.0 Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, dih.patch The recent refactoring to all the create temp file/dir functions (which is great!) has a minor regression from what existed before. With the old {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist. So, if you set {{java.io.tmpdir}} to {{./temp}}, then it would create that dir within the per jvm working dir. However, {{getBaseTempDirForClass()}} now does asserts that check the dir exists, is a dir, and is writeable. Lucene uses {{.}} as {{java.io.tmpdir}}. Then in the test security manager, the per jvm cwd has read/write/execute permissions. However, this allows tests to write to their cwd, which I'm trying to protect against (by setting cwd to read/execute in my test security manager). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5309) Investigate ShardSplitTest failures
[ https://issues.apache.org/jira/browse/SOLR-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005143#comment-14005143 ] ASF subversion and git services commented on SOLR-5309: --- Commit 1596661 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596661 ] SOLR-5309: Fix DUP.processDelete to route delete-by-id to one sub-shard only. Enable ShardSplitTest again. Investigate ShardSplitTest failures --- Key: SOLR-5309 URL: https://issues.apache.org/jira/browse/SOLR-5309 Project: Solr Issue Type: Task Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Blocker Investigate why ShardSplitTest is failing sporadically. Some recent failures: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3328/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7760/ http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/861/
[jira] [Updated] (SOLR-5285) Solr response format should support child Docs
[ https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-5285: Attachment: SOLR-5285.patch Fixed the class cast exception. This passes for me now - {noformat} ant test -Dtests.method=testChildDoctransformer -Dtests.seed=720251997BEC4F70 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Pacific/Fiji -Dtests.file.encoding=UTF-8 {noformat} Also ran it over 20 times and it is passing. Solr response format should support child Docs -- Key: SOLR-5285 URL: https://issues.apache.org/jira/browse/SOLR-5285 Project: Solr Issue Type: New Feature Reporter: Varun Thacker Fix For: 4.9, 5.0 Attachments: SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, javabin_backcompat_child_docs.bin Solr has added support for taking childDocs as input ( only XML till now ). It's currently used for BlockJoinQuery. I feel that if a user indexes a document with child docs, even if he isn't using the BJQ features and is just searching which results in a hit on the parentDoc, it's childDocs should be returned in the response format. [~hossman_luc...@fucit.org] on IRC suggested that the DocTransformers would be the place to add childDocs to the response. Now given a docId one needs to find out all the childDoc id's. A couple of approaches which I could think of are 1. Maintain the relation between a parentDoc and it's childDocs during indexing time in maybe a separate index? 2. Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - Given a parentDoc it finds out all the childDocs but this requires a childScorer. Am I missing something obvious on how to find the relation between a parentDoc and it's childDocs because none of the above solutions for this look right. 
[jira] [Updated] (LUCENE-5648) Index/search multi-valued time durations
[ https://issues.apache.org/jira/browse/LUCENE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5648: - Attachment: LUCENE-5648.patch

Updated patch:
* Support ranges like 2014 TO 2014-03, which is semantically the same thing as 2014-01 TO 2014-03. This means you can now do [* TO whatever].
* Parses calendar ranges. This means you can round-trip toString() and parseShape() whether it's a single Calendar value 2014-05 or a range [* TO 2013].

Index/search multi-valued time durations Key: LUCENE-5648 URL: https://issues.apache.org/jira/browse/LUCENE-5648 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Attachments: LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch

If you need to index a date/time duration, then the way to do that is to have a pair of date fields; one for the start and one for the end -- pretty straightforward. But if you need to index a variable number of durations per document, then the options aren't pretty, ranging from denormalization, to joins, to using Lucene spatial with 2D as described [here|http://wiki.apache.org/solr/SpatialForTimeDurations]. Ideally it would be easier to index durations, and work in a more optimal way. This issue implements the aforementioned feature using Lucene-spatial with a new single-dimensional SpatialPrefixTree implementation. Unlike the other two SPT implementations, it's not based on floating point numbers. It will have a Date based customization that indexes levels at meaningful quantities like seconds, minutes, hours, etc. The point of that alignment is to make it faster to query across meaningful ranges (i.e. [2000 TO 2014]) and to enable a follow-on issue to facet on the data in a really fast way. I'll expect to have a working patch up this week.
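The equivalence noted in the patch (2014 TO 2014-03 meaning the same as 2014-01 TO 2014-03) rests on a partial date standing for the whole span at its stated granularity. A toy sketch of that expansion, using java.time rather than the patch's Calendar-based code (the helper name and its year/month-only scope are invented for illustration):

```java
// Illustrative only: expand a partial date spec ("YYYY" or "YYYY-MM") into the
// inclusive [start, end] day range it implies, so "2014-03" covers the whole
// month and "2014" covers the whole year. The real patch handles finer
// granularities (day, hour, etc.) via Calendar; this sketch stops at months.
import java.time.LocalDate;
import java.time.YearMonth;

public class PartialDateRange {
    static LocalDate[] expand(String spec) {
        String[] parts = spec.split("-");
        int year = Integer.parseInt(parts[0]);
        if (parts.length == 1) {
            // Year granularity: the full year.
            return new LocalDate[] { LocalDate.of(year, 1, 1), LocalDate.of(year, 12, 31) };
        }
        // Month granularity: first through last day of that month.
        YearMonth ym = YearMonth.of(year, Integer.parseInt(parts[1]));
        return new LocalDate[] { ym.atDay(1), ym.atEndOfMonth() };
    }

    public static void main(String[] args) {
        LocalDate[] r = expand("2014-03");
        System.out.println(r[0] + " .. " + r[1]); // 2014-03-01 .. 2014-03-31
    }
}
```

Under this reading, a range query [2014 TO 2014-03] expands its endpoints to 2014-01-01 and 2014-03-31, which is exactly the [2014-01 TO 2014-03] form.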
[jira] [Updated] (SOLR-6103) Add DateRangeField
[ https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-6103: --- Attachment: SOLR-6103.patch

I updated LUCENE-5608 with the date range parsing, and now I added the DateRangeField here with tests. Examples of how to index ranges are below. Note that they aren't necessarily explicit ranges; a range can be implied by specifying a date instance to a desired granularity. It includes the same syntax Solr supports, though it doesn't do DateMath.
{noformat}
[* TO *]
2014-05-21T12:00:00.000Z
[2000 TO 2014-05-21]
{noformat}
By default, at search time the predicate is intersects, which means it'll match any overlap with an indexed date range. It can be specified with op as a local-param.
{noformat}
q=dateRange:2014-05-21
q={!field f=dateRange op=Contains v=[1999 TO 2001]}
{noformat}
I opted for this new op local-param instead of using Lucene-spatial's awkward SpatialArgsParser format, which looks like Intersects(foo).

Add DateRangeField -- Key: SOLR-6103 URL: https://issues.apache.org/jira/browse/SOLR-6103 Project: Solr Issue Type: New Feature Components: spatial Reporter: David Smiley Assignee: David Smiley Attachments: SOLR-6103.patch

LUCENE-5648 introduced a date range index search capability in the spatial module. This issue is for a corresponding Solr FieldType to be named DateRangeField. LUCENE-5648 includes a parseCalendar(String) method that parses a superset of Solr's strict date format. It also parses partial dates (e.g.: 2014-10 has month specificity), the trailing 'Z' is optional, a leading +/- may be present (minus indicates BC era), and * means all-time. The proposed field type would use it to parse a string and also both ends of a range query, but furthermore it will also allow an arbitrary range query of the form {{calspec TO calspec}} such as: {noformat}2000 TO 2014-05-21T10{noformat} which parses as the year 2000 through 2014 May 21st, 10am (GMT).
I suggest this syntax because it is aligned with Lucene's range query syntax.
[jira] [Commented] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
[ https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005264#comment-14005264 ] Jessica Cheng commented on SOLR-6091: - [~noble.paul] Do you mean you still see race condition with this implementation (wrong overseer quitting), or do you mean that you have caught the race condition in your cluster? Race condition in prioritizeOverseerNodes can trigger extra QUIT operations --- Key: SOLR-6091 URL: https://issues.apache.org/jira/browse/SOLR-6091 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7, 4.8 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.9, 5.0 Attachments: SOLR-6091.patch When using the overseer roles feature, there is a possibility of more than one thread executing the prioritizeOverseerNodes method and extra QUIT commands being inserted into the overseer queue. At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a race condition. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5285) Solr response format should support child Docs
[ https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5285: --- Attachment: SOLR-5285.patch Hey Varun, I'd started looking ~ChildDocTransformerFactory.java:142 before i saw your new patch -- comparing the old code with the new code it still seems like this is more brittle than it needs to be (particularly in cases where the uniqueKey field type isn't a string -- ie: a TrieIntField) I've attached an update that eliminates (most) of that brittle casting code to rely on the FieldType methods instead ... i still want to review the rest of the patch in more depth, but i wanted to go ahead and attach this update ASAP so you could take a look (and because i'm not sure how much more patch reviewing time i'll get in before i leave town tomorrow) Solr response format should support child Docs -- Key: SOLR-5285 URL: https://issues.apache.org/jira/browse/SOLR-5285 Project: Solr Issue Type: New Feature Reporter: Varun Thacker Fix For: 4.9, 5.0 Attachments: SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, javabin_backcompat_child_docs.bin Solr has added support for taking childDocs as input ( only XML till now ). It's currently used for BlockJoinQuery. I feel that if a user indexes a document with child docs, even if he isn't using the BJQ features and is just searching which results in a hit on the parentDoc, it's childDocs should be returned in the response format. [~hossman_luc...@fucit.org] on IRC suggested that the DocTransformers would be the place to add childDocs to the response. Now given a docId one needs to find out all the childDoc id's. A couple of approaches which I could think of are 1. Maintain the relation between a parentDoc and it's childDocs during indexing time in maybe a separate index? 2. 
Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - Given a parentDoc it finds out all the childDocs but this requires a childScorer. Am I missing something obvious on how to find the relation between a parentDoc and it's childDocs because none of the above solutions for this look right. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5468) Option to enforce a majority quorum approach to accepting updates in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005319#comment-14005319 ] ASF subversion and git services commented on SOLR-5468: --- Commit 1596703 from [~thelabdude] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596703 ] SOLR-5468: report replication factor that was achieved for an update request if requested by the client application; port from trunk Option to enforce a majority quorum approach to accepting updates in SolrCloud -- Key: SOLR-5468 URL: https://issues.apache.org/jira/browse/SOLR-5468 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.5 Environment: All Reporter: Timothy Potter Assignee: Timothy Potter Priority: Minor Attachments: SOLR-5468.patch, SOLR-5468.patch, SOLR-5468.patch I've been thinking about how SolrCloud deals with write-availability using in-sync replica sets, in which writes will continue to be accepted so long as there is at least one healthy node per shard. For a little background (and to verify my understanding of the process is correct), SolrCloud only considers active/healthy replicas when acknowledging a write. Specifically, when a shard leader accepts an update request, it forwards the request to all active/healthy replicas and only considers the write successful if all active/healthy replicas ack the write. Any down / gone replicas are not considered and will sync up with the leader when they come back online using peer sync or snapshot replication. For instance, if a shard has 3 nodes, A, B, C with A being the current leader, then writes to the shard will continue to succeed even if B C are down. The issue is that if a shard leader continues to accept updates even if it loses all of its replicas, then we have acknowledged updates on only 1 node. 
If that node, call it A, then fails and one of the previous replicas, call it B, comes back online before A does, then any writes that A accepted while the other replicas were offline are at risk of being lost. SolrCloud does provide a safeguard mechanism for this problem with the leaderVoteWait setting, which puts any replicas that come back online before node A into a temporary wait state. If A comes back online within the wait period, then all is well, as it will become the leader again and no writes will be lost. As a side note, sysadmins definitely need to be made more aware of this situation; when I first encountered it in my cluster, I had no idea what it meant. My question is whether we want to consider an approach where SolrCloud will not accept writes unless there is a majority of replicas available to accept the write. For my example, under this approach, we wouldn't accept writes if both B and C failed, but would if only C did, leaving A and B online. Admittedly, this lowers the write-availability of the system, so it may be something that should be tunable. From Mark M: Yeah, this is kind of like one of many little features that we have just not gotten to yet. I’ve always planned for a param that lets you say how many replicas an update must be verified on before responding success. Seems to make sense to fail that type of request early if you notice there are not enough replicas up to satisfy the param to begin with.
[jira] [Commented] (LUCENE-5675) ID postings format
[ https://issues.apache.org/jira/browse/LUCENE-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005348#comment-14005348 ] ASF subversion and git services commented on LUCENE-5675: - Commit 1596708 from [~mikemccand] in branch 'dev/branches/lucene5675' [ https://svn.apache.org/r1596708 ] LUCENE-5675: working on ant precommit ID postings format Key: LUCENE-5675 URL: https://issues.apache.org/jira/browse/LUCENE-5675 Project: Lucene - Core Issue Type: New Feature Reporter: Robert Muir Today the primary key lookup in lucene is not that great for systems like solr and elasticsearch that have versioning in front of IndexWriter. To some extend BlockTree can sometimes help avoid seeks by telling you the term does not exist for a segment. But this technique (based on FST prefix) is fragile. The only other choice today is bloom filters, which use up huge amounts of memory. I don't think we are using everything we know: particularly the version semantics. Instead, if the FST for the terms index used an algebra that represents the max version for any subtree, we might be able to answer that there is no term T with version V in that segment very efficiently. Also ID fields dont need postings lists, they dont need stats like docfreq/totaltermfreq, etc this stuff is all implicit. As far as API, i think for users to provide IDs with versions to such a PF, a start would to set a payload or whatever on the term field to get it thru indexwriter to the codec. And a consumer of the codec can just cast the Terms to a subclass that exposes the FST to do this version check efficiently. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
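Robert's idea of giving the terms-index FST "an algebra that represents the max version for any subtree" can be illustrated with a toy trie. This is an invented sketch, not the proposed postings format (which operates on a real FST inside a codec): the point is only that a lookup for "term T with version >= V" can bail out as soon as a subtree's max version falls below V, without touching the term dictionary.

```java
// Toy trie where every node carries the max version of any term beneath it.
// A versioned lookup can prune whole subtrees whose maxVersion is too low,
// which is the pruning idea described above. All names are hypothetical.
import java.util.HashMap;
import java.util.Map;

public class MaxVersionTrie {
    static class Node {
        final Map<Character, Node> kids = new HashMap<>();
        long maxVersion = -1; // max version of any term in this subtree
        long version = -1;    // version if a term ends exactly here, else -1
    }

    private final Node root = new Node();

    public void add(String term, long version) {
        Node n = root;
        n.maxVersion = Math.max(n.maxVersion, version);
        for (char c : term.toCharArray()) {
            n = n.kids.computeIfAbsent(c, k -> new Node());
            n.maxVersion = Math.max(n.maxVersion, version);
        }
        n.version = version;
    }

    // Returns the stored version of term if present with version >= minVersion, else -1.
    public long lookup(String term, long minVersion) {
        Node n = root;
        for (char c : term.toCharArray()) {
            if (n.maxVersion < minVersion) return -1; // prune: nothing that new below here
            n = n.kids.get(c);
            if (n == null) return -1;
        }
        return (n.version >= minVersion) ? n.version : -1;
    }

    public static void main(String[] args) {
        MaxVersionTrie t = new MaxVersionTrie();
        t.add("id1", 5);
        System.out.println(t.lookup("id1", 3)); // 5
        System.out.println(t.lookup("id1", 9)); // -1: pruned at the root, no descent needed
    }
}
```

In the versioned-indexing use case this means a segment can answer "no newer version of this ID exists here" cheaply, which is exactly the primary-key lookup bloom filters try to speed up, but without their memory cost.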
[jira] [Commented] (SOLR-5285) Solr response format should support child Docs
[ https://issues.apache.org/jira/browse/SOLR-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005349#comment-14005349 ] Hoss Man commented on SOLR-5285: {quote} bq. Why is the tag name in the JSON format childDocs but in the XML format it's childDoc (no plural) ? ... seems like those should be consistent. I guess because in JSON the input is a JSON array hence childDocs, while in XML we use multiple childDoc tags to represent nested documents. {quote} That makes sense -- but now has me thinking back to the proposed usage in your earliest comment on this issue: why create a new {{childDoc}} element in the XML at all? why not just re-use {{doc}} (nested inside the existing {{doc}}) ... that seems like the most straight forward solution, and from what i can tell, that would probably simplify the changes to XMLResponseParser.java as well wouldn't it? speaking of which -- i don't understand the need for changing the method sig for {{XMLResponseParser.readDocument}} ... why can't the method construct the SolrDocument objects itself? bq. Added a non mandatory parameter called numChildDocs which makes it configurable. Although I'm not sure if the name is correct. hmmm, yeah ... for consistency with the top level query we could use something like rows but the risk for confusion there seems like it outweighs the consistency factor. how about limit ? bq. Added a non mandatory parameter called childFilter ... look good ... in general ChildDocTransformerFactory looks pretty good to me now -- although I just noticed a typo in the SolrException msg if parentFilter is null ... it refers to which -- but that doesn't apply here. bq. 2. Created a new binary file for backcompatibility and forwardcompatibility. 
I might be missing something, but I don't think {{testBackCompatForSolrDocumentWithChildDocs}} is actually asserting anything related to the child docs -- because it uses {{assertSolrDocumentEquals}}, but that method hasn't been updated to know about child docs, has it?

To sum up:
* In general, I think the current patch looks great
* Remaining concerns about implementation:
** {{testBackCompatForSolrDocumentWithChildDocs}} doesn't seem valid to me without changes to {{assertSolrDocumentEquals}}
** The error-message typo in {{ChildDocTransformerFactory}} needs to be fixed
** The method signature change in {{XMLResponseParser.readDocument}} seems unnecessary
* Remaining questions about the API:
** Better name for {{numChildDocs}}? ... how about {{limit}}?
** Why use {{childDoc}} in XML instead of {{doc}}?

Solr response format should support child Docs -- Key: SOLR-5285 URL: https://issues.apache.org/jira/browse/SOLR-5285 Project: Solr Issue Type: New Feature Reporter: Varun Thacker Fix For: 4.9, 5.0 Attachments: SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, SOLR-5285.patch, javabin_backcompat_child_docs.bin Solr has added support for taking childDocs as input ( only XML till now ). It's currently used for BlockJoinQuery. I feel that if a user indexes a document with child docs, even if he isn't using the BJQ features and is just searching which results in a hit on the parentDoc, it's childDocs should be returned in the response format. [~hossman_luc...@fucit.org] on IRC suggested that the DocTransformers would be the place to add childDocs to the response. Now given a docId one needs to find out all the childDoc id's. A couple of approaches which I could think of are 1. Maintain the relation between a parentDoc and it's childDocs during indexing time in maybe a separate index? 2. 
Somehow emulate what happens in ToParentBlockJoinQuery.nextDoc() - Given a parentDoc it finds out all the childDocs but this requires a childScorer. Am I missing something obvious on how to find the relation between a parentDoc and it's childDocs because none of the above solutions for this look right. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6106) Sometimes all the cores on a SolrCloud node cannot find their config when initializing the ManagedResourceStorage storageIO impl
Timothy Potter created SOLR-6106: Summary: Sometimes all the cores on a SolrCloud node cannot find their config when initializing the ManagedResourceStorage storageIO impl Key: SOLR-6106 URL: https://issues.apache.org/jira/browse/SOLR-6106 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Timothy Potter Assignee: Timothy Potter Priority: Minor

One of my many nodes had problems initializing all cores due to the following error. It was resolved by restarting the node (hence the minor classification).
{noformat}
2014-05-21 20:39:17,898 [coreLoadExecutor-4-thread-27] ERROR solr.core.CoreContainer - Unable to create core: small46_shard1_replica1
org.apache.solr.common.SolrException: Could not find config name for collection:small46
	at org.apache.solr.core.SolrCore.init(SolrCore.java:858)
	at org.apache.solr.core.SolrCore.init(SolrCore.java:641)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:556)
	at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
	at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.solr.common.SolrException: Could not find config name for collection:small46
	at org.apache.solr.rest.ManagedResourceStorage.newStorageIO(ManagedResourceStorage.java:99)
	at org.apache.solr.core.SolrCore.initRestManager(SolrCore.java:2339)
	at org.apache.solr.core.SolrCore.init(SolrCore.java:845)
	... 10 more
{noformat}
[jira] [Commented] (LUCENE-5693) don't write deleted documents on flush
[ https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005450#comment-14005450 ] Robert Muir commented on LUCENE-5693: - {quote} I disagree: I think we discover places that are relying on deleted docs behavior, i.e. test bugs. When I did this on LUCENE-5675 there were only a few places that relied on deleted docs. {quote} That's not the complexity i'm concerned about. I'm talking about bugs in lucene itself because shit like the following happens: * various codec apis unable to cope with writing 0 doc segments because all the docs were deleted * various codec apis with corner case bugs because stuff like 'maxdoc' in segmentinfo they are fed is inconsistent with what they saw. * various index/search apis unable to cope with docid X appears in codec api Y but not codec api Z where its expected to exist. * slow O(n) passes thru indexwriter apis to recalculate and reshuffle ordinals and stuff like that. * corner case bugs like incorrect statistics. * additional complexity inside indexwriter/codecs to handle this, when just merging away would be better. So if we want to rename the issue to as a special case, don't write deleted postings on flush and remove the TODO about changing this for things like DV, then I'm fine. But otherwise, if this is intended to be a precedent of how things should work, then I strongly feel we should not do this. The additional complexity and corner cases are simply not worth it. don't write deleted documents on flush -- Key: LUCENE-5693 URL: https://issues.apache.org/jira/browse/LUCENE-5693 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-5693.patch When we flush a new segment, sometimes some documents are born deleted, e.g. if the app did a IW.deleteDocuments that matched some not-yet-flushed documents. 
We already compute the liveDocs on flush, but then we continue (wastefully) to send those known-deleted documents to all Codec parts. I started to implement this on LUCENE-5675 but it was too controversial. Also, I expect typically the number of deleted docs is 0, or small, so not writing born deleted docs won't be much of a win for most apps. Still it seems silly to write them, consuming IO/CPU in the process, only to consume more IO/CPU later for merging to re-delete them. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5648) Index/search multi-valued time durations
[ https://issues.apache.org/jira/browse/LUCENE-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005576#comment-14005576 ] David Smiley commented on LUCENE-5648: -- I was putting some thought into the different ways of indexing durations, listing pros and cons. The approach here should work very well, but it has two main downsides of note:
* Overlapping or adjacent ranges are effectively coalesced, which impacts the semantics of Contains and Within. To be clear, it's a non-issue if the multiple durations for a given field on a document don't touch. But if you wanted to index, say, \[2000 TO 2014] and \[2006 TO 2007], then it's as if the 2nd range doesn't even exist. The document won't match for IsWithin a query of \[2006 TO 2008].
* The worst-case number of terms generated for a range at index time is pretty high. If you wanted to index Long.MIN_VALUE+1 TO Long.MAX_VALUE-1 (which spans hundreds of millions of years), we're talking about 14k terms(*). But it's certainly not commonly that bad unless you were indexing random milliseconds at random millennia. Indexing a duration of 2 adjacent months in the same year is only 7 terms.

At search time, lots of hypothetical terms in a duration isn't an issue for RPT's algorithms for the common case of a sparsely populated term space. Interestingly, using a 2D prefix-tree for single-dimensional durations expressed as points doesn't have these shortcomings. But that approach is slower to search than this approach (more possible terms in a search area; it's half of the square of the number of terms in this 1D tree), and is not amenable to the terms-enumeration style interval faceting that I'll be doing next.

(*) The number of terms currently being generated would be cut by ~40-50% once LUCENE-4942 gets done.
Index/search multi-valued time durations Key: LUCENE-5648 URL: https://issues.apache.org/jira/browse/LUCENE-5648 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Attachments: LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch, LUCENE-5648.patch If you need to index a date/time duration, then the way to do that is to have a pair of date fields; one for the start and one for the end -- pretty straight-forward. But if you need to index a variable number of durations per document, then the options aren't pretty, ranging from denormalization, to joins, to using Lucene spatial with 2D as described [here|http://wiki.apache.org/solr/SpatialForTimeDurations]. Ideally it would be easier to index durations, and work in a more optimal way. This issue implements the aforementioned feature using Lucene-spatial with a new single-dimensional SpatialPrefixTree implementation. Unlike the other two SPT implementations, it's not based on floating point numbers. It will have a Date based customization that indexes levels at meaningful quantities like seconds, minutes, hours, etc. The point of that alignment is to make it faster to query across meaningful ranges (i.e. [2000 TO 2014]) and to enable a follow-on issue to facet on the data in a really fast way. I'll expect to have a working patch up this week. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
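The coalescing downside David describes can be made concrete with a small simulation. This is an invented sketch, not Lucene code: it merges overlapping or adjacent intervals the way the shared term space effectively does, then shows why IsWithin stops matching once a short range is absorbed by a long one.

```java
// Sketch of the coalescing effect: once [2000,2014] and [2006,2007] are
// indexed on the same field, their union is a single interval, so asking
// "is every indexed duration within [2006,2008]?" answers false.
// All names and the year-granularity model are invented for illustration.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CoalesceDemo {
    // Merge overlapping or adjacent inclusive [start, end] intervals.
    static List<int[]> coalesce(List<int[]> in) {
        List<int[]> sorted = new ArrayList<>(in);
        sorted.sort(Comparator.comparingInt(a -> a[0]));
        List<int[]> out = new ArrayList<>();
        for (int[] iv : sorted) {
            if (!out.isEmpty() && iv[0] <= out.get(out.size() - 1)[1] + 1) {
                int[] last = out.get(out.size() - 1);
                last[1] = Math.max(last[1], iv[1]); // absorb into previous interval
            } else {
                out.add(new int[] { iv[0], iv[1] });
            }
        }
        return out;
    }

    // IsWithin: every indexed interval must lie inside the query interval.
    static boolean isWithin(List<int[]> indexed, int qs, int qe) {
        for (int[] iv : indexed) {
            if (iv[0] < qs || iv[1] > qe) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<int[]> input = new ArrayList<>();
        input.add(new int[] { 2000, 2014 });
        input.add(new int[] { 2006, 2007 });
        List<int[]> merged = coalesce(input);
        System.out.println(merged.size());                // the 2006-2007 range is absorbed
        System.out.println(isWithin(merged, 2006, 2008)); // the short duration is invisible
    }
}
```

If the two durations had been kept distinct, the [2006 TO 2007] one would satisfy IsWithin [2006 TO 2008]; after coalescing, only the merged [2000 TO 2014] interval remains, so the match is lost, which is exactly the semantics caveat in the comment above.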
[JENKINS] Lucene-Solr-4.x-Linux (32bit/ibm-j9-jdk7) - Build # 10236 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10236/
Java: 32bit/ibm-j9-jdk7 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;}

1 tests failed.

REGRESSION: org.apache.lucene.document.TestLazyDocument.testLazy

Error Message:
read past EOF: SlicedIndexInput(SlicedIndexInput(_0.tis in RAMInputStream(name=_0.cfs)) in RAMInputStream(name=_0.cfs) slice=2021238:3239819)

Stack Trace:
java.io.EOFException: read past EOF: SlicedIndexInput(SlicedIndexInput(_0.tis in RAMInputStream(name=_0.cfs)) in RAMInputStream(name=_0.cfs) slice=2021238:3239819)
	at __randomizedtesting.SeedInfo.seed([B526C0B1365A2211:84FFCDE38606F6BA]:0)
	at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:265)
	at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
	at org.apache.lucene.store.DataInput.readVInt(DataInput.java:120)
	at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:218)
	at org.apache.lucene.store.MockIndexInputWrapper.readVInt(MockIndexInputWrapper.java:161)
	at org.apache.lucene.codecs.lucene3x.TermBuffer.read(TermBuffer.java:61)
	at org.apache.lucene.codecs.lucene3x.SegmentTermEnum.next(SegmentTermEnum.java:142)
	at org.apache.lucene.codecs.lucene3x.SegmentTermEnum.scanTo(SegmentTermEnum.java:175)
	at org.apache.lucene.codecs.lucene3x.TermInfosReader.seekEnum(TermInfosReader.java:282)
	at org.apache.lucene.codecs.lucene3x.TermInfosReader.get(TermInfosReader.java:207)
	at org.apache.lucene.codecs.lucene3x.TermInfosReader.terms(TermInfosReader.java:352)
	at org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum.reset(Lucene3xFields.java:687)
	at org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTerms.iterator(Lucene3xFields.java:180)
	at org.apache.lucene.index.TermContext.build(TermContext.java:94)
	at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:165)
	at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684)
	at org.apache.lucene.search.AssertingIndexSearcher.createNormalizedWeight(AssertingIndexSearcher.java:59)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
	at org.apache.lucene.document.TestLazyDocument.testLazy(TestLazyDocument.java:84)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
	at java.lang.reflect.Method.invoke(Method.java:619)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at
Re: [JENKINS] Lucene-Solr-4.x-Linux (32bit/ibm-j9-jdk7) - Build # 10236 - Failure!
This won't reproduce on an Oracle JVM: I think it's a J9 bug? Can we update our J9s in Jenkins? Looks like there are new ones available with lots of fixes.

On Wed, May 21, 2014 at 11:38 PM, Policeman Jenkins Server jenk...@thetaphi.de wrote:

Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/10236/
Java: 32bit/ibm-j9-jdk7 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;}

1 tests failed.

REGRESSION: org.apache.lucene.document.TestLazyDocument.testLazy

Error Message:
read past EOF: SlicedIndexInput(SlicedIndexInput(_0.tis in RAMInputStream(name=_0.cfs)) in RAMInputStream(name=_0.cfs) slice=2021238:3239819)
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 1592 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1592/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 11216 lines...]
[junit4] JVM J0: stderr was not empty, see: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20140522_042326_570.syserr
[junit4] JVM J0: stderr (verbatim)
[junit4] java(215,0x134ae9000) malloc: *** error for object 0x134bd8320: pointer being freed was not allocated
[junit4] *** set a breakpoint in malloc_error_break to debug
[junit4] JVM J0: EOF
[...truncated 1 lines...]
[junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/bin/java -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=274640F91B914DCD -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.monster=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Djdk.map.althashing.threshold=0 -Dtests.leaveTemporary=false -Dtests.filterstacks=true -Dtests.disableHdfs=true -Dfile.encoding=UTF-8 -classpath