[jira] [Comment Edited] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246618#comment-14246618 ] Forest Soup edited comment on SOLR-6359 at 12/17/14 7:59 AM: - The numRecordsToKeep and maxNumLogsToKeep values should be in the updateLog, like below. Right? <!-- Enables a transaction log, used for real-time get, durability, and solr cloud replica recovery. The log can grow as big as uncommitted changes to the index, so use of a hard autoCommit is recommended (see below). dir - the target directory for transaction logs, defaults to the solr data directory. --> <updateLog> <str name="dir">${solr.ulog.dir:}</str> <int name="numRecordsToKeep">1</int> <int name="maxNumLogsToKeep">100</int> </updateLog> was (Author: forest_soup): And where should I set the numRecordsToKeep and maxNumLogsToKeep values? Thanks! Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6816) Review SolrCloud Indexing Performance.
[ https://issues.apache.org/jira/browse/SOLR-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249639#comment-14249639 ] Per Steffensen commented on SOLR-6816: -- I believe today overwrite=false will prevent neither document-version-check on leader (it will in the Solr we use in my company, but not in Apache Solr) nor bucket-version-check on non-leaders. As far as I can see {{DistributedUpdateProcessor.versionAdd}} will do document-version-check if versionsStored=true, leaderLogic=true and versionOnUpdate != 0. It will do bucket-version-check if versionsStored=true and leaderLogic=false. This has nothing to do with the overwrite param. This version-check is not only for add-commands but also for delete-commands. The overwrite param controls only (in {{DirectUpdateHandler2}}) whether you make sure to delete an existing document with the same id before you add the new document. You do that by default, but if overwrite=false you just add the new document, allowing duplicates (defined to be documents that have the same id-value). So as far as I read the code, document-version-check will only be performed on leaders. Non-leaders will only do bucket-version-check, and I do not think that is expensive? As I said, our version of Solr does not do document-version-check if overwrite=false. I think you should introduce that as well. But besides that, what's left to do in this area? What did I not understand? Review SolrCloud Indexing Performance. -- Key: SOLR-6816 URL: https://issues.apache.org/jira/browse/SOLR-6816 Project: Solr Issue Type: Task Components: SolrCloud Reporter: Mark Miller Priority: Critical Attachments: SolrBench.pdf We have never really focused on indexing performance, just correctness and low hanging fruit. We need to vet the performance and try to address any holes. Note: A common report is that adding any replication is very slow.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
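The decision rules described in the comment above can be sketched as plain boolean logic. This is an illustration only, not Solr source code; the names versionsStored, leaderLogic and versionOnUpdate mirror the fields discussed for {{DistributedUpdateProcessor.versionAdd}}:

```java
// Illustrative sketch of the version-check decisions described above.
// Assumes the three inputs carry the same meaning as the fields discussed
// for DistributedUpdateProcessor.versionAdd; this is not the Solr source.
public class VersionCheckDecision {

    // Leader path: per-document version check happens only when versions are
    // stored, this node runs the leader logic, and a version was supplied.
    static boolean documentVersionCheck(boolean versionsStored,
                                        boolean leaderLogic,
                                        long versionOnUpdate) {
        return versionsStored && leaderLogic && versionOnUpdate != 0;
    }

    // Non-leader path: only the cheaper per-bucket version check applies.
    static boolean bucketVersionCheck(boolean versionsStored, boolean leaderLogic) {
        return versionsStored && !leaderLogic;
    }
}
```

Note that neither predicate consults the overwrite param, which is the point being made in the comment.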
[jira] [Assigned] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-5951: -- Assignee: Michael McCandless Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/<dev>/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5951: --- Attachment: LUCENE-5951.patch Patch w/ tests. After I told Rob it's impossible to detect if a Path is backed by an SSD with pure Java, he of course went and did it ;) I added his isSSD method to IOUtils: it's a rough, Linux-only (for now) method to determine if a Path is backed by an SSD (thank you Rob!). Then I fixed CMS to have dynamic defaults, so that the first time merge is invoked, it checks the writer's directory. If it's on an SSD, it uses the pre LUCENE-4661 defaults (good for SSDs), else it uses the current defaults (good for spinning disks). It also logs this to infoStream so we can use that to see what it did. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/<dev>/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
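A rough sketch of the Linux-only check discussed here: read the rotational flag from sysfs. Class and method names are hypothetical; the real IOUtils helper in the patch resolves the backing device from the index Path, which this sketch skips by taking the device name directly, and it parameterizes the sysfs root so the logic is testable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Linux-only sketch: /sys/block/<dev>/queue/rotational holds "1" for spinning
// storage and "0" for SSDs. In real use sysfsRoot would be Paths.get("/sys/block").
public class RotationalCheck {
    static boolean isRotational(Path sysfsRoot, String device) throws IOException {
        Path flag = sysfsRoot.resolve(device).resolve("queue").resolve("rotational");
        return new String(Files.readAllBytes(flag)).trim().equals("1");
    }
}
```

Anything beyond Linux (or exotic setups like an index striped across devices) would need a different strategy, which is why the patch treats this as a rough heuristic.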
[jira] [Comment Edited] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246618#comment-14246618 ] Forest Soup edited comment on SOLR-6359 at 12/17/14 10:01 AM: -- The numRecordsToKeep and maxNumLogsToKeep values should be in the updateLog, like below. <!-- Enables a transaction log, used for real-time get, durability, and solr cloud replica recovery. The log can grow as big as uncommitted changes to the index, so use of a hard autoCommit is recommended (see below). dir - the target directory for transaction logs, defaults to the solr data directory. --> <updateLog> <str name="dir">${solr.ulog.dir:}</str> <int name="numRecordsToKeep">1</int> <int name="maxNumLogsToKeep">100</int> </updateLog> was (Author: forest_soup): The numRecordsToKeep and maxNumLogsToKeep values should be in the updateLog, like below. Right? <!-- Enables a transaction log, used for real-time get, durability, and solr cloud replica recovery. The log can grow as big as uncommitted changes to the index, so use of a hard autoCommit is recommended (see below). dir - the target directory for transaction logs, defaults to the solr data directory. --> <updateLog> <str name="dir">${solr.ulog.dir:}</str> <int name="numRecordsToKeep">1</int> <int name="maxNumLogsToKeep">100</int> </updateLog> Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options).
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6117) infostream is currently unusable out of box
[ https://issues.apache.org/jira/browse/LUCENE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249671#comment-14249671 ] Michael McCandless commented on LUCENE-6117: +1, thanks Rob! infostream is currently unusable out of box --- Key: LUCENE-6117 URL: https://issues.apache.org/jira/browse/LUCENE-6117 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-6117.patch testpoints used to only be emitted by assertions (still sketchy), but now are emitted always. I assume this is due to the change to support running tests with assertions disabled. we should try to clean this up, simple stuff like this is now useless: {code} indexWriterConfig.setInfoStream(System.out); // causes massive flooding like this: // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start {code} I hit this several times today just trying to do benchmarks and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6816) Review SolrCloud Indexing Performance.
[ https://issues.apache.org/jira/browse/SOLR-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249673#comment-14249673 ] Per Steffensen commented on SOLR-6816: -- Those of you that have been following my comments on misc issues will know that I like separation of concerns. So in our version of Solr all this decision-making on when to do document-version-check, when to delete existing documents with same id-value etc is isolated in {{enum UpdateSemanticsMode}} - see https://issues.apache.org/jira/secure/attachment/12553312/SOLR-3173_3178_3382_3428_plus.patch. We support different modes that make slightly different decisions on the above topics, which is the reason for using an enum. You do not need that, because you only have one mode, but that should not prevent you from separating the decision-making concern. The patch is not entirely up to date with what we do today, but at least it illustrates the separation of concerns. {{DistributedUpdateHandler}} deals with a million concerns, so maybe you want to adopt that idea and move the code making the decisions out of {{DistributedUpdateHandler}}. Review SolrCloud Indexing Performance. -- Key: SOLR-6816 URL: https://issues.apache.org/jira/browse/SOLR-6816 Project: Solr Issue Type: Task Components: SolrCloud Reporter: Mark Miller Priority: Critical Attachments: SolrBench.pdf We have never really focused on indexing performance, just correctness and low hanging fruit. We need to vet the performance and try to address any holes. Note: A common report is that adding any replication is very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6816) Review SolrCloud Indexing Performance.
[ https://issues.apache.org/jira/browse/SOLR-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249673#comment-14249673 ] Per Steffensen edited comment on SOLR-6816 at 12/17/14 10:17 AM: - Those of you that have been following my comments on misc issues will know that I like separation of concerns. So in our version of Solr all this decision-making on when to do document-version-check, when to delete existing documents with same id-value etc is isolated in {{enum UpdateSemanticsMode}} - see https://issues.apache.org/jira/secure/attachment/12553312/SOLR-3173_3178_3382_3428_plus.patch. We support different modes that make slightly different decisions on the above topics, which is the reason for using an enum. You do not need that, because you only have one mode, but that should not prevent you from separating the decision-making concern. The patch is not entirely up to date with what we do today, but at least it illustrates the separation of concerns. {{DistributedUpdateHandler}} deals with a million concerns, so maybe you want to adopt that idea and move the code making the decisions out of {{DistributedUpdateHandler}}. I only mention this because I sense that at least [~shalinmangar] agrees that some cleanup (a.o. of {{DistributedUpdateHandler}}) is required: https://twitter.com/shalinmangar/status/543874893549277184 was (Author: steff1193): Those of you that have been following my comments on misc issues will know that I like separation of concerns. So in our version of Solr all this decision-making on when to do document-version-check, when to delete existing documents with same id-value etc is isolated in {{enum UpdateSemanticsMode}} - see https://issues.apache.org/jira/secure/attachment/12553312/SOLR-3173_3178_3382_3428_plus.patch. We support different modes that make slightly different decisions on the above topics, which is the reason for using an enum.
You do not need that, because you only have one mode, but that should not prevent you from separating the decision-making concern. The patch is not entirely up to date with what we do today, but at least it illustrates the separation of concerns. {{DistributedUpdateHandler}} deals with a million concerns, so maybe you want to adopt that idea and move the code making the decisions out of {{DistributedUpdateHandler}}. Review SolrCloud Indexing Performance. -- Key: SOLR-6816 URL: https://issues.apache.org/jira/browse/SOLR-6816 Project: Solr Issue Type: Task Components: SolrCloud Reporter: Mark Miller Priority: Critical Attachments: SolrBench.pdf We have never really focused on indexing performance, just correctness and low hanging fruit. We need to vet the performance and try to address any holes. Note: A common report is that adding any replication is very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6118) Improve efficiency of the history structure for filter caching
Adrien Grand created LUCENE-6118: Summary: Improve efficiency of the history structure for filter caching Key: LUCENE-6118 URL: https://issues.apache.org/jira/browse/LUCENE-6118 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor The filter caching uses a ring buffer that tracks frequencies of the hashcodes of the most-recently used filters. However it is based on an ArrayDeque<Integer> and a HashMap<Integer> which keep on (un)wrapping ints. Since the data-structure is very simple, we could try to do something better... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6640) ChaosMonkeySafeLeaderTest failure with CorruptIndexException
[ https://issues.apache.org/jira/browse/SOLR-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249708#comment-14249708 ] Shalin Shekhar Mangar commented on SOLR-6640: - I am looking at this failure too and I see another bug. I was wondering why the replica had these writes in the first place considering that recovery on startup wasn't complete yet. # RecoveryStrategy publishes the state of the replica as 'recovering' before it sets the update log to buffering mode which is why the leader sends updates to this replica that affect the index. # The test itself doesn't wait for a steady state e.g. by calling waitForRecovery or waitForThingsToLevelOut before starting the indexing threads. This is probably a good thing because that's what has helped us find this problem. # Shouldn't the peersync also be done while update log is set to buffering mode? {quote} So it's these files which are not getting removed when we do IW.rollback that were causing the problem - _0.cfe _0.cfs _0.si _0_1.liv _1.fdt _1.fdx I am yet to figure out whether these files should have been removed by IW.rollback() or not? {quote} These files hang around because an IndexReader is open using the IndexWriter due to soft commit(s). ChaosMonkeySafeLeaderTest failure with CorruptIndexException Key: SOLR-6640 URL: https://issues.apache.org/jira/browse/SOLR-6640 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 5.0 Reporter: Shalin Shekhar Mangar Fix For: 5.0 Attachments: Lucene-Solr-5.x-Linux-64bit-jdk1.8.0_20-Build-11333.txt, SOLR-6640.patch, SOLR-6640.patch Test failure found on jenkins: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11333/ {code} 1 tests failed. REGRESSION: org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch Error Message: shard2 is not consistent.
Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1 Stack Trace: java.lang.AssertionError: shard2 is not consistent. Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1 at __randomizedtesting.SeedInfo.seed([F4B371D421E391CD:7555FFCC56BCF1F1]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1255) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1234) at org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.doTest(ChaosMonkeySafeLeaderTest.java:162) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) {code} Cause of inconsistency is: {code} Caused by: org.apache.lucene.index.CorruptIndexException: file mismatch, expected segment id=yhq3vokoe1den2av9jbd3yp8, got=yhq3vokoe1den2av9jbd3yp7 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=/mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/test/J0/temp/solr.cloud.ChaosMonkeySafeLeaderTest-F4B371D421E391CD-001/tempDir-001/jetty3/index/_1_2.liv))) [junit4] 2 at org.apache.lucene.codecs.CodecUtil.checkSegmentHeader(CodecUtil.java:259) [junit4] 2 at org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat.readLiveDocs(Lucene50LiveDocsFormat.java:88) [junit4] 2 at org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat.readLiveDocs(AssertingLiveDocsFormat.java:64) [junit4] 2 at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:102) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6118) Improve efficiency of the history structure for filter caching
[ https://issues.apache.org/jira/browse/LUCENE-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-6118: - Attachment: LUCENE-6118.patch Here is a patch. No more java.lang.Integers and 22 bytes per entry on average (4 for the ring buffer and 18 for the bag that tracks frequencies). Improve efficiency of the history structure for filter caching -- Key: LUCENE-6118 URL: https://issues.apache.org/jira/browse/LUCENE-6118 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-6118.patch The filter caching uses a ring buffer that tracks frequencies of the hashcodes of the most-recently used filters. However it is based on an ArrayDeque<Integer> and a HashMap<Integer> which keep on (un)wrapping ints. Since the data-structure is very simple, we could try to do something better... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
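A minimal sketch of the unboxed idea: a primitive-int ring buffer of recent hashcodes plus a small open-addressed table for the counts, so nothing is wrapped into java.lang.Integer on the hot path. Names, sizing, and the never-freed-slot simplification are illustrative only; the actual patch differs.

```java
// Sketch: track frequencies of the most recently seen hashcodes without boxing.
// Simplification: table slots are claimed forever (a count may drop to 0 but the
// key stays), which is fine for illustration but not for unbounded key sets.
public class FrequencyRing {
    private final int[] ring;          // most recent hashcodes, oldest overwritten
    private int next, size;
    private final int[] keys, counts;  // open-addressed key -> count table
    private final boolean[] used;
    private final int mask;

    public FrequencyRing(int ringSize) {
        ring = new int[ringSize];
        int cap = Integer.highestOneBit(ringSize * 4); // power of two, >= 2x ring
        keys = new int[cap];
        counts = new int[cap];
        used = new boolean[cap];
        mask = cap - 1;
    }

    private int slot(int key) {
        int s = key & mask;
        while (used[s] && keys[s] != key) s = (s + 1) & mask; // linear probing
        return s;
    }

    /** Record a hashcode, decrementing the count of the entry it evicts. */
    public void add(int hash) {
        if (size == ring.length) counts[slot(ring[next])]--;
        else size++;
        ring[next] = hash;
        next = (next + 1) % ring.length;
        int s = slot(hash);
        used[s] = true;
        keys[s] = hash;
        counts[s]++;
    }

    /** How many of the retained entries carry this hashcode. */
    public int frequency(int hash) {
        int s = slot(hash);
        return used[s] && keys[s] == hash ? counts[s] : 0;
    }
}
```

The memory cost here is a handful of parallel primitive arrays, in the same spirit as the "4 bytes for the ring buffer plus a small bag" accounting mentioned in the comment.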
[jira] [Comment Edited] (SOLR-6640) ChaosMonkeySafeLeaderTest failure with CorruptIndexException
[ https://issues.apache.org/jira/browse/SOLR-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249708#comment-14249708 ] Shalin Shekhar Mangar edited comment on SOLR-6640 at 12/17/14 10:59 AM: I am looking at this failure too and I see another bug. I was wondering why the replica had these writes in the first place considering that recovery on startup had not completed. # RecoveryStrategy publishes the state of the replica as 'recovering' before it sets the update log to buffering mode which is why the leader sends updates to this replica that affect the index. # The test itself doesn't wait for a steady state e.g. by calling waitForRecovery or waitForThingsToLevelOut before starting the indexing threads. This is probably a good thing because that's what has helped us find this problem. # Shouldn't the peersync also be done while update log is set to buffering mode? {quote} So it's these files which are not getting removed when we do IW.rollback that were causing the problem - _0.cfe _0.cfs _0.si _0_1.liv _1.fdt _1.fdx I am yet to figure out whether these files should have been removed by IW.rollback() or not? {quote} These files hang around because an IndexReader is open using the IndexWriter due to soft commit(s). was (Author: shalinmangar): I am looking at this failure too and I see another bug. I was wondering why did the replica have these writes in the first place considering that it hadn't recovery on startup wasn't complete yet. # RecoveryStrategy publishes the state of the replica as 'recovering' before it sets the update log to buffering mode which is why the leader sends updates to this replica that affect the index. # The test itself doesn't wait for a steady state e.g. by calling waitForRecovery or waitForThingsToLevelOut before starting the indexing threads. This is probably a good thing because that's what has helped us find this problem. 
# Shouldn't the peersync also be done while update log is set to buffering mode? {quote} So it's these files which are not getting removed when we do IW.rollback that were causing the problem - _0.cfe _0.cfs _0.si _0_1.liv _1.fdt _1.fdx I am yet to figure out whether these files should have been removed by IW.rollback() or not? {quote} These files hang around because an IndexReader is open using the IndexWriter due to soft commit(s). ChaosMonkeySafeLeaderTest failure with CorruptIndexException Key: SOLR-6640 URL: https://issues.apache.org/jira/browse/SOLR-6640 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 5.0 Reporter: Shalin Shekhar Mangar Fix For: 5.0 Attachments: Lucene-Solr-5.x-Linux-64bit-jdk1.8.0_20-Build-11333.txt, SOLR-6640.patch, SOLR-6640.patch Test failure found on jenkins: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11333/ {code} 1 tests failed. REGRESSION: org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch Error Message: shard2 is not consistent. Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1 Stack Trace: java.lang.AssertionError: shard2 is not consistent. 
Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1 at __randomizedtesting.SeedInfo.seed([F4B371D421E391CD:7555FFCC56BCF1F1]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1255) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1234) at org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.doTest(ChaosMonkeySafeLeaderTest.java:162) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) {code} Cause of inconsistency is: {code} Caused by: org.apache.lucene.index.CorruptIndexException: file mismatch, expected segment id=yhq3vokoe1den2av9jbd3yp8, got=yhq3vokoe1den2av9jbd3yp7 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=/mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/test/J0/temp/solr.cloud.ChaosMonkeySafeLeaderTest-F4B371D421E391CD-001/tempDir-001/jetty3/index/_1_2.liv))) [junit4] 2 at org.apache.lucene.codecs.CodecUtil.checkSegmentHeader(CodecUtil.java:259) [junit4] 2 at org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat.readLiveDocs(Lucene50LiveDocsFormat.java:88) [junit4] 2 at org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat.readLiveDocs(AssertingLiveDocsFormat.java:64) [junit4]
[jira] [Created] (LUCENE-6119) Add IndexWriter.getTotalNewBytesWritten
Michael McCandless created LUCENE-6119: -- Summary: Add IndexWriter.getTotalNewBytesWritten Key: LUCENE-6119 URL: https://issues.apache.org/jira/browse/LUCENE-6119 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk This method returns the number of incoming bytes IW has written since it was opened, excluding merging. It tracks flushed segments, new commits (segments_N), incoming files/segments added by addIndexes, newly written live docs / doc values updates files. It's an easy statistic for IW to track and should be useful to help applications more intelligently set defaults for IO throttling (RateLimiter). For example, an application that does hardly any indexing but finally triggered a large merge can afford to heavily throttle that large merge so it won't interfere with ongoing searches. But an application that's causing IW to write new bytes at 50 MB/sec must set a correspondingly higher IO throttle, otherwise merges will clearly fall behind. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6119) Add IndexWriter.getTotalNewBytesWritten
[ https://issues.apache.org/jira/browse/LUCENE-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-6119: --- Attachment: LUCENE-6119.patch Simple patch + test. Add IndexWriter.getTotalNewBytesWritten --- Key: LUCENE-6119 URL: https://issues.apache.org/jira/browse/LUCENE-6119 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk Attachments: LUCENE-6119.patch This method returns the number of incoming bytes IW has written since it was opened, excluding merging. It tracks flushed segments, new commits (segments_N), incoming files/segments added by addIndexes, newly written live docs / doc values updates files. It's an easy statistic for IW to track and should be useful to help applications more intelligently set defaults for IO throttling (RateLimiter). For example, an application that does hardly any indexing but finally triggered a large merge can afford to heavily throttle that large merge so it won't interfere with ongoing searches. But an application that's causing IW to write new bytes at 50 MB/sec must set a correspondingly higher IO throttle, otherwise merges will clearly fall behind. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
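The sizing idea in the description can be sketched numerically: sample the bytes-written counter twice, derive an ingest rate, and pick a merge throttle with some headroom above it. The 2x headroom factor and the 5 MB/sec floor below are arbitrary illustration values, and the counter referenced is the one proposed in this issue, not an existing API.

```java
// Hedged sketch: turn two samples of a bytes-written counter (such as the
// proposed IndexWriter.getTotalNewBytesWritten) into a suggested merge
// throttle. A mostly idle writer gets a heavy throttle (the floor); a writer
// ingesting 50 MB/sec of new bytes gets a correspondingly higher limit.
public class MergeThrottlePolicy {
    static double suggestMergeMBPerSec(long bytesBefore, long bytesAfter, double intervalSec) {
        double ingestMBPerSec = (bytesAfter - bytesBefore) / (1024.0 * 1024.0) / intervalSec;
        double floorMBPerSec = 5.0; // arbitrary: strong throttling when mostly idle
        double headroom = 2.0;      // arbitrary: merges must comfortably outpace ingest
        return Math.max(floorMBPerSec, headroom * ingestMBPerSec);
    }
}
```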
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249722#comment-14249722 ] Alan Woodward commented on LUCENE-2878: --- Hi Rob, Thanks for helping out here! The additional branches are there to allow for returning NO_MORE_POSITIONS once the positions are exhausted, but maybe I should move that logic up to TermScorer instead and put the assertions back in for the PostingsReaders. Merging TermsEnum.docs() and TermsEnum.docsAndPositions() might be tricky because their API is different - docs() never returns null, while docsAndPositions() will return null if the relevant postings data isn't indexed. Although having said that, I'm probably already breaking that contract by redirecting from one to the other with the flags check. I'll fix TermScorer. I haven't nuked Spans yet, mainly because I think we should probably keep them (as deprecated) in 5.0, and remove them only in trunk. It would also make the patch bigger :-) I changed existing test files rather than adding any new ones, apart from the tests exercising the PositionFilterQueries. Maybe a way to reduce the size of the patch would be to remove the PositionFilterQueries from this issue and create a new one for them? Then this one is just about changing the DocsEnum/TermsEnum API. Allow Scorer to expose positions and payloads aka.
nuke spans -- Key: LUCENE-2878 URL: https://issues.apache.org/jira/browse/LUCENE-2878 Project: Lucene - Core Issue Type: Improvement Components: core/search Affects Versions: Positions Branch Reporter: Simon Willnauer Assignee: Robert Muir Labels: gsoc2014 Fix For: Positions Branch Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch Currently we have two somewhat separate types of queries, the one which can make use of positions (mainly spans) and payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do and at the end of the day they are duplicating a lot of code all over lucene. Span*Queries are also limited to other Span*Query instances such that you can not use a TermQuery or a BooleanQuery with SpanNear or anything like that. Besides the Span*Query limitation, other queries lack a quite interesting feature since they can not score based on term proximity, since scores don't expose any positional information. All those problems bugged me for a while now so I started working on that using the bulkpostings API. I would have done that first cut on trunk but TermScorer is working on BlockReader that do not expose positions while the one in this branch does.
I started adding a new Positions class which users can pull from a scorer, to prevent unnecessary positions enums I added ScorerContext#needsPositions and eventually Scorer#needsPayloads to create the corresponding enum on demand. Yet, currently only TermQuery / TermScorer implements this API and others simply return null instead. To show that the API really works and our BulkPostings work fine too with positions I cut over TermSpanQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect of this was that the Position BulkReading implementation got some exercise, which now all works with positions :) while Payloads for bulkreading are kind of experimental in the patch and those only work with Standard codec. So all spans now work on top of TermScorer ( I truly hate spans since today ) including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet since I want to get feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute. I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk first but after that pain today I need a break first :). The patch passes all core tests (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't
[jira] [Commented] (SOLR-6640) ChaosMonkeySafeLeaderTest failure with CorruptIndexException
[ https://issues.apache.org/jira/browse/SOLR-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249771#comment-14249771 ] Shalin Shekhar Mangar commented on SOLR-6640: - So what is the right way to implement partial replication in this case? Force deleting the file (Varun's patch) probably won't work on windows and/or not play well with the open searchers. In SolrCloud we could just close the searcher before rollback because a replica in recovery won't get any search requests but that's not practical in standalone Solr because it'd cause downtime. ChaosMonkeySafeLeaderTest failure with CorruptIndexException Key: SOLR-6640 URL: https://issues.apache.org/jira/browse/SOLR-6640 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 5.0 Reporter: Shalin Shekhar Mangar Fix For: 5.0 Attachments: Lucene-Solr-5.x-Linux-64bit-jdk1.8.0_20-Build-11333.txt, SOLR-6640.patch, SOLR-6640.patch Test failure found on jenkins: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11333/ {code} 1 tests failed. REGRESSION: org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch Error Message: shard2 is not consistent. Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1 Stack Trace: java.lang.AssertionError: shard2 is not consistent. 
Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1 at __randomizedtesting.SeedInfo.seed([F4B371D421E391CD:7555FFCC56BCF1F1]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1255) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1234) at org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.doTest(ChaosMonkeySafeLeaderTest.java:162) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) {code} Cause of inconsistency is: {code} Caused by: org.apache.lucene.index.CorruptIndexException: file mismatch, expected segment id=yhq3vokoe1den2av9jbd3yp8, got=yhq3vokoe1den2av9jbd3yp7 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=/mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/test/J0/temp/solr.cloud.ChaosMonkeySafeLeaderTest-F4B371D421E391CD-001/tempDir-001/jetty3/index/_1_2.liv))) [junit4] 2 at org.apache.lucene.codecs.CodecUtil.checkSegmentHeader(CodecUtil.java:259) [junit4] 2 at org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat.readLiveDocs(Lucene50LiveDocsFormat.java:88) [junit4] 2 at org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat.readLiveDocs(AssertingLiveDocsFormat.java:64) [junit4] 2 at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:102) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249785#comment-14249785 ] Adrien Grand commented on LUCENE-5951: -- +1 Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/<dev>/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
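For readers following along, the /sys/block check described above can be sketched in plain Java. All names here (class, methods) are made up for illustration; this is not the patch attached to the issue:

```java
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch of the LUCENE-5951 idea: on Linux, guess whether the device backing
 * an index is spinning storage by reading /sys/block/<dev>/queue/rotational.
 * Hypothetical names, not the committed Lucene API.
 */
public class RotationalCheck {

    /** Interprets the contents of a sysfs "rotational" file: "0" means SSD. */
    public static boolean isSpinning(String rotationalFileContents) {
        return !"0".equals(rotationalFileContents.trim());
    }

    /** Default merge threads as proposed: 3 on SSD, 1 on spinning disk. */
    public static int defaultMaxMergeThreads(boolean spinning) {
        return spinning ? 1 : 3;
    }

    /** Best-effort check for a concrete device name, e.g. "sda". */
    public static boolean isSpinning(Path sysBlockRoot, String dev) {
        try {
            Path p = sysBlockRoot.resolve(dev).resolve("queue").resolve("rotational");
            return isSpinning(new String(Files.readAllBytes(p)));
        } catch (Exception e) {
            return true; // unknown platform/device: assume spinning, the conservative default
        }
    }

    public static void main(String[] args) {
        // "0" in the rotational file means SSD, so 3 merge threads
        System.out.println(defaultMaxMergeThreads(isSpinning("0\n"))); // 3
    }
}
```

The fallback deliberately assumes spinning storage when the sysfs file is missing, since the lower thread count is the safer default.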
[jira] [Updated] (SOLR-6801) Load components from blob store
[ https://issues.apache.org/jira/browse/SOLR-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6801: - Description: The solrconfig APIs (SOLR-6607) now allow registering components through the API. SOLR-6787 will add support for blob storage. Jars should be able to be loaded from blobs. Example: {code} curl http://localhost:8983/solr/gettingstarted/config -H 'Content-Type: application/json' -d '{ "create-requesthandler": { "name": "/mypath", "class": "org.apache.solr.handler.DumpRequestHandler", "lib": "mycomponent", "version": 2 } }' {code} was: The solrconfig APIs (SOLR-6607) now allow registering components through the API. SOLR-6787 will add support for blob storage. Jars should be able to be loaded from blobs. Example: {code} curl http://localhost:8983/solr/gettingstarted/config -H 'Content-Type: application/json' -d '{ "create-requesthandler": { "name": "/mypath", "class": "org.apache.solr.handler.DumpRequestHandler", "startup": "lazy", "lib": ".system:mycomponent", "version": 2 } }' {code} Load components from blob store --- Key: SOLR-6801 URL: https://issues.apache.org/jira/browse/SOLR-6801 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul The solrconfig APIs (SOLR-6607) now allow registering components through the API. SOLR-6787 will add support for blob storage. Jars should be able to be loaded from blobs. Example: {code} curl http://localhost:8983/solr/gettingstarted/config -H 'Content-Type: application/json' -d '{ "create-requesthandler": { "name": "/mypath", "class": "org.apache.solr.handler.DumpRequestHandler", "lib": "mycomponent", "version": 2 } }' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6801) Load components from blob store
[ https://issues.apache.org/jira/browse/SOLR-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6801: - Attachment: SOLR-6801.patch Feature complete. No testcases yet. I will add the testcases, do some refactoring, and commit this soon. Comments/suggestions are welcome. Load components from blob store --- Key: SOLR-6801 URL: https://issues.apache.org/jira/browse/SOLR-6801 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6801.patch The solrconfig APIs (SOLR-6607) now allow registering components through the API. SOLR-6787 will add support for blob storage. Jars should be able to be loaded from blobs. Example: {code} curl http://localhost:8983/solr/gettingstarted/config -H 'Content-Type: application/json' -d '{ "create-requesthandler": { "name": "/mypath", "class": "org.apache.solr.handler.DumpRequestHandler", "lib": "mycomponent", "version": 2 } }' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249820#comment-14249820 ] Shalin Shekhar Mangar commented on LUCENE-5951: --- +1 Very nice! Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/<dev>/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6683) Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery
[ https://issues.apache.org/jira/browse/SOLR-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249826#comment-14249826 ] Forest Soup commented on SOLR-6683: --- I applied the patch for SOLR-6359 on 4.7 and did some tests. It does not work as expected. When I set the config below, it still goes into the SnapPuller code even though I only newly added 800 docs.
{code}
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>
{code}
After reading the code, it seems these lines in org.apache.solr.update.PeerSync.handleVersions(ShardResponse srsp) cause the issue:
{code}
if (ourHighThreshold < otherLow) {
  // Small overlap between version windows and ours is older
  // This means that we might miss updates if we attempted to use this method.
  // Since there exists just one replica that is so much newer, we must
  // fail the sync.
  log.info(msg() + "Our versions are too old. ourHighThreshold=" + ourHighThreshold + " otherLowThreshold=" + otherLow);
  return false;
}
{code}
Could you please comment? Thanks! Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery --- Key: SOLR-6683 URL: https://issues.apache.org/jira/browse/SOLR-6683 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.7 Environment: Redhat Linux 64bit Reporter: Forest Soup Priority: Critical Labels: performance If there is a gap of more than 100 docs between the recovering node and the good node, Solr will do a snap pull recovery instead of a peersync. Can the 100 docs be configurable? For example, there can be a 1, 1000, or 10 docs gap between the good node and the node to recover. For a larger gap, a regular restart of a Solr node will trigger a full recovery, which is a huge impact on the performance of the running systems. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
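Reduced to its essentials, the guard quoted above compares the two replicas' version windows. A standalone sketch of that comparison (the class here is hypothetical; only the condition mirrors the quoted Solr code):

```java
/**
 * Standalone sketch of the version-window guard quoted above from
 * PeerSync.handleVersions. Versions are longs that grow over time.
 */
public class VersionWindowCheck {

    /**
     * PeerSync is only safe when our recent-updates window overlaps the other
     * replica's window. If our highest retained version is still older than
     * the other replica's lowest retained version, we might miss updates, so
     * the sync must fail and a full (SnapPuller) recovery is used instead.
     */
    public static boolean canPeerSync(long ourHighThreshold, long otherLow) {
        // quoted code fails the sync when (ourHighThreshold < otherLow)
        return ourHighThreshold >= otherLow;
    }

    public static void main(String[] args) {
        System.out.println(canPeerSync(200L, 100L)); // overlapping windows: true
        System.out.println(canPeerSync(100L, 200L)); // disjoint windows: false
    }
}
```

This illustrates why a larger numRecordsToKeep raises ourHighThreshold's reach: keeping more records widens our window, making the overlap (and thus peersync) more likely after a restart.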
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249827#comment-14249827 ] Forest Soup commented on SOLR-6359: --- I applied the patch for SOLR-6359 on 4.7 and did some tests. It does not work as expected. When I set the config below, it still goes into the SnapPuller code even though I only newly added 800 docs.
{code}
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>
{code}
After reading the code, it seems these lines in org.apache.solr.update.PeerSync.handleVersions(ShardResponse srsp) cause the issue:
{code}
if (ourHighThreshold < otherLow) {
  // Small overlap between version windows and ours is older
  // This means that we might miss updates if we attempted to use this method.
  // Since there exists just one replica that is so much newer, we must
  // fail the sync.
  log.info(msg() + "Our versions are too old. ourHighThreshold=" + ourHighThreshold + " otherLowThreshold=" + otherLow);
  return false;
}
{code}
Could you please comment? Thanks! Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6119) Add IndexWriter.getTotalNewBytesWritten
[ https://issues.apache.org/jira/browse/LUCENE-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249837#comment-14249837 ] Michael McCandless commented on LUCENE-6119: Thinking about this more ... it may be better to do this entirely inside a FilterDirectory. E.g. when IndexOutput is closed, and the IOContext is not MERGE, increment the bytes written ... and then that same directory instance could dynamically update the target merge throttling ... maybe. Add IndexWriter.getTotalNewBytesWritten --- Key: LUCENE-6119 URL: https://issues.apache.org/jira/browse/LUCENE-6119 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk Attachments: LUCENE-6119.patch This method returns number of incoming bytes IW has written since it was opened, excluding merging. It tracks flushed segments, new commits (segments_N), incoming files/segments by addIndexes, newly written live docs / doc values updates files. It's an easy statistic for IW to track and should be useful to help applications more intelligently set defaults for IO throttling (RateLimiter). For example, an application that does hardly any indexing but finally triggered a large merge can afford to heavily throttle that large merge so it won't interfere with ongoing searches. But an application that's causing IW to write new bytes at 50 MB/sec must set a correspondingly higher IO throttling otherwise merges will clearly fall behind. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
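The count-on-close idea in the comment above can be shown with a plain-Java analogue of such a filtering wrapper. Everything here is illustrative (it wraps java.io streams, not Lucene's Directory/IndexOutput); in the real proposal the wrapper would only be applied when the IOContext is not MERGE:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Plain-java analogue of the FilterDirectory idea: wrap every output that is
 * NOT opened for a merge, and add its size to a shared counter when it is
 * closed. Names are illustrative only, not Lucene's API.
 */
public class CountingOutput extends FilterOutputStream {
    private final AtomicLong totalNewBytes;
    private long written;

    public CountingOutput(OutputStream delegate, AtomicLong totalNewBytes) {
        super(delegate);
        this.totalNewBytes = totalNewBytes;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        written++; // FilterOutputStream routes bulk writes through here too
    }

    @Override
    public void close() throws IOException {
        super.close();
        totalNewBytes.addAndGet(written); // publish the count only on close
    }

    public static void main(String[] args) throws IOException {
        AtomicLong total = new AtomicLong();
        CountingOutput co = new CountingOutput(new ByteArrayOutputStream(), total);
        co.write(new byte[] {1, 2, 3});
        co.close();
        System.out.println(total.get()); // 3
    }
}
```

Publishing the per-file count only at close time keeps the shared counter cheap to maintain and means a throttling policy reading it sees whole files, not partial flushes.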
[jira] [Commented] (SOLR-6850) AutoAddReplicas does not wait enough for a replica to get live
[ https://issues.apache.org/jira/browse/SOLR-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249843#comment-14249843 ] Varun Thacker commented on SOLR-6850: - [~markrmil...@gmail.com] What are your thoughts on this? AutoAddReplicas does not wait enough for a replica to get live -- Key: SOLR-6850 URL: https://issues.apache.org/jira/browse/SOLR-6850 Project: Solr Issue Type: Bug Affects Versions: 4.10, 4.10.1, 4.10.2, 5.0, Trunk Reporter: Varun Thacker Attachments: SOLR-6850.patch, SOLR-6850.patch After we have detected that a replica needs failing over, we add a replica and wait to see if it's live. Currently we only wait for 30ms , but I think the intention here was to wait for 30s. In CloudStateUtil.waitToSeeLive() the conversion should have been {{System.nanoTime() + TimeUnit.NANOSECONDS.convert(timeoutInMs, TimeUnit.SECONDS);}} instead of {{System.nanoTime() + TimeUnit.NANOSECONDS.convert(timeoutInMs, TimeUnit.MILLISECONDS);}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
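The unit mix-up described in the issue is easy to see with TimeUnit directly. The method names below are illustrative, not the actual CloudStateUtil code:

```java
import java.util.concurrent.TimeUnit;

/**
 * Illustration of the SOLR-6850 bug: with an input of 30, converting from
 * MILLISECONDS yields a ~30ms deadline in nanoseconds, while the intended
 * conversion from SECONDS yields 30s — a factor of 1000 difference.
 */
public class TimeoutConversion {

    /** What the code did: treats the value as milliseconds (30ms wait). */
    public static long waitNanosActual(long timeout) {
        return TimeUnit.NANOSECONDS.convert(timeout, TimeUnit.MILLISECONDS);
    }

    /** What was intended: treats the value as seconds (30s wait). */
    public static long waitNanosIntended(long timeout) {
        return TimeUnit.NANOSECONDS.convert(timeout, TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        System.out.println(waitNanosActual(30));   // 30000000
        System.out.println(waitNanosIntended(30)); // 30000000000
    }
}
```

Passing the source unit as the second argument is the classic TimeUnit.convert pitfall: the deadline computed with the wrong unit is 1000x too short, so the replica rarely had time to come up.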
[jira] [Commented] (LUCENE-6117) infostream is currently unusable out of box
[ https://issues.apache.org/jira/browse/LUCENE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249855#comment-14249855 ] ASF subversion and git services commented on LUCENE-6117: - Commit 1646240 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1646240 ] LUCENE-6117: make infostream usable again infostream is currently unusable out of box --- Key: LUCENE-6117 URL: https://issues.apache.org/jira/browse/LUCENE-6117 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-6117.patch testpoints used to only be emitted by assertions (still sketchy), but now are emitted always. I assume this is due to the change to support running tests with assertions disabled. we should try to clean this up, simple stuff like this is now useless: {code} indexWriterConfig.setInfoStream(System.out); // causes massive flooding like this: // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start {code} I hit this several times today just trying to do benchmarks and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6117) infostream is currently unusable out of box
[ https://issues.apache.org/jira/browse/LUCENE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249858#comment-14249858 ] ASF subversion and git services commented on LUCENE-6117: - Commit 1646242 from [~rcmuir] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1646242 ] LUCENE-6117: make infostream usable again infostream is currently unusable out of box --- Key: LUCENE-6117 URL: https://issues.apache.org/jira/browse/LUCENE-6117 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-6117.patch testpoints used to only be emitted by assertions (still sketchy), but now are emitted always. I assume this is due to the change to support running tests with assertions disabled. we should try to clean this up, simple stuff like this is now useless: {code} indexWriterConfig.setInfoStream(System.out); // causes massive flooding like this: // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start {code} I hit this several times today just trying to do benchmarks and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-6117) infostream is currently unusable out of box
[ https://issues.apache.org/jira/browse/LUCENE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-6117. - Resolution: Fixed Fix Version/s: Trunk 5.0 infostream is currently unusable out of box --- Key: LUCENE-6117 URL: https://issues.apache.org/jira/browse/LUCENE-6117 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0, Trunk Attachments: LUCENE-6117.patch testpoints used to only be emitted by assertions (still sketchy), but now are emitted always. I assume this is due to the change to support running tests with assertions disabled. we should try to clean this up, simple stuff like this is now useless: {code} indexWriterConfig.setInfoStream(System.out); // causes massive flooding like this: // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start {code} I hit this several times today just trying to do benchmarks and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6114) Remove bw compat cruft from packedints
[ https://issues.apache.org/jira/browse/LUCENE-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249866#comment-14249866 ] ASF subversion and git services commented on LUCENE-6114: - Commit 1646247 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1646247 ] LUCENE-6114: remove bw compat cruft from packedints Remove bw compat cruft from packedints -- Key: LUCENE-6114 URL: https://issues.apache.org/jira/browse/LUCENE-6114 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Fix For: Trunk Attachments: LUCENE-6114.patch In trunk we have some old logic that is not needed (versions 0 and 1). So we can remove support for structures that aren't byte-aligned, zigzag-encoded monotonics, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-6114) Remove bw compat cruft from packedints
[ https://issues.apache.org/jira/browse/LUCENE-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-6114. - Resolution: Fixed Remove bw compat cruft from packedints -- Key: LUCENE-6114 URL: https://issues.apache.org/jira/browse/LUCENE-6114 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Fix For: Trunk Attachments: LUCENE-6114.patch In trunk we have some old logic that is not needed (versions 0 and 1). So we can remove support for structures that aren't byte-aligned, zigzag-encoded monotonics, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-6854) Stale cached state in CloudSolrServer
[ https://issues.apache.org/jira/browse/SOLR-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul reassigned SOLR-6854: Assignee: Noble Paul Stale cached state in CloudSolrServer - Key: SOLR-6854 URL: https://issues.apache.org/jira/browse/SOLR-6854 Project: Solr Issue Type: Bug Components: SolrCloud, SolrJ Reporter: Jessica Cheng Mallet Assignee: Noble Paul Labels: cache, solrcloud, solrj CloudSolrServer’s cached state is not being updated for a newly created collection if we started polling for the collection state too early and a down state is cached. Requests to the newly created collection continue to fail with "No live SolrServers available to handle this request" until the cache is invalidated by time. Logging on the client side reveals that while the state in ZkStateReader is updated to active, the cached state in CloudSolrServer remains down.
{quote}
CloudSolrServer cached state: DocCollection(collection-1418250319268)={
  "shards":{"shard1":{
    "range":"8000-7fff",
    "state":"active",
    "replicas":{"core_node1":{
      "state":"down",
      "base_url":"http://localhost:8983/solr",
      "core":"collection-1418250319268_shard1_replica1",
      "node_name":"localhost:8983_solr"}}}},
  "maxShardsPerNode":"1",
  "external":"true",
  "router":{"name":"compositeId"},
  "replicationFactor":"1"}

ZkStateReader state: DocCollection(collection-1418250319268)={
  "shards":{"shard1":{
    "range":"8000-7fff",
    "state":"active",
    "replicas":{"core_node1":{
      "state":"active",
      "base_url":"http://localhost:8983/solr",
      "core":"collection-1418250319268_shard1_replica1",
      "node_name":"localhost:8983_solr",
      "leader":"true"}}}},
  "maxShardsPerNode":"1",
  "router":{"name":"compositeId"},
  "external":"true",
  "replicationFactor":"1"}
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6127) Improve Solr's exampledocs data
[ https://issues.apache.org/jira/browse/SOLR-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249871#comment-14249871 ] Varun Thacker commented on SOLR-6127: - I think we could do the following - 1. Take the film.json|xml|csv files and replace all the data in the exampledocs folder with them 2. Put the python script in the dev-tools folder so that in the future if we want to update the data we can use it. 3. Drop the LICENSE.txt file in the exampledocs folder? On the website these places would need to be updated - Indexing Solr XML, Indexing JSON, Indexing CSV (Comma/Column Separated Values) - http://lucene.apache.org/solr/quickstart.html Maybe also update the Searching section on the quickstart page? We could use the material attached in the README.txt uploaded here. Oh, and we will have to update the schema in the sample_techproducts_configs configset and the browse handler in solrconfig with the new data too. Improve Solr's exampledocs data --- Key: SOLR-6127 URL: https://issues.apache.org/jira/browse/SOLR-6127 Project: Solr Issue Type: Improvement Components: documentation, scripts and tools Reporter: Varun Thacker Assignee: Erik Hatcher Fix For: 5.0, Trunk Attachments: LICENSE.txt, README.txt, README.txt, film.csv, film.json, film.xml, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py Currently - The CSV example has 10 documents. - The JSON example has 4 documents. - The XML example has 32 documents. 1. We should have equal number of documents and the same documents in all the example formats 2. A data set which is slightly more comprehensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6554) Speed up overseer operations for collections with stateFormat 1
[ https://issues.apache.org/jira/browse/SOLR-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-6554: Attachment: SOLR-6554-workqueue-fixes.patch
# Fixes the logic to add/remove items from the workQueue so that the invariant is maintained
# Adds a ZkWriteListener interface which is used by Overseer to add/remove items from the workQueue depending on how/when state is flushed to ZK
# The earlier patches enabled batching on work queue processing but that is wrong because we do not have any fallback if a batch fails. So batching is disabled whenever we operate on items from the work queue.
Speed up overseer operations for collections with stateFormat 1 - Key: SOLR-6554 URL: https://issues.apache.org/jira/browse/SOLR-6554 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 5.0, Trunk Reporter: Shalin Shekhar Mangar Attachments: SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-workqueue-fixes.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch Right now (after SOLR-5473 was committed), a node watches a collection only if stateFormat=1 or if that node hosts at least one core belonging to that collection. This means that a node which is the overseer operates on all collections but watches only a few. So any read goes directly to zookeeper which slows down overseer operations. Let's have the overseer node watch all collections always and never remove those watches (except when the collection itself is deleted). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249924#comment-14249924 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1646271 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1646271 ] LUCENE-2878: Remove dead code from TermScorer Allow Scorer to expose positions and payloads aka. nuke spans -- Key: LUCENE-2878 URL: https://issues.apache.org/jira/browse/LUCENE-2878 Project: Lucene - Core Issue Type: Improvement Components: core/search Affects Versions: Positions Branch Reporter: Simon Willnauer Assignee: Robert Muir Labels: gsoc2014 Fix For: Positions Branch Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch Currently we have two somewhat separate types of queries: the ones which can make use of positions (mainly spans) and payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do, and at the end of the day they duplicate a lot of code all over Lucene. Span*Queries are also limited to other Span*Query instances, such that you can not use a TermQuery or a BooleanQuery with SpanNear or anything like that.
Besides the Span*Query limitation, other queries lack a quite interesting feature: they can not score based on term proximity, since scorers don't expose any positional information. All those problems bugged me for a while now, so I started working on that using the bulkpostings API. I would have done that first cut on trunk, but TermScorer there works on a BlockReader that does not expose positions, while the one in this branch does. I started adding a new Positions class which users can pull from a scorer; to prevent unnecessary positions enums I added ScorerContext#needsPositions and eventually Scorer#needsPayloads to create the corresponding enum on demand. Yet, currently only TermQuery / TermScorer implements this API and others simply return null instead. To show that the API really works, and that our BulkPostings work fine with positions too, I cut over TermSpanQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect of this was that the Position BulkReading implementation got some exercise, which now :) all works with positions, while Payloads for bulk reading are kind of experimental in the patch and only work with the Standard codec. So all spans now work on top of TermScorer (I truly hate spans since today), including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet since I want to get feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute. I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk first, but after that pain today I need a break first :).
The patch passes all core tests (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249927#comment-14249927 ] ASF subversion and git services commented on SOLR-6857: --- Commit 1646272 from [~sar...@syr.edu] in branch 'dev/trunk' [ https://svn.apache.org/r1646272 ] SOLR-6857: Idea modules missing dependencies Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Priority: Trivial Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249938#comment-14249938 ] ASF subversion and git services commented on SOLR-6857: --- Commit 1646275 from [~sar...@syr.edu] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1646275 ] SOLR-6857: Idea modules missing dependencies (merged trunk r1646272) Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Priority: Trivial Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6797) Add score=degrees|kilometers|miles for AbstractSpatialFieldType
[ https://issues.apache.org/jira/browse/SOLR-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-6797: --- Attachment: SOLR-6797.patch That makes sense; more intuitive than a separate units/distanceUnits parameter. Attached a patch that supports score=distance (back compat, km when geo) | kilometers | miles | degrees | area/area2D (km^2 when geo, deg^2 in 2D). Tested manually and seems to work. Add score=degrees|kilometers|miles for AbstractSpatialFieldType --- Key: SOLR-6797 URL: https://issues.apache.org/jira/browse/SOLR-6797 Project: Solr Issue Type: Improvement Components: spatial Reporter: David Smiley Attachments: SOLR-6797.patch Annoyingly, the units=degrees attribute is required for fields extending AbstractSpatialFieldType (e.g. RPT, BBox). And it doesn't really have any effect. I propose the following: * Simply drop the attribute; ignore it if someone sets it to degrees (for back-compat). * When using score=distance, or score=area or area2D (as seen in BBoxField) then use kilometers if geo=true, otherwise degrees. * Add support for score=degrees|kilometers|miles|degrees -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Strassburg resolved SOLR-6857. Resolution: Fixed Fix Version/s: Trunk 5.0 Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Priority: Trivial Fix For: 5.0, Trunk Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6120) how should MockIndexOutputWrapper.close handle exceptions in delegate.close
Michael McCandless created LUCENE-6120: -- Summary: how should MockIndexOutputWrapper.close handle exceptions in delegate.close Key: LUCENE-6120 URL: https://issues.apache.org/jira/browse/LUCENE-6120 Project: Lucene - Core Issue Type: Bug Components: core/store Reporter: Michael McCandless Priority: Minor Chasing a tricky Elasticsearch test failure, it came down to the delegate.close throwing an exception (ClosedByInterruptException, disturbingly, in this case), causing MockIndexOutputWrapper.close to fail to remove that IO from MDW's map. The question is, what should we do here, when delegate.close throws an exception? Is the delegate in fact closed, even when it throws an exception? Java8's docs on java.io.Closeable say this: As noted in AutoCloseable.close(), cases where the close may fail require careful attention. It is strongly advised to relinquish the underlying resources and to internally mark the Closeable as closed, prior to throwing the IOException. And our OutputStreamIndexOutput is careful about this (flushes, then closes in a try-with-resources). So, I think MDW should be fixed to mark the IO as closed even if delegate.close throws an exception... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
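The fix direction proposed above can be sketched with a minimal wrapper. This is a hedged illustration, not Lucene's actual MockIndexOutputWrapper: the point is simply to mark the wrapper closed *before* delegating, so a throwing delegate.close() still leaves the bookkeeping consistent.

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseFirstWrapper implements Closeable {
    private final Closeable delegate;
    private boolean closed;

    public CloseFirstWrapper(Closeable delegate) { this.delegate = delegate; }

    public boolean isClosed() { return closed; }

    @Override
    public void close() throws IOException {
        if (closed) return;   // double-close stays a no-op
        closed = true;        // per Closeable's advice: mark closed before throwing
        delegate.close();     // may still throw; caller sees it, but our state is settled
    }

    public static void main(String[] args) {
        CloseFirstWrapper w = new CloseFirstWrapper(() -> { throw new IOException("boom"); });
        try {
            w.close();
        } catch (IOException expected) {
            // the exception propagates, but the wrapper no longer counts as open
        }
        System.out.println(w.isClosed()); // true
    }
}
```

The same pattern, applied to MDW, would keep the open-files map accurate even when the delegate fails mid-close.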
[jira] [Assigned] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe reassigned SOLR-6857: Assignee: Steve Rowe Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Assignee: Steve Rowe Priority: Trivial Fix For: 5.0, Trunk Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249960#comment-14249960 ] Steve Rowe commented on SOLR-6857: -- Thanks [~jstrassburg], I committed your patch to trunk and branch_5x. On branch_5x under Java7, the patch isn't required, but it is for some reason under Java8, so I committed there too. I was going to resolve the issue but you had already done so :) - the convention is that the person who commits the fix resolves the issue. Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Priority: Trivial Fix For: 5.0, Trunk Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5951: Attachment: LUCENE-5951.patch Try to improve the SSD detector more to make it safe to use for this purpose. It was mostly a joke and really ... not good code. :) * fix contract to throw IOException when incoming path does not exist. This is important not to mask. * for our internal heuristics, we could easily trigger SecurityException / AIOOBE, we are doing things that are not guaranteed at all. So those are important to mask. * don't use Files.readAllBytes, that method is too dangerous in these heuristics. Just read one byte. We should improve the getDeviceName too, but it's less critical. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
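The Linux heuristic discussed in this issue can be sketched roughly as follows. This is a hedged, Linux-only illustration (not the committed patch): it reads a single byte from /sys/block/&lt;device&gt;/queue/rotational ('0' means non-rotational, i.e. SSD), leaves device-name resolution out of scope, and deliberately masks every failure by falling back to "assume spinning", since heuristics like this must never throw.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SpinCheck {
    public static boolean spins(String deviceName) {
        Path p = Paths.get("/sys/block", deviceName, "queue", "rotational");
        try (InputStream in = Files.newInputStream(p)) {
            return in.read() != '0'; // one byte is enough; avoids Files.readAllBytes
        } catch (Exception e) {
            return true; // missing path, SecurityException, etc.: mask and be conservative
        }
    }

    public static void main(String[] args) {
        // An unknown device resolves to "spinning" on any OS.
        System.out.println(spins("no-such-device")); // true
    }
}
```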
[jira] [Commented] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249966#comment-14249966 ] James Strassburg commented on SOLR-6857: OK, this was my first submission. I saw the commit and verified it before closing but I won't close them in the future. thanks. Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Assignee: Steve Rowe Priority: Trivial Fix For: 5.0, Trunk Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6858) Leader sync's PeerSync use cannot consider SocketException or NoHttpResponseException success.
Mark Miller created SOLR-6858: - Summary: Leader sync's PeerSync use cannot consider SocketException or NoHttpResponseException success. Key: SOLR-6858 URL: https://issues.apache.org/jira/browse/SOLR-6858 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249973#comment-14249973 ] Steve Rowe commented on SOLR-6857: -- bq. this was my first submission Cool, thanks again, (more of your) patches welcome! Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Assignee: Steve Rowe Priority: Trivial Fix For: 5.0, Trunk Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6859) Disable REBALANCELEADERS for 5.0
Erick Erickson created SOLR-6859: Summary: Disable REBALANCELEADERS for 5.0 Key: SOLR-6859 URL: https://issues.apache.org/jira/browse/SOLR-6859 Project: Solr Issue Type: Bug Reporter: Erick Erickson Assignee: Erick Erickson Priority: Blocker This is flat dangerous with it's current implementation and should not get into the wild. The (I hope) proper fix is in SOLR-6691. I want to let that code bake for a while post 5.0 before committing though. So this will just comment the handling of REBALANCELEADERS from the collections API for the time being. Marked as blocker, but I should be able to take care of this ASAP so it shouldn't stand in the way of 5.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6860) Re-enable REBALANCELEADERS for 5.1
Erick Erickson created SOLR-6860: Summary: Re-enable REBALANCELEADERS for 5.1 Key: SOLR-6860 URL: https://issues.apache.org/jira/browse/SOLR-6860 Project: Solr Issue Type: Improvement Reporter: Erick Erickson Assignee: Erick Erickson The rebalanceleaders command is disabled for 5.0 to allow more baking time. This ticket is to re-enable it (just uncomment it in collections api handling) and merge SOLR-6691 into 5.1 after 5.0 has been cut. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6127) Improve Solr's exampledocs data
[ https://issues.apache.org/jira/browse/SOLR-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-6127: Attachment: SOLR-6127.patch The patch does a few things: 1. Removed all current exampledocs files 2. Added film.xml, film.json, film.csv and the license file 3. Added exampledocs_generator.py to the dev-tools folder 4. Modified schema.xml appropriately. Now we need to decide whether to rename the techproducts configset to film. Improve Solr's exampledocs data --- Key: SOLR-6127 URL: https://issues.apache.org/jira/browse/SOLR-6127 Project: Solr Issue Type: Improvement Components: documentation, scripts and tools Reporter: Varun Thacker Assignee: Erik Hatcher Fix For: 5.0, Trunk Attachments: LICENSE.txt, README.txt, README.txt, SOLR-6127.patch, film.csv, film.json, film.xml, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py Currently - The CSV example has 10 documents. - The JSON example has 4 documents. - The XML example has 32 documents. 1. We should have equal number of documents and the same documents in all the example formats 2. A data set which is slightly more comprehensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5951: --- Attachment: LUCENE-5951.patch New patch, renaming to spins, and also unwrapping FileSwitchDir, and returning false for RAMDirectory. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6787) API to manage blobs in Solr
[ https://issues.apache.org/jira/browse/SOLR-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6787: - Description: A special collection called .system needs to be created by the user to store/manage blobs. The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection {code} #create your .system collection first http://localhost:8983/solr/admin/collections?action=CREATEname=.systemreplicationFactor=2 The config for this collection is automatically created . numShards for this collection is hardcoded to 1 #create a new jar or add a new version of a jar curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point would give a list of jars and other details curl http://localhost:8983/solr/.system/blob # GET on the end point with jar name would give details of various versions of the available jars curl http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point with jar name and version with a wt=filestream to get the actual file curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream mycomponent.1.jar # GET on the end point with jar name and wt=filestream to get the latest version of the file curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream mycomponent.jar {code} Please note that the jars are never deleted. a new version is added to the system everytime a new jar is posted for the name. You must use the standard delete commands to delete the old entries was: A special collection called .system needs to be created by the user to store/manage blobs. 
The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection {code} #create a new jar or add a new version of a jar curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point would give a list of jars and other details curl http://localhost:8983/solr/.system/blob # GET on the end point with jar name would give details of various versions of the available jars curl http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point with jar name and version with a wt=filestream to get the actual file curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream mycomponent.1.jar # GET on the end point with jar name and wt=filestream to get the latest version of the file curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream mycomponent.jar {code} Please note that the jars are never deleted. a new version is added to the system everytime a new jar is posted for the name. You must use the standard delete commands to delete the old entries API to manage blobs in Solr Key: SOLR-6787 URL: https://issues.apache.org/jira/browse/SOLR-6787 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul Fix For: 5.0, Trunk Attachments: SOLR-6787.patch, SOLR-6787.patch A special collection called .system needs to be created by the user to store/manage blobs. The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection {code} #create your .system collection first http://localhost:8983/solr/admin/collections?action=CREATEname=.systemreplicationFactor=2 The config for this collection is automatically created . 
numShards for this collection is hardcoded to 1 #create a new jar or add a new version of a jar curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point would give a list of jars and other details curl http://localhost:8983/solr/.system/blob # GET on the end point with jar name would give details of various versions of the available jars curl http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point with jar name and version with a wt=filestream to get the actual file curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream mycomponent.1.jar # GET on the end point with jar name and wt=filestream to get the latest version of the file curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream mycomponent.jar {code} Please note that the jars are never deleted. a new version is added to the system everytime a new jar is posted for the name. You must use
[jira] [Updated] (SOLR-6787) API to manage blobs in Solr
[ https://issues.apache.org/jira/browse/SOLR-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6787: - Description: A special collection called .system needs to be created by the user to store/manage blobs. The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection {code} #create your .system collection first http://localhost:8983/solr/admin/collections?action=CREATEname=.systemreplicationFactor=2 #The config for this collection is automatically created . numShards for this collection is hardcoded to 1 #create a new jar or add a new version of a jar curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point would give a list of jars and other details curl http://localhost:8983/solr/.system/blob # GET on the end point with jar name would give details of various versions of the available jars curl http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point with jar name and version with a wt=filestream to get the actual file curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream mycomponent.1.jar # GET on the end point with jar name and wt=filestream to get the latest version of the file curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream mycomponent.jar {code} Please note that the jars are never deleted. a new version is added to the system everytime a new jar is posted for the name. You must use the standard delete commands to delete the old entries was: A special collection called .system needs to be created by the user to store/manage blobs. 
The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection {code} #create your .system collection first http://localhost:8983/solr/admin/collections?action=CREATEname=.systemreplicationFactor=2 The config for this collection is automatically created . numShards for this collection is hardcoded to 1 #create a new jar or add a new version of a jar curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point would give a list of jars and other details curl http://localhost:8983/solr/.system/blob # GET on the end point with jar name would give details of various versions of the available jars curl http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point with jar name and version with a wt=filestream to get the actual file curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream mycomponent.1.jar # GET on the end point with jar name and wt=filestream to get the latest version of the file curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream mycomponent.jar {code} Please note that the jars are never deleted. a new version is added to the system everytime a new jar is posted for the name. You must use the standard delete commands to delete the old entries API to manage blobs in Solr Key: SOLR-6787 URL: https://issues.apache.org/jira/browse/SOLR-6787 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul Fix For: 5.0, Trunk Attachments: SOLR-6787.patch, SOLR-6787.patch A special collection called .system needs to be created by the user to store/manage blobs. 
The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection {code} #create your .system collection first http://localhost:8983/solr/admin/collections?action=CREATEname=.systemreplicationFactor=2 #The config for this collection is automatically created . numShards for this collection is hardcoded to 1 #create a new jar or add a new version of a jar curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point would give a list of jars and other details curl http://localhost:8983/solr/.system/blob # GET on the end point with jar name would give details of various versions of the available jars curl http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point with jar name and version with a wt=filestream to get the actual file curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream mycomponent.1.jar # GET on the end point with jar name and wt=filestream to get the latest version of the file curl
[jira] [Commented] (SOLR-6850) AutoAddReplicas does not wait enough for a replica to get live
[ https://issues.apache.org/jira/browse/SOLR-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250057#comment-14250057 ] Mark Miller commented on SOLR-6850: --- Good catch Varun! I just took a look and this is actually fixed in Cloudera Search - whoops. I'll sync up and see if there are any other changes I have that are missing after committing this. AutoAddReplicas does not wait enough for a replica to get live -- Key: SOLR-6850 URL: https://issues.apache.org/jira/browse/SOLR-6850 Project: Solr Issue Type: Bug Affects Versions: 4.10, 4.10.1, 4.10.2, 5.0, Trunk Reporter: Varun Thacker Attachments: SOLR-6850.patch, SOLR-6850.patch After we have detected that a replica needs failing over, we add a replica and wait to see if it's live. Currently we only wait for 30ms, but I think the intention here was to wait for 30s. In CloudStateUtil.waitToSeeLive() the conversion should have been {{System.nanoTime() + TimeUnit.NANOSECONDS.convert(timeoutInMs, TimeUnit.SECONDS);}} instead of {{System.nanoTime() + TimeUnit.NANOSECONDS.convert(timeoutInMs, TimeUnit.MILLISECONDS);}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
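The unit mix-up described in this issue is easy to see in isolation. In this sketch a timeout value of 30 is assumed to be passed in: converting it with MILLISECONDS as the source unit yields a 30 ms deadline, while SECONDS yields the intended 30 s one.

```java
import java.util.concurrent.TimeUnit;

public class TimeoutUnits {
    public static void main(String[] args) {
        long timeout = 30;
        // Buggy: interprets the value as milliseconds -> 30 ms worth of nanos.
        long buggy = TimeUnit.NANOSECONDS.convert(timeout, TimeUnit.MILLISECONDS);
        // Fixed: interprets the value as seconds -> 30 s worth of nanos.
        long fixed = TimeUnit.NANOSECONDS.convert(timeout, TimeUnit.SECONDS);
        System.out.println(buggy); // 30000000
        System.out.println(fixed); // 30000000000
    }
}
```

A factor-of-1000 difference in the deadline explains why the failover wait expired almost immediately.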
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250059#comment-14250059 ] Robert Muir commented on LUCENE-5951: - Thanks, i will take another crack at FSDir logic. we should be able to handle tmpfs etc better here (likely on mac, too). Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6559) Create an endpoint /update/xml/docs endpoint to do custom xml indexing
[ https://issues.apache.org/jira/browse/SOLR-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Sharma updated SOLR-6559: Attachment: SOLR-6559.patch Attaching patch that can be applied on latest trunk. The XPathRecordReader doesn't support wild card. Either we have to implement the wildcard functionality or use another XPath parser. Also added a unit test (testSupportedWildCard) demonstrating the capability is unsupported. Also the patch has positive unit tests which are working. Create an endpoint /update/xml/docs endpoint to do custom xml indexing -- Key: SOLR-6559 URL: https://issues.apache.org/jira/browse/SOLR-6559 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6559.patch, SOLR-6559.patch, SOLR-6559.patch, SOLR-6559.patch Just the way we have an json end point create an xml end point too. use the XPathRecordReader in DIH to do the same . The syntax would require slight tweaking to match the params of /update/json/docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6117) infostream is currently unusable out of box
[ https://issues.apache.org/jira/browse/LUCENE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250067#comment-14250067 ] ASF subversion and git services commented on LUCENE-6117: - Commit 1646288 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1646288 ] LUCENE-6117: this test secretly relies on testPoint too infostream is currently unusable out of box --- Key: LUCENE-6117 URL: https://issues.apache.org/jira/browse/LUCENE-6117 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0, Trunk Attachments: LUCENE-6117.patch testpoints used to only be emitted by assertions (still sketchy), but now are emitted always. I assume this is due to the change to support running tests with assertions disabled. we should try to clean this up, simple stuff like this is now useless: {code} indexWriterConfig.setInfoStream(System.out); // causes massive flooding like this: // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start {code} I hit this several times today just trying to do benchmarks and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6117) infostream is currently unusable out of box
[ https://issues.apache.org/jira/browse/LUCENE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250069#comment-14250069 ] ASF subversion and git services commented on LUCENE-6117: - Commit 1646289 from [~mikemccand] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1646289 ] LUCENE-6117: this test secretly relies on testPoint too infostream is currently unusable out of box --- Key: LUCENE-6117 URL: https://issues.apache.org/jira/browse/LUCENE-6117 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0, Trunk Attachments: LUCENE-6117.patch testpoints used to only be emitted by assertions (still sketchy), but now are emitted always. I assume this is due to the change to support running tests with assertions disabled. we should try to clean this up, simple stuff like this is now useless: {code} indexWriterConfig.setInfoStream(System.out); // causes massive flooding like this: // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start {code} I hit this several times today just trying to do benchmarks and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6554) Speed up overseer operations for collections with stateFormat 1
[ https://issues.apache.org/jira/browse/SOLR-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-6554: Attachment: SOLR-6554-workqueue-fixes.patch Slightly refactored. All tests pass. Speed up overseer operations for collections with stateFormat 1 - Key: SOLR-6554 URL: https://issues.apache.org/jira/browse/SOLR-6554 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 5.0, Trunk Reporter: Shalin Shekhar Mangar Attachments: SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-workqueue-fixes.patch, SOLR-6554-workqueue-fixes.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch Right now (after SOLR-5473 was committed), a node watches a collection only if stateFormat=1 or if that node hosts at least one core belonging to that collection. This means that a node which is the overseer operates on all collections but watches only a few. So any read goes directly to zookeeper which slows down overseer operations. Let's have the overseer node watch all collections always and never remove those watches (except when the collection itself is deleted). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6120) how should MockIndexOutputWrapper.close handle exceptions in delegate.close
[ https://issues.apache.org/jira/browse/LUCENE-6120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250079#comment-14250079 ] Robert Muir commented on LUCENE-6120: - This is a test bug, but there are more bugs here in addition to this one. If close() is called multiple times, then the disk usage computation and internal ref counting (trace through removeIndexOutput()) are wrong. That violates Closeable.close(). how should MockIndexOutputWrapper.close handle exceptions in delegate.close --- Key: LUCENE-6120 URL: https://issues.apache.org/jira/browse/LUCENE-6120 Project: Lucene - Core Issue Type: Bug Components: core/store Reporter: Michael McCandless Priority: Minor Chasing a tricky Elasticsearch test failure, it came down to the delegate.close throwing an exception (ClosedByInterruptException, disturbingly, in this case), causing MockIndexOutputWrapper.close to fail to remove that IO from MDW's map. The question is: what should we do here, when delegate.close throws an exception? Is the delegate in fact closed, even when it throws an exception? Java 8's docs on java.io.Closeable say this: As noted in AutoCloseable.close(), cases where the close may fail require careful attention. It is strongly advised to relinquish the underlying resources and to internally mark the Closeable as closed, prior to throwing the IOException. And our OutputStreamIndexOutput is careful about this (it flushes, then closes, in a try-with-resources). So I think MDW should be fixed to mark the IO as closed even if delegate.close throws an exception... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
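The Closeable advice quoted in the issue can be illustrated with a minimal sketch (a hypothetical wrapper class, not the actual MockIndexOutputWrapper code): mark the wrapper closed before delegating, so that a throwing delegate.close() still leaves it in the closed state, and a second close() becomes a no-op instead of double-decrementing ref counts.

```java
import java.io.Closeable;
import java.io.IOException;

class SafeCloseWrapper implements Closeable {
  private final Closeable delegate;
  private boolean closed;

  SafeCloseWrapper(Closeable delegate) {
    this.delegate = delegate;
  }

  @Override
  public void close() throws IOException {
    if (closed) {
      return;        // second close() is a no-op, as Closeable requires
    }
    closed = true;    // mark closed first, even if the delegate throws
    delegate.close(); // e.g. ClosedByInterruptException may propagate,
                      // but the wrapper is already in the closed state
  }

  boolean isClosed() {
    return closed;
  }
}
```

With this ordering, bookkeeping that happens on the first close (removing the output from a tracking map, adjusting ref counts) runs exactly once regardless of how the delegate behaves.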
[JENKINS-MAVEN] Lucene-Solr-Maven-5.x #790: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-5.x/790/ 3 tests failed. FAILED: org.apache.solr.hadoop.MorphlineMapperTest.org.apache.solr.hadoop.MorphlineMapperTest Error Message: null Stack Trace: java.lang.AssertionError: null at __randomizedtesting.SeedInfo.seed([682F0CE7F59E9F9B]:0) at org.apache.lucene.util.TestRuleTemporaryFilesCleanup.before(TestRuleTemporaryFilesCleanup.java:105) at com.carrotsearch.randomizedtesting.rules.TestRuleAdapter$1.before(TestRuleAdapter.java:26) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:35) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at java.lang.Thread.run(Thread.java:745) FAILED: org.apache.solr.hadoop.MorphlineBasicMiniMRTest.testPathParts Error Message: Test abandoned because suite timeout was reached. Stack Trace: java.lang.Exception: Test abandoned because suite timeout was reached. at __randomizedtesting.SeedInfo.seed([197EC495E889594E]:0) FAILED: org.apache.solr.hadoop.MorphlineBasicMiniMRTest.org.apache.solr.hadoop.MorphlineBasicMiniMRTest Error Message: Suite timeout exceeded (= 720 msec). Stack Trace: java.lang.Exception: Suite timeout exceeded (= 720 msec). at __randomizedtesting.SeedInfo.seed([197EC495E889594E]:0) Build Log: [...truncated 53791 lines...] 
BUILD FAILED /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-5.x/build.xml:552: The following error occurred while executing this line: /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-5.x/build.xml:204: The following error occurred while executing this line: : Java returned: 1 Total time: 382 minutes 43 seconds Build step 'Invoke Ant' marked build as failure Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6861) Remove example/exampledocs/post.sh as the concept of default update URL is almost gone
Anshum Gupta created SOLR-6861: -- Summary: Remove example/exampledocs/post.sh as the concept of default update URL is almost gone Key: SOLR-6861 URL: https://issues.apache.org/jira/browse/SOLR-6861 Project: Solr Issue Type: Task Reporter: Anshum Gupta Assignee: Anshum Gupta We should remove post.sh and replace it with bin/post (SOLR-6435). post.sh right now has a hardcoded single core update URL i.e. http://localhost:8983/solr/update -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6559) Create an endpoint /update/xml/docs endpoint to do custom xml indexing
[ https://issues.apache.org/jira/browse/SOLR-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250147#comment-14250147 ] Noble Paul commented on SOLR-6559: -- bq. The XPathRecordReader doesn't support wild card

It does. Look at the tests. Create an endpoint /update/xml/docs endpoint to do custom xml indexing -- Key: SOLR-6559 URL: https://issues.apache.org/jira/browse/SOLR-6559 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6559.patch, SOLR-6559.patch, SOLR-6559.patch, SOLR-6559.patch Just the way we have a JSON endpoint, create an XML endpoint too. Use the XPathRecordReader from DIH to do the same. The syntax would require slight tweaking to match the params of /update/json/docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4492) Please add support for Collection API CREATE method to evenly distribute leader roles among instances
[ https://issues.apache.org/jira/browse/SOLR-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250149#comment-14250149 ] Tim Vaillancourt commented on SOLR-4492: Thanks Erick! Please add support for Collection API CREATE method to evenly distribute leader roles among instances - Key: SOLR-4492 URL: https://issues.apache.org/jira/browse/SOLR-4492 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Tim Vaillancourt Assignee: Erick Erickson Priority: Minor Fix For: 5.0, Trunk Currently in SolrCloud 4.1, a CREATE call to the Collection API will cause the server receiving the CREATE call to become the leader of all shards. I would like to ask for the ability for the CREATE call to evenly distribute the leader role across all instances, i.e.: if I create 3 shards over 3 SOLR 4.1 instances, each instance/node would only be the leader of 1 shard. This would be logically consistent with the way replicas are randomly distributed by this same call across instances/nodes. Currently, this CREATE call will cause the server receiving the call to become the leader of 3 shards. curl -v 'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=2&maxShardsPerNode=2' PS: Thank you SOLR developers for your contributions! Tim Vaillancourt -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6861) Remove example/exampledocs/post.sh as the concept of default update URL is almost gone
[ https://issues.apache.org/jira/browse/SOLR-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250170#comment-14250170 ] ASF subversion and git services commented on SOLR-6861: --- Commit 1646297 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1646297 ] SOLR-6861: Remove post.sh from exampledocs Remove example/exampledocs/post.sh as the concept of default update URL is almost gone -- Key: SOLR-6861 URL: https://issues.apache.org/jira/browse/SOLR-6861 Project: Solr Issue Type: Task Reporter: Anshum Gupta Assignee: Anshum Gupta We should remove post.sh and replace it with bin/post (SOLR-6435). post.sh right now has a hardcoded single core update URL i.e. http://localhost:8983/solr/update -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6861) Remove example/exampledocs/post.sh as the concept of default update URL is almost gone
[ https://issues.apache.org/jira/browse/SOLR-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta resolved SOLR-6861. Resolution: Fixed Remove example/exampledocs/post.sh as the concept of default update URL is almost gone -- Key: SOLR-6861 URL: https://issues.apache.org/jira/browse/SOLR-6861 Project: Solr Issue Type: Task Reporter: Anshum Gupta Assignee: Anshum Gupta We should remove post.sh and replace it with bin/post (SOLR-6435). post.sh right now has a hardcoded single core update URL i.e. http://localhost:8983/solr/update -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6861) Remove example/exampledocs/post.sh as the concept of default update URL is almost gone
[ https://issues.apache.org/jira/browse/SOLR-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250174#comment-14250174 ] ASF subversion and git services commented on SOLR-6861: --- Commit 1646298 from [~anshumg] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1646298 ] SOLR-6861: Remove post.sh from exampledocs (Merge from trunk) Remove example/exampledocs/post.sh as the concept of default update URL is almost gone -- Key: SOLR-6861 URL: https://issues.apache.org/jira/browse/SOLR-6861 Project: Solr Issue Type: Task Reporter: Anshum Gupta Assignee: Anshum Gupta We should remove post.sh and replace it with bin/post (SOLR-6435). post.sh right now has a hardcoded single core update URL i.e. http://localhost:8983/solr/update -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6852) SimplePostTool should no longer default to collection1
[ https://issues.apache.org/jira/browse/SOLR-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta resolved SOLR-6852. Resolution: Fixed SimplePostTool should no longer default to collection1 -- Key: SOLR-6852 URL: https://issues.apache.org/jira/browse/SOLR-6852 Project: Solr Issue Type: Improvement Reporter: Anshum Gupta Assignee: Anshum Gupta Fix For: 5.0 Attachments: SOLR-6852.patch, SOLR-6852.patch Solr no longer would be bootstrapped with collection1 and so it no longer makes sense for the SimplePostTool to default to collection1 either. Without an explicit collection/core/url value, the call should just fail fast. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6861) Remove example/exampledocs/post.sh as the concept of default update URL is almost gone
[ https://issues.apache.org/jira/browse/SOLR-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-6861: --- Fix Version/s: Trunk 5.0 Remove example/exampledocs/post.sh as the concept of default update URL is almost gone -- Key: SOLR-6861 URL: https://issues.apache.org/jira/browse/SOLR-6861 Project: Solr Issue Type: Task Reporter: Anshum Gupta Assignee: Anshum Gupta Fix For: 5.0, Trunk We should remove post.sh and replace it with bin/post (SOLR-6435). post.sh right now has a hardcoded single core update URL i.e. http://localhost:8983/solr/update -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250219#comment-14250219 ] Hoss Man commented on LUCENE-5951: -- {noformat} + public static int AUTO_DETECT_MERGES_AND_THREADS = -1; {noformat} ...that's supposed to be final (a sentinel value), correct? nothing should be allowed to modify it at run time? {noformat} + public synchronized void setMaxMergesAndThreads(int maxMergeCount, int maxThreadCount) { +if (maxMergeCount == AUTO_DETECT_MERGES_AND_THREADS && maxThreadCount == AUTO_DETECT_MERGES_AND_THREADS) { + // OK + maxMergeCount = AUTO_DETECT_MERGES_AND_THREADS; + maxThreadCount = AUTO_DETECT_MERGES_AND_THREADS; {noformat} ...is that supposed to be setting this.maxMergeCount and this.maxThreadCount? ...it looks like it's just a no-op (and this.maxMergeCount and this.maxThreadCount never get set in this case?) {noformat} + public static boolean spins(Path path) throws IOException { {noformat} ...is it worth using a ternary enum (or nullable Boolean) here to track the difference between: * confident it's a spinning disk * confident it's not a spinning disk * unknown what type of storage this is ...that way we can make the default behavior of CMS conservative, and only be aggressive if we are confident it's not spinning; but app devs can be more aggressive -- call the same spins() utility and only use conservative values if they are confident it's a spinning disk, otherwise call setMaxMergesAndThreads with higher values. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. 
I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
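The Linux side of the rotational check described above can be sketched roughly like this. This is an illustration under an assumed sysfs layout, not the patch's actual spins() implementation (which starts from a Path and resolves the mount point first); the partition-stripping helper is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SpinsSketch {

  // "sda1" -> "sda": /sys/block has entries per device, not per partition.
  // Looping (rather than stripping one digit) also handles names like "sda42".
  static String stripPartition(String deviceName) {
    while (!deviceName.isEmpty()
        && Character.isDigit(deviceName.charAt(deviceName.length() - 1))) {
      deviceName = deviceName.substring(0, deviceName.length() - 1);
    }
    return deviceName;
  }

  // Conservative: report "spinning" whenever the answer is unknown, so
  // defaults stay safe on non-Linux systems or exotic devices.
  static boolean spins(String deviceName) {
    Path flag = Paths.get("/sys/block", stripPartition(deviceName),
        "queue", "rotational");
    try {
      // the kernel reports "1" for rotational media, "0" for SSDs
      return "1".equals(new String(Files.readAllBytes(flag)).trim());
    } catch (IOException | SecurityException e) {
      return true; // non-Linux, odd device names, sandboxed JVMs, ...
    }
  }
}
```

Defaulting to "spinning" on any failure matches the conservative stance discussed in the comments: only be aggressive with merge threads when the heuristic is confident the storage is not rotational.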
[jira] [Reopened] (LUCENE-6019) IndexWriter allows to add same field with different docvlaues type
[ https://issues.apache.org/jira/browse/LUCENE-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-6019: Assignee: Michael McCandless This commit caused LUCENE-6117, which Rob found and fixed (thanks!), but the fix is too big to backport to 4.10.x. I think to fix it, I should revert the -Dtests.asserts part of this change (but keep the original bug fix). IndexWriter allows to add same field with different docvlaues type --- Key: LUCENE-6019 URL: https://issues.apache.org/jira/browse/LUCENE-6019 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.10.1 Reporter: Simon Willnauer Assignee: Michael McCandless Priority: Critical Fix For: 4.10.2, 5.0, Trunk Attachments: LUCENE-6019.patch, LUCENE-6019.patch IndexWriter checks if the DV types are consistent in multiple places, but due to some problems in Elasticsearch users were able to add the same field with different DV types, causing merges to fail. Yet I was able to reduce this to a lucene testcase but I was puzzled since it always failed. Yet, I had to run it without assertions and that caused the bug to happen. I can add field foo with BINARY and SORTED_SET causing a merge to fail. Here is a gist https://gist.github.com/s1monw/8707f924b76ba40ee5f3 / https://github.com/elasticsearch/elasticsearch/issues/8009 While this is certainly a problem in Elasticsearch, Lucene also allows a user error to corrupt an index, which I think should be prevented. NOTE: this only fails if you run without assertions, which I think lucene should do in CI once in a while too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6019) IndexWriter allows to add same field with different docvlaues type
[ https://issues.apache.org/jira/browse/LUCENE-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250237#comment-14250237 ] Robert Muir commented on LUCENE-6019: - +1, I would rather not cause instability or false failures in the bugfix branch. IndexWriter allows to add same field with different docvlaues type --- Key: LUCENE-6019 URL: https://issues.apache.org/jira/browse/LUCENE-6019 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.10.1 Reporter: Simon Willnauer Assignee: Michael McCandless Priority: Critical Fix For: 4.10.2, 5.0, Trunk Attachments: LUCENE-6019.patch, LUCENE-6019.patch IndexWriter checks if the DV types are consistent in multiple places, but due to some problems in Elasticsearch users were able to add the same field with different DV types, causing merges to fail. Yet I was able to reduce this to a lucene testcase but I was puzzled since it always failed. Yet, I had to run it without assertions and that caused the bug to happen. I can add field foo with BINARY and SORTED_SET causing a merge to fail. Here is a gist https://gist.github.com/s1monw/8707f924b76ba40ee5f3 / https://github.com/elasticsearch/elasticsearch/issues/8009 While this is certainly a problem in Elasticsearch, Lucene also allows a user error to corrupt an index, which I think should be prevented. NOTE: this only fails if you run without assertions, which I think lucene should do in CI once in a while too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250241#comment-14250241 ] Robert Muir commented on LUCENE-5951: - I don't think we should make things complicated for app developers. We are not writing a generic spins() method for developers; it's a lucene.internal method for picking good defaults. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250243#comment-14250243 ] Michael McCandless commented on LUCENE-5951: bq. ...that's supposed to be final (a sentinel value), correct? nothing should be allowed to modify it at run time? Whoa, nice catch! I'll fix. bq. ...it looks like it's just a no-op ( Gak, good catch :) I'll add a test that exposes this, then fix it. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-6851) oom_solr.sh problems
[ https://issues.apache.org/jira/browse/SOLR-6851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter reassigned SOLR-6851: Assignee: Timothy Potter oom_solr.sh problems Key: SOLR-6851 URL: https://issues.apache.org/jira/browse/SOLR-6851 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Timothy Potter Fix For: 5.0 noticed 2 problems with the oom_solr.sh script... 1) the script is only being run with the port of the solr instance to terminate, so the log messages aren't getting written to the correct directory -- if we change the script to take a log dir/file as an argument, we can ensure the logs are written to the correct place 2) on my ubuntu linux machine (where /bin/sh is a symlink to /bin/dash), the console log is recording a script error when java runs oom_solr.sh... {noformat} # # java.lang.OutOfMemoryError: Java heap space # -XX:OnOutOfMemoryError=/home/hossman/lucene/5x_dev/solr/bin/oom_solr.sh 8983 # Executing /bin/sh -c /home/hossman/lucene/5x_dev/solr/bin/oom_solr.sh 8983... /home/hossman/lucene/5x_dev/solr/bin/oom_solr.sh: 20: [: 14305: unexpected operator Running OOM killer script for process 14305 for Solr on port 8983 Killed process 14305 {noformat} steps to reproduce: {{bin/solr -e techproducts -m 10m}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6559) Create an endpoint /update/xml/docs endpoint to do custom xml indexing
[ https://issues.apache.org/jira/browse/SOLR-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250268#comment-14250268 ] Anurag Sharma commented on SOLR-6559: - Looked for the wildcard '*' but couldn't find any unit test for it in TestXPathRecordReader. Create an endpoint /update/xml/docs endpoint to do custom xml indexing -- Key: SOLR-6559 URL: https://issues.apache.org/jira/browse/SOLR-6559 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6559.patch, SOLR-6559.patch, SOLR-6559.patch, SOLR-6559.patch Just the way we have a JSON endpoint, create an XML endpoint too. Use the XPathRecordReader from DIH to do the same. The syntax would require slight tweaking to match the params of /update/json/docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5951: --- Attachment: LUCENE-5951.patch New patch fixing Hoss's issues (thanks!). Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
Jason Wang created SOLR-6862: Summary: full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5951: Attachment: LUCENE-5951.patch I cleaned up the code to remove the hashmap, not try to lookup 'rotational' for obviously bogus names (like nfs), return false for tmpfs, etc. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250327#comment-14250327 ] Hoss Man commented on LUCENE-5951: -- {noformat} +for (FileStore store : FileSystems.getDefault().getFileStores()) { + String desc = store.toString(); + int start = desc.lastIndexOf('('); + int end = desc.indexOf(')', start); + mountToDevice.put(desc.substring(0, start-1), desc.substring(start+1, end)); +} {noformat} ...I don't see anything in the javadocs for FileStore making any guarantees about the toString -- so the results of these lastIndexOf and indexOf calls should probably have bounds checks to prevent IOOBE from substring. (either that or just catch the IOOBE and give up) {noformat} +if (!devName.isEmpty() && Character.isDigit(devName.charAt(devName.length()-1))) { + devName = devName.substring(0, devName.length()-1); {noformat} ...what about people with lots of partitions? ie: /dev/sda42 Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
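A defensively parsed variant of the mount-to-device snippet quoted above might look like this. It is a hypothetical helper: it assumes, with no guarantee from the javadocs, that FileStore.toString() looks like "mountPoint (deviceName)", and skips any entry that does not match rather than risking a StringIndexOutOfBoundsException in substring().

```java
import java.util.HashMap;
import java.util.Map;

class MountParseSketch {

  // Parse "mountPoint (deviceName)" descriptions defensively; the
  // toString() format is observed, not specified, so bail out per
  // entry when the delimiters are missing or out of order.
  static Map<String, String> parse(Iterable<String> storeDescriptions) {
    Map<String, String> mountToDevice = new HashMap<>();
    for (String desc : storeDescriptions) {
      int start = desc.lastIndexOf('(');
      int end = desc.indexOf(')', start);
      if (start <= 0 || end <= start) {
        continue; // unexpected format: give up on this entry
      }
      mountToDevice.put(desc.substring(0, start - 1),
          desc.substring(start + 1, end));
    }
    return mountToDevice;
  }
}
```

In production code, the alternative mentioned in the comment (wrapping the whole heuristic in a catch that falls back to the conservative answer) achieves the same safety with less per-entry ceremony.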
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250332#comment-14250332 ] Robert Muir commented on LUCENE-5951: - {quote} ...I don't see anything in the javadocs for FileStore making any guarantees about the toString – so the results of these lastIndexOf and indexOf calls should probably have bounds checks to prevent IOOBE from substring. (either that or just catch the IOOBE and give up) {quote} Maybe you missed the try-catch when looking at the patch. {code} } catch (Exception ioe) { // our crazy heuristics can easily trigger SecurityException, AIOOBE, etc ... return true; } {code} {quote} ...what about people with lots of partitions? ie: /dev/sda42 {quote} Maybe if you quoted more of the context, you would see this was in a loop? Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250333#comment-14250333 ] Uwe Schindler commented on LUCENE-5951: --- +1 The heavy funny heuristics method is a masterpiece of coding in contrast to Hadoop's detection. I am so happy that it does not span df or mount commands! Many thanks :-) Java 7 is cool! Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250333#comment-14250333 ] Uwe Schindler edited comment on LUCENE-5951 at 12/17/14 7:12 PM: - +1 The heavy funny heuristics method is a masterpiece of coding in contrast to Hadoop's detection. I am so happy that it does not exec df or mount commands! Many thanks :-) Java 7 is cool! was (Author: thetaphi): +1 The heavy funny heuristics method is a masterpiece of coding in contrast to Hadoop's detection. I am so happy that it does not span df or mount commands! Many thanks :-) Java 7 is cool! Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6845) figure out why suggester causes slow startup - even when not used
[ https://issues.apache.org/jira/browse/SOLR-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250350#comment-14250350 ] Tomás Fernández Löbbe commented on SOLR-6845: - Trying to add some unit tests to this feature I found another issue. SuggestComponent and SpellcheckComponent rely on a {{firstSearcherListener}} to load (and in this case, also build) some structures. These firstSearcherListeners are registered on {{SolrCoreAware.inform()}}, however the first searcher listener task is only added to the queue of warming tasks if there is at least one listener registered at the time of the first searcher creation (before SolrCoreAware.inform() is ever called). See
{code:title=SolrCore.java}
if (currSearcher == null && firstSearcherListeners.size() > 0) {
  future = searcherExecutor.submit(new Callable() {
    @Override
    public Object call() throws Exception {
      try {
        for (SolrEventListener listener : firstSearcherListeners) {
          listener.newSearcher(newSearcher, null);
        }
      } catch (Throwable e) {
        SolrException.log(log, null, e);
        if (e instanceof Error) {
          throw (Error) e;
        }
      }
      return null;
    }
  });
}
{code}
I'll create a new Jira for this
figure out why suggester causes slow startup - even when not used - Key: SOLR-6845 URL: https://issues.apache.org/jira/browse/SOLR-6845 Project: Solr Issue Type: Bug Reporter: Hoss Man SOLR-6679 was filed to track the investigation into the following problem... {panel} The stock solrconfig provides a bad experience with a large index... start up Solr and it will spin at 100% CPU for minutes, unresponsive, while it apparently builds a suggester index. ... This is what I did: 1) indexed 10M very small docs (only takes a few minutes). 2) shut down Solr 3) start up Solr and watch it be unresponsive for over 4 minutes! I didn't even use any of the fields specified in the suggester config and I never called the suggest request handler.
{panel} ..but ultimately focused on removing/disabling the suggester from the sample configs. Opening this new issue to focus on actually trying to identify the root problem fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6787) API to manage blobs in Solr
[ https://issues.apache.org/jira/browse/SOLR-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250397#comment-14250397 ] Yonik Seeley commented on SOLR-6787: bq. These are not really special APIs. I was responding to this: APIs need to be created to manage the content of that collection And I was wondering since binary field and blob seem synonymous, why there would be a separate/different API to get/set the value of such a field. bq. All the handlers loaded from .system will automatically be startup=lazy. But request handlers are one of the only things that have support for lazy. What's the plan to support custom SearchComponents, Update processors, QParsers, or ValueSourceParsers (all of those are very common)? Also, a big question is persistence. What happens when you add a request handler via API, and then the server is bounced? bq. We are rethinking the way Solr is being used. That's great, but please do so in public forums so everyone can participate in the discussion.
API to manage blobs in Solr Key: SOLR-6787 URL: https://issues.apache.org/jira/browse/SOLR-6787 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul Fix For: 5.0, Trunk Attachments: SOLR-6787.patch, SOLR-6787.patch A special collection called .system needs to be created by the user to store/manage blobs. The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection
{code}
#create your .system collection first
http://localhost:8983/solr/admin/collections?action=CREATE&name=.system&replicationFactor=2
#The config for this collection is automatically created. numShards for this collection is hardcoded to 1

#create a new jar or add a new version of a jar
curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent

# GET on the end point would give a list of jars and other details
curl http://localhost:8983/solr/.system/blob

# GET on the end point with jar name would give details of various versions of the available jars
curl http://localhost:8983/solr/.system/blob/mycomponent

# GET on the end point with jar name and version with a wt=filestream to get the actual file
curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream > mycomponent.1.jar

# GET on the end point with jar name and wt=filestream to get the latest version of the file
curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream > mycomponent.jar
{code}
Please note that the jars are never deleted. A new version is added to the system every time a new jar is posted for the name. You must use the standard delete commands to delete the old entries -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250508#comment-14250508 ] Mikhail Khludnev commented on SOLR-6862: please make sure you don't have autocommit enabled full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6863) We should use finite timeouts when getting http connections from pools.
Mark Miller created SOLR-6863: - Summary: We should use finite timeouts when getting http connections from pools. Key: SOLR-6863 URL: https://issues.apache.org/jira/browse/SOLR-6863 Project: Solr Issue Type: Improvement Components: rCloud, SolrCloud Reporter: Mark Miller Assignee: Mark Miller -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6863) We should use finite timeouts when getting http connections from pools.
[ https://issues.apache.org/jira/browse/SOLR-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-6863: -- Component/s: (was: rCloud) (was: SolrCloud) We should use finite timeouts when getting http connections from pools. --- Key: SOLR-6863 URL: https://issues.apache.org/jira/browse/SOLR-6863 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6033) Add CachingTokenFilter.isCached and switch LinkedList to ArrayList
[ https://issues.apache.org/jira/browse/LUCENE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-6033: - Attachment: LUCENE-6033_boolean_resetInput_option.patch This patch adds a resetStream constructor option such that fillCache() will propagate reset() if this is set. Very simple. I enhanced the test for this and for isCached(). This option goes hand-in-hand with the use of isCached() for the use-case I had in mind by allowing you to pass a tokenStream to something that might not need it, thereby allowing you to not only toss the CachingTokenFilter if it wasn't actually cached, but avoid a redundant reset() call on the underlying input. Add CachingTokenFilter.isCached and switch LinkedList to ArrayList -- Key: LUCENE-6033 URL: https://issues.apache.org/jira/browse/LUCENE-6033 Project: Lucene - Core Issue Type: Improvement Reporter: David Smiley Assignee: David Smiley Fix For: 5.0, Trunk Attachments: LUCENE-6033.patch, LUCENE-6033_boolean_resetInput_option.patch CachingTokenFilter could use a simple boolean isCached() method implemented as-such: {code:java} /** If the underlying token stream was consumed and cached */ public boolean isCached() { return cache != null; } {code} It's useful for the highlighting code to remove its wrapping of CachingTokenFilter if after handing-off to parts of its framework it turns out that it wasn't used. Furthermore, use an ArrayList, not a LinkedList. ArrayList is leaner when the token count is high, and this class doesn't manipulate the list in a way that might favor LL. A separate patch will come that actually uses this method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250542#comment-14250542 ] Jason Wang commented on SOLR-6862: -- Thanks Mikhail, What should I do to disable autocommit? set commit=false like /dataimport?command=full-import&clean=true&commit=false&debug=false&indent=true&verbose=true&optimize=true&wt=json or in data-conf.xml datasource definition set autoCommit=false. Appreciate your help very much. Thanks again, Jason full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250552#comment-14250552 ] Mikhail Khludnev commented on SOLR-6862: neither ones, I mean Solr autocommit mentioned at http://wiki.apache.org/solr/SolrConfigXml full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6033) Add CachingTokenFilter.isCached and switch LinkedList to ArrayList
[ https://issues.apache.org/jira/browse/LUCENE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250595#comment-14250595 ] Robert Muir commented on LUCENE-6033: - I don't understand this option, why do we need it? How is it useful to the consumers that use CachingTokenFilter (like queryparser)? It seems more of an abusive case. Add CachingTokenFilter.isCached and switch LinkedList to ArrayList -- Key: LUCENE-6033 URL: https://issues.apache.org/jira/browse/LUCENE-6033 Project: Lucene - Core Issue Type: Improvement Reporter: David Smiley Assignee: David Smiley Fix For: 5.0, Trunk Attachments: LUCENE-6033.patch, LUCENE-6033_boolean_resetInput_option.patch CachingTokenFilter could use a simple boolean isCached() method implemented as-such: {code:java} /** If the underlying token stream was consumed and cached */ public boolean isCached() { return cache != null; } {code} It's useful for the highlighting code to remove its wrapping of CachingTokenFilter if after handing-off to parts of its framework it turns out that it wasn't used. Furthermore, use an ArrayList, not a LinkedList. ArrayList is leaner when the token count is high, and this class doesn't manipulate the list in a way that might favor LL. A separate patch will come that actually uses this method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250600#comment-14250600 ] Hoss Man commented on LUCENE-5951: -- bq. Maybe you missed the try-catch when looking at the patch. that still seems sketchy because it's only in the spins() method ... it's going to be trappy if/when this code gets refactored and getDeviceName is called from somewhere else. why not just include some basic exception handling in getDeviceName as well? bq. Maybe if you quoted more of the context, you would see this was in a loop? I did see that, but i didn't realize the purpose was to chomp away at individual digits in the path until it resolved as a valid file... too much voodoo for me, i'll shut up now. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250601#comment-14250601 ] Jason Wang commented on SOLR-6862: -- Hi Mikhail, I commented out autoCommit and autoSoftCommit in updateHandler block in solrconfig.xml, it works. Is there impact or side effect to do this? Thank you very much, Jason full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250607#comment-14250607 ] Mikhail Khludnev commented on SOLR-6862: pls check https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-autoCommit and http://opensourceconnections.com/blog/2013/04/25/understanding-solr-soft-commits-and-data-durability/ and close this issue please. Thanks! full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
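For reference, the autoCommit settings Mikhail points to live in the updateHandler section of solrconfig.xml. The fragment below is illustrative only; the maxTime values are example numbers, not recommendations, and should be tuned for your own durability and near-real-time needs.

```xml
<!-- Example only: the autoCommit / autoSoftCommit knobs under discussion. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>          <!-- hard commit at most every 15s -->
    <openSearcher>false</openSearcher> <!-- durability only; don't reopen searchers -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>           <!-- soft commit for near-real-time visibility -->
  </autoSoftCommit>
</updateHandler>
```

Note that a hard autoCommit that fires mid-import makes the preceding documents durable, which is why a later DIH rollback can no longer undo them.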
[jira] [Commented] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250618#comment-14250618 ] Jason Wang commented on SOLR-6862: -- Thanks for your quick response and appreciate your help. full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Wang closed SOLR-6862. Resolution: Not a Problem This is not an issue. full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250629#comment-14250629 ] Robert Muir commented on LUCENE-5951: - The method is private. its not getting called from anywhere else. when an exception strikes we *need* it, so that it causes the whole thing to return true. it also has a comment above it '// these are hacks that are not guaranteed'. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250635#comment-14250635 ] Robert Muir commented on LUCENE-5951: - {quote} I did see that, but i didn't realize the purpose was to chomp away at individual digits in the path until it resolved as a valid file... {quote} It has this comment: {code} // tear away partition numbers until we find it. {code} Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
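The "tear away partition numbers" loop Robert describes can be sketched like this. The method name and the Predicate standing in for the real block-device lookup are assumptions for illustration, not the patch's actual code.

```java
import java.util.function.Predicate;

// Sketch only: strip trailing digits from a device name (e.g. sda42 -> sda)
// until some lookup recognizes it, or until no trailing digit remains.
public class PartitionChomp {
    static String chompDigits(String devName, Predicate<String> exists) {
        while (!devName.isEmpty()
               && !exists.test(devName)
               && Character.isDigit(devName.charAt(devName.length() - 1))) {
            devName = devName.substring(0, devName.length() - 1);
        }
        return devName;
    }

    public static void main(String[] args) {
        // Pretend only the bare disk name is a known block device.
        Predicate<String> known = s -> s.equals("sda");
        System.out.println(chompDigits("sda42", known)); // sda
    }
}
```

This simple digit-chomping also hints at why names like nvme0n1p3 need more care: the bare disk name itself ends in a digit.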
[jira] [Commented] (LUCENE-6033) Add CachingTokenFilter.isCached and switch LinkedList to ArrayList
[ https://issues.apache.org/jira/browse/LUCENE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250666#comment-14250666 ] David Smiley commented on LUCENE-6033: -- Hi Rob. I think it's easier to make the case that CachingTokenFilter should have been propagating reset() from its fillCache() all along, and thus you would then use CachingTokenFilter in a more normal way -- wrap it and call reset() then increment in a loop, etc., instead of knowing you need to reset() on what it wraps but not this token filter itself. That's weird. It's abnormal for a TokenFilter to never propagate reset, so every user of CachingTokenFilter to date has worked around this by calling reset() on the underlying input _instead_ of the final wrapping token filter (CachingTokenFilter in this case). To be clear, CachingTokenFilter._reset()_ didn't and still doesn't with this patch propagate reset(); it happens the one time it consumes the stream indirectly via incrementToken(). The exact case that brought me here is as follows: DefaultSolrHighlighter has a block of code activated when you pass hl.usePhraseHighlighter (around line 501). This block of code calls getPhraseHighlighter passing in a token stream that may never actually be used by that method. This is an extension point for subclassing; our shipped code doesn't use it at all. Prior to me doing SOLR-6680, we'd always then pass the CachingTokenFilter further on into the Highlighter. But unless getPhraseHighlighter actually uses the token stream, doing this is a waste (needless caching of every token -- pretty bulky). So with isCached() I can now see if it was used, and if not then toss the CachingTokenFilter aside. The problem is that isCached() isn't enough here; I overlooked it in SOLR-6680 (no test for this extension point). I was hoping to simply declare that if you want to use this token stream, you need to call reset() on it first. But CachingTokenFilter doesn't propagate the reset()!
So it won't get reset. I _could_ add a reset on the underlying stream before calling getPhraseHighlighter but doing so would likely result in reset() being called twice in a row when the caching isn't needed; Highlighter calls reset(). Test assertions trip when this happens, although I think in practice it's fine. Add CachingTokenFilter.isCached and switch LinkedList to ArrayList -- Key: LUCENE-6033 URL: https://issues.apache.org/jira/browse/LUCENE-6033 Project: Lucene - Core Issue Type: Improvement Reporter: David Smiley Assignee: David Smiley Fix For: 5.0, Trunk Attachments: LUCENE-6033.patch, LUCENE-6033_boolean_resetInput_option.patch CachingTokenFilter could use a simple boolean isCached() method implemented as-such: {code:java} /** If the underlying token stream was consumed and cached */ public boolean isCached() { return cache != null; } {code} It's useful for the highlighting code to remove its wrapping of CachingTokenFilter if after handing-off to parts of its framework it turns out that it wasn't used. Furthermore, use an ArrayList, not a LinkedList. ArrayList is leaner when the token count is high, and this class doesn't manipulate the list in a way that might favor LL. A separate patch will come that actually uses this method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
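The reset()-propagation behavior being debated can be made concrete with a tiny mock. These Stream/ListStream/CachingFilter classes are invented stand-ins, not Lucene's TokenStream/CachingTokenFilter API; the resetInput flag models the constructor option in the patch: propagate reset() to the wrapped input once, when the cache is first filled, so callers can treat the caching filter like any other filter.

```java
import java.util.ArrayList;
import java.util.List;

// Mock of the idea only (NOT Lucene's API): a caching filter that optionally
// resets its input the first time it fills its cache.
public class CachingDemo {
    interface Stream {
        void reset();
        String next(); // null at end of stream
    }

    static class ListStream implements Stream {
        final List<String> tokens; int pos = 0; int resets = 0;
        ListStream(List<String> tokens) { this.tokens = tokens; }
        public void reset() { resets++; pos = 0; }
        public String next() { return pos < tokens.size() ? tokens.get(pos++) : null; }
    }

    static class CachingFilter implements Stream {
        final Stream input;
        final boolean resetInput; // the constructor option under discussion
        List<String> cache; int pos = 0;
        CachingFilter(Stream input, boolean resetInput) {
            this.input = input; this.resetInput = resetInput;
        }
        boolean isCached() { return cache != null; }
        public void reset() { pos = 0; } // replays the cache; never touches input
        public String next() {
            if (cache == null) fillCache();
            return pos < cache.size() ? cache.get(pos++) : null;
        }
        private void fillCache() {
            if (resetInput) input.reset(); // propagate once, on first consumption
            cache = new ArrayList<>();
            for (String t = input.next(); t != null; t = input.next()) cache.add(t);
        }
    }

    public static void main(String[] args) {
        ListStream in = new ListStream(List.of("a", "b"));
        CachingFilter f = new CachingFilter(in, true);
        System.out.println(f.isCached()); // false until first consumption
        while (f.next() != null) { }      // consume; fills the cache
        f.reset();                        // replay from cache only
        System.out.println(f.next() + " resets=" + in.resets);
    }
}
```

With resetInput=true the caller never needs to know about the wrapped input; isCached() lets the caller discard the wrapper if it was never consumed, which is exactly the DefaultSolrHighlighter scenario described above.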
[jira] [Commented] (LUCENE-6118) Improve efficiency of the history structure for filter caching
[ https://issues.apache.org/jira/browse/LUCENE-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250673#comment-14250673 ] Ryan Ernst commented on LUCENE-6118: +1 For {{IntBag.remove()}} when the frequency reaches 0, could you find the end of the chain and move that back into the slot that was just broken? This would then require moving up to only one element of the bag, instead of re-adding all elements after the old value in the chain. Something like:
{code}
if (newFreq == 0) {
  // move the last key in the chain back into this zeroed slot
  int slot2 = (slot + 1) & mask;
  while (freqs[slot2] != 0) {
    slot2 = (slot2 + 1) & mask;
  }
  keys[slot] = keys[slot2];
  freqs[slot] = freqs[slot2];
}
{code}
Improve efficiency of the history structure for filter caching -- Key: LUCENE-6118 URL: https://issues.apache.org/jira/browse/LUCENE-6118 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-6118.patch The filter caching uses a ring buffer that tracks frequencies of the hashcodes of the most-recently used filters. However it is based on an ArrayDeque<Integer> and a HashMap<Integer,Integer> which keep on (un)wrapping ints. Since the data-structure is very simple, we could try to do something better... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
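One subtlety with "move the last key of the chain back": in linear probing that key may belong to a *later* home slot, and moving it before its home slot would make it unfindable. The hedged sketch below (class name, toy identity hash, and all method names are invented; this is not the LUCENE-6118 patch) uses the standard backward-shift variant, which moves a key into the gap only when its home slot still precedes the gap.

```java
// Sketch only: tombstone-free removal from a linear-probing frequency bag.
public class IntBagRemoveDemo {
    final int[] keys, freqs;
    final int mask;

    IntBagRemoveDemo(int capacityPowerOfTwo) {
        keys = new int[capacityPowerOfTwo];
        freqs = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    int home(int key) { return key & mask; } // toy hash for the demo

    void add(int key) {
        int slot = home(key);
        while (freqs[slot] != 0 && keys[slot] != key) slot = (slot + 1) & mask;
        keys[slot] = key;
        freqs[slot]++;
    }

    int freq(int key) {
        for (int slot = home(key); freqs[slot] != 0; slot = (slot + 1) & mask) {
            if (keys[slot] == key) return freqs[slot];
        }
        return 0;
    }

    // Assumes the key is present in the bag.
    void remove(int key) {
        int slot = home(key);
        while (freqs[slot] == 0 || keys[slot] != key) slot = (slot + 1) & mask;
        if (--freqs[slot] > 0) return;
        // backward-shift: compact the probe run that follows the freed slot
        int free = slot;
        for (int cur = (free + 1) & mask; freqs[cur] != 0; cur = (cur + 1) & mask) {
            int h = home(keys[cur]);
            // move keys[cur] back only if its home slot is not strictly
            // inside the gap..cur range (cyclic distance comparison)
            if (((cur - h) & mask) >= ((cur - free) & mask)) {
                keys[free] = keys[cur];
                freqs[free] = freqs[cur];
                freqs[cur] = 0;
                free = cur;
            }
        }
    }

    public static void main(String[] args) {
        IntBagRemoveDemo bag = new IntBagRemoveDemo(8);
        bag.add(2); bag.add(10); bag.add(3); // 10 and 3 spill into later slots
        bag.remove(2);
        System.out.println(bag.freq(10) + " " + bag.freq(3)); // both stay findable
    }
}
```

In the main() scenario a naive "move the last key back" would put key 3 before its home slot 3, so freq(3) would probe slots 3 and 4 and miss it; the cyclic-distance check avoids that.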
[jira] [Updated] (SOLR-6736) A collections-like request handler to manage solr configurations on zookeeper
[ https://issues.apache.org/jira/browse/SOLR-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Rajput updated SOLR-6736: --- Fix Version/s: Trunk 5.0 A collections-like request handler to manage solr configurations on zookeeper - Key: SOLR-6736 URL: https://issues.apache.org/jira/browse/SOLR-6736 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Varun Rajput Priority: Minor Fix For: 5.0, Trunk Managing Solr configuration files on zookeeper becomes cumbersome while using solr in cloud mode, especially while trying out changes in the configurations. It will be great if there is a request handler that can provide an API to manage the configurations similar to the collections handler that would allow actions like uploading new configurations, linking them to a collection, deleting configurations, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-NightlyTests-5.x - Build # 706 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/706/

2 tests failed.

REGRESSION:  org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch

Error Message:
Test abandoned because suite timeout was reached.

Stack Trace:
java.lang.Exception: Test abandoned because suite timeout was reached.
	at __randomizedtesting.SeedInfo.seed([C8158DE0FACE4CDF]:0)

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.ChaosMonkeySafeLeaderTest

Error Message:
Suite timeout exceeded (>= 720 msec).

Stack Trace:
java.lang.Exception: Suite timeout exceeded (>= 720 msec).
	at __randomizedtesting.SeedInfo.seed([C8158DE0FACE4CDF]:0)

Build Log:
[...truncated 11273 lines...]
   [junit4] Suite: org.apache.solr.cloud.ChaosMonkeySafeLeaderTest
   [junit4]   2> Creating dataDir: /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/build/solr-core/test/J1/temp/solr.cloud.ChaosMonkeySafeLeaderTest-C8158DE0FACE4CDF-001/init-core-data-001
   [junit4]   2> 1224614 T48440 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl (false) and clientAuth (true)
   [junit4]   2> 1224614 T48440 oas.BaseDistributedSearchTestCase.initHostContext Setting hostContext system property: /
   [junit4]   2> 1224620 T48440 oas.SolrTestCaseJ4.setUp ###Starting testDistribSearch
   [junit4]   2> 1224621 T48440 oasc.ZkTestServer.run STARTING ZK TEST SERVER
   [junit4]   1> client port:0.0.0.0/0.0.0.0:0
   [junit4]   2> 1224622 T48441 oasc.ZkTestServer$ZKServerMain.runFromConfig Starting server
   [junit4]   2> 1224722 T48440 oasc.ZkTestServer.run start zk server on port:14104
   [junit4]   2> 1224723 T48440 oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default ZkCredentialsProvider
   [junit4]   2> 1224724 T48440 oascc.ConnectionManager.waitForConnected Waiting for client to connect to ZooKeeper
   [junit4]   2> 1224728 T48448 oascc.ConnectionManager.process Watcher org.apache.solr.common.cloud.ConnectionManager@15fafb22 name:ZooKeeperConnection Watcher:127.0.0.1:14104 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
   [junit4]   2> 1224729 T48440 oascc.ConnectionManager.waitForConnected Client is connected to ZooKeeper
   [junit4]   2> 1224729 T48440 oascc.SolrZkClient.createZkACLProvider Using default ZkACLProvider
   [junit4]   2> 1224730 T48440 oascc.SolrZkClient.makePath makePath: /solr
   [junit4]   2> 1224733 T48440 oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default ZkCredentialsProvider
   [junit4]   2> 1224734 T48440 oascc.ConnectionManager.waitForConnected Waiting for client to connect to ZooKeeper
   [junit4]   2> 1224735 T48451 oascc.ConnectionManager.process Watcher org.apache.solr.common.cloud.ConnectionManager@6d3094a3 name:ZooKeeperConnection Watcher:127.0.0.1:14104/solr got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
   [junit4]   2> 1224736 T48440 oascc.ConnectionManager.waitForConnected Client is connected to ZooKeeper
   [junit4]   2> 1224736 T48440 oascc.SolrZkClient.createZkACLProvider Using default ZkACLProvider
   [junit4]   2> 1224736 T48440 oascc.SolrZkClient.makePath makePath: /collections/collection1
   [junit4]   2> 1224739 T48440 oascc.SolrZkClient.makePath makePath: /collections/collection1/shards
   [junit4]   2> 1224741 T48440 oascc.SolrZkClient.makePath makePath: /collections/control_collection
   [junit4]   2> 1224742 T48440 oascc.SolrZkClient.makePath makePath: /collections/control_collection/shards
   [junit4]   2> 1224744 T48440 oasc.AbstractZkTestCase.putConfig put /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/core/src/test-files/solr/collection1/conf/solrconfig-tlog.xml to /configs/conf1/solrconfig.xml
   [junit4]   2> 1224744 T48440 oascc.SolrZkClient.makePath makePath: /configs/conf1/solrconfig.xml
   [junit4]   2> 1224747 T48440 oasc.AbstractZkTestCase.putConfig put /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/core/src/test-files/solr/collection1/conf/schema15.xml to /configs/conf1/schema.xml
   [junit4]   2> 1224748 T48440 oascc.SolrZkClient.makePath makePath: /configs/conf1/schema.xml
   [junit4]   2> 1224850 T48440 oasc.AbstractZkTestCase.putConfig put /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/core/src/test-files/solr/collection1/conf/solrconfig.snippet.randomindexconfig.xml to /configs/conf1/solrconfig.snippet.randomindexconfig.xml
   [junit4]   2> 1224851 T48440 oascc.SolrZkClient.makePath makePath: /configs/conf1/solrconfig.snippet.randomindexconfig.xml
   [junit4]   2> 1224853 T48440 oasc.AbstractZkTestCase.putConfig put /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/core/src/test-files/solr/collection1/conf/stopwords.txt to /configs/conf1/stopwords.txt
   [junit4]   2> 1224853 T48440 oascc.SolrZkClient.makePath makePath: /configs/conf1/stopwords.txt
   [junit4]   2> 1224855 T48440