Re: VOTE: Lucene/Solr 3.6.1
Ok thanks Mark.

Sent from my mobile device
720-256-8076

On Jul 18, 2012, at 6:41 AM, Mark Miller markrmil...@gmail.com wrote:

It's a trunk issue, not 3.x

Sent from my iPhone

On Jul 18, 2012, at 3:23 AM, Uwe Schindler u...@thetaphi.de wrote:

Hi William,

The mentioned issue has no patch, and the committed work seems to apply only to Lucene trunk (it uses soft commit in tests). I asked on Monday whether anybody had patches to back-port, and this one was not mentioned. You could have reopened it long ago and set the fix version to 3.6.1.

The vote for the new release has not yet officially passed, but so far we have received no negative responses from PMC members. If Yonik, who opened the issue, confirms that this is an issue for 3.6 and wants to fix it there, I may be able to respin the release, but in general this would better wait for 3.6.2.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: William Bell [mailto:billnb...@gmail.com]
Sent: Wednesday, July 18, 2012 4:56 AM
To: dev@lucene.apache.org
Subject: Re: VOTE: Lucene/Solr 3.6.1

Can we try to get this in? Seems like a major issue to us.
https://issues.apache.org/jira/browse/SOLR-3392

On Tue, Jul 17, 2012 at 9:01 AM, Uwe Schindler u...@thetaphi.de wrote:

Please vote to release these artifacts for Apache Lucene and Solr 3.6.1:
http://s.apache.org/lucene361

I tested with dev-tools/scripts/smokeTestRelease.py, ran rat-sources on both source releases, tested the Solr example, and reviewed the packaging contents. There was only a minor issue in the smoke tester: it did not test Solr with Java 5, but I did that manually, so the Solr example and tests work with Java 5 (the release itself was built with Java 5).

Here's my +1.

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3653) Support Smart Simplified Chinese in Solr - include clean-up bigramming filter
Lance Norskog created SOLR-3653:
-----------------------------------

    Summary: Support Smart Simplified Chinese in Solr - include clean-up bigramming filter
        Key: SOLR-3653
        URL: https://issues.apache.org/jira/browse/SOLR-3653
    Project: Solr
 Issue Type: New Feature
 Components: Schema and Analysis
   Reporter: Lance Norskog

The Smart Simplified Chinese toolkit in lucene/analysis/smartcn has no Solr factories. Also, since it is a statistical algorithm, it is not perfect. This patch supplies factories and a schema.xml type for the existing Lucene Smart Chinese implementation, and includes a fixup class to handle the occasional mistake made by the Smart Chinese implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (SOLR-3653) Support Smart Simplified Chinese in Solr - include clean-up bigramming filter
[ https://issues.apache.org/jira/browse/SOLR-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated SOLR-3653:
--------------------------------

    Attachment: SmartChineseType.pdf
[jira] [Updated] (SOLR-3653) Support Smart Simplified Chinese in Solr - include clean-up bigramming filter
[ https://issues.apache.org/jira/browse/SOLR-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated SOLR-3653:
--------------------------------

    Attachment: SOLR-3653.patch
[jira] [Commented] (SOLR-3653) Support Smart Simplified Chinese in Solr - include clean-up bigramming filter
[ https://issues.apache.org/jira/browse/SOLR-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418997#comment-13418997 ]

Lance Norskog commented on SOLR-3653:
-------------------------------------

The SmartChineseWordTokenFilter is a statistical algorithm (a Hidden Markov Model, to be exact) which was trained on a corpus of training text. Its purpose is to split text into words: singles, bigrams, and occasionally trigrams of Simplified Chinese ideograms (letters). It does a very good job, but since it is statistically based it is not perfect. When it fails, it emits "words" of 4 or more ideograms. These are really phrases, and they contain real words which should be searchable.

The attached PDF of the Analysis page shows the problem. Chinese legal text proved a pathological case and created a 7-ideogram word. To make parts of this text searchable, the 7-letter phrase has to be broken into n-grams: unigrams give more recall, while bigrams give more precision. This patch includes a new SmartChineseBigramFilter that takes any words not split by the WordTokenFilter and creates bigrams from them. The bigrams span only the unsplit phrase; they do not overlap between two adjoining unsplit phrases. The attached PDF also shows this effect, between the first and second unsplit phrases.

I am not an expert on the Chinese language or the HMM technology used in the Smart Chinese toolkit. I created the bigram filter after difficulties attempting to supply a high-quality search experience for Chinese legal documents. This is a straw-man solution to the problem. If you know better, please say so and we will iterate.

The patch includes a 'text_zh' field type which includes the bigram filter. The bigram filter is essential if 'text_zh' is to be the preferred recommendation.
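The bigramming fallback described in the comment can be sketched in plain Java. This is a hypothetical illustration of the idea only, not the SmartChineseBigramFilter from the patch (which is a Lucene TokenFilter): any "word" of 4 or more code points is treated as an unsplit phrase and replaced by its overlapping bigrams, and the bigrams never cross the boundary between two adjacent tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the clean-up bigramming idea (hypothetical code, not the patch):
// short, well-segmented words pass through; longer unsplit phrases are
// replaced by their bigrams, which stay inside the phrase boundary.
public class PhraseBigrams {
    static List<String> process(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String tok : tokens) {
            if (tok.codePointCount(0, tok.length()) < 4) {
                out.add(tok); // 1-3 ideograms: the segmenter did its job
            } else {
                // unsplit phrase: emit overlapping bigrams within it only
                int i = 0;
                while (i < tok.length()) {
                    int next = tok.offsetByCodePoints(i, 1);
                    if (next >= tok.length()) break;
                    out.add(tok.substring(i, tok.offsetByCodePoints(next, 1)));
                    i = next;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Latin letters stand in for ideograms: a 7-"letter" unsplit phrase
        // followed by a correctly segmented 2-letter word.
        System.out.println(process(List.of("abcdefg", "hi")));
        // -> [ab, bc, cd, de, ef, fg, hi]
    }
}
```

Note that the bigrams of "abcdefg" stop at "fg"; they do not continue into the adjacent token "hi", matching the no-overlap-between-phrases behavior the comment describes.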
[JENKINS] Lucene-Solr-4.x-Windows-Java6-64 - Build # 384 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java6-64/384/

1 tests failed.
FAILED: junit.framework.TestSuite.org.apache.solr.cloud.CloudStateUpdateTest

Error Message:
ERROR: SolrIndexSearcher opens=5 closes=4

Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=5 closes=4
	at __randomizedtesting.SeedInfo.seed([FD5FC09E1D433B61]:0)
	at org.junit.Assert.fail(Assert.java:93)
	at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:216)
	at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:82)
	at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:754)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
	at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)

Build Log:
[...truncated 15236 lines...]
[junit4:junit4] Suite: org.apache.solr.cloud.CloudStateUpdateTest
[junit4:junit4] (@AfterClass output)
[junit4:junit4] 2 22468 T1039 oas.SolrTestCaseJ4.deleteCore ###deleteCore
[junit4:junit4] 2 147384 T1039 oas.SolrTestCaseJ4.endTrackingSearchers SEVERE ERROR: SolrIndexSearcher opens=5 closes=4
[junit4:junit4] 2 NOTE: test params are: codec=Appending, sim=RandomSimilarityProvider(queryNorm=true,coord=false): {}, locale=es_PA, timezone=America/Cordoba
[junit4:junit4] 2 NOTE: Windows 7 6.1 amd64/Sun Microsystems Inc. 1.6.0_33 (64-bit)/cpus=2,threads=3,free=67579512,total=195362816
[junit4:junit4] 2 NOTE: All tests run in this JVM: [DocumentBuilderTest, TestCollationKeyRangeQueries, DistributedTermsComponentTest, TestQueryUtils, TestCharFilters, TestElisionFilterFactory, UUIDFieldTest, TestJapaneseBaseFormFilterFactory, SolrRequestParserTest, TestUpdate, FastVectorHighlighterTest, TestPatternReplaceFilterFactory, TestRangeQuery, DocumentAnalysisRequestHandlerTest, TestSwedishLightStemFilterFactory, CircularListTest, TestPorterStemFilterFactory, RAMDirectoryFactoryTest, DateMathParserTest, DistributedSpellCheckComponentTest, TestQuerySenderNoQuery, UpdateRequestProcessorFactoryTest, ShowFileRequestHandlerTest, TestHungarianLightStemFilterFactory, TestIrishLowerCaseFilterFactory, DebugComponentTest, BasicDistributedZkTest, TestCoreContainer, ZkSolrClientTest, SnowballPorterFilterFactoryTest, TestReversedWildcardFilterFactory, SpellCheckCollatorTest, TestOmitPositions, SOLR749Test, TestSlowSynonymFilter, TestSort, UpdateParamsTest, TestQuerySenderListener, LegacyHTMLStripCharFilterTest, CommonGramsQueryFilterFactoryTest, CloudStateTest, TestLMDirichletSimilarityFactory, TestLatvianStemFilterFactory, BadComponentTest, SpellPossibilityIteratorTest, TestPortugueseMinimalStemFilterFactory, SortByFunctionTest, SolrCoreCheckLockOnStartupTest, TestRemoveDuplicatesTokenFilterFactory, NotRequiredUniqueKeyTest, ScriptEngineTest, TestPHPSerializedResponseWriter, FullSolrCloudDistribCmdsTest, TestPatternReplaceCharFilterFactory, TestNorwegianMinimalStemFilterFactory, TestKeepFilterFactory, SpatialFilterTest,
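The "opens=5 closes=4" failure above comes from bookkeeping that counts searcher opens against closes and fails the suite when they disagree. A minimal sketch of that kind of leak tracking (hypothetical code, not Solr's actual SolrTestCaseJ4.endTrackingSearchers):

```java
// Sketch of open/close leak tracking: every searcher open bumps one counter,
// every close bumps the other, and teardown reports a failure when they
// disagree, i.e. when a searcher was opened but never closed. (The real test
// framework does this across threads; a plain instance keeps the sketch simple.)
public class SearcherTracker {
    private int opens, closes;

    void onOpen()  { opens++; }
    void onClose() { closes++; }

    // returns null when balanced, otherwise the failure message
    String endTracking() {
        return opens == closes
                ? null
                : "ERROR: SolrIndexSearcher opens=" + opens + " closes=" + closes;
    }

    public static void main(String[] args) {
        SearcherTracker t = new SearcherTracker();
        for (int i = 0; i < 5; i++) t.onOpen();
        for (int i = 0; i < 4; i++) t.onClose(); // one searcher leaked
        System.out.println(t.endTracking());
        // -> ERROR: SolrIndexSearcher opens=5 closes=4
    }
}
```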
[jira] [Updated] (SOLR-3618) Enable replication of master using proxy settings
[ https://issues.apache.org/jira/browse/SOLR-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gautier Koscielny updated SOLR-3618:
------------------------------------

    Attachment: SnapPuller.java.patch

I've modified the createHttpClient method to take proxy settings into account. The HttpClient instance is created as before, and proxy settings are then added to the host configuration if required.

Enable replication of master using proxy settings
-------------------------------------------------

                Key: SOLR-3618
                URL: https://issues.apache.org/jira/browse/SOLR-3618
            Project: Solr
         Issue Type: Improvement
         Components: replication (java)
   Affects Versions: 3.6.1
           Reporter: Gautier Koscielny
             Labels: patch
            Fix For: 3.6.1
        Attachments: SnapPuller.java.patch
  Original Estimate: 4h
 Remaining Estimate: 4h

Check whether system properties http.proxyHost and http.proxyPort are set to initialize the httpClient instance properly in the SnapPuller class.
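The check the issue describes can be sketched with standard-library types only. This is not the patch's actual code (SnapPuller configures a commons-httpclient HostConfiguration); it is a hypothetical illustration of the same logic: read the standard http.proxyHost / http.proxyPort system properties and build a proxy when they are set, otherwise connect directly.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Sketch (assumptions: class and method names are made up for illustration):
// if http.proxyHost is set, route replication HTTP traffic through that proxy;
// http.proxyPort conventionally defaults to 80 when unset.
public class ProxyFromSystemProps {
    static Proxy resolve() {
        String host = System.getProperty("http.proxyHost");
        if (host == null || host.isEmpty()) {
            return Proxy.NO_PROXY; // no proxy configured: direct connection
        }
        int port = Integer.parseInt(System.getProperty("http.proxyPort", "80"));
        // createUnresolved avoids a DNS lookup at configuration time
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        System.setProperty("http.proxyHost", "proxy.example.com"); // hypothetical host
        System.setProperty("http.proxyPort", "3128");
        System.out.println(resolve().type()); // HTTP
    }
}
```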
[jira] [Commented] (SOLR-3167) Allow running embedded zookeeper 1 for 1 dynamically with solr nodes
[ https://issues.apache.org/jira/browse/SOLR-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419015#comment-13419015 ]

Jan Høydahl commented on SOLR-3167:
-----------------------------------

I was thinking auto-everything by default :) like ElasticSearch:

# Start Solr on a node without any options other than telling it to start in cloud mode
## If -DzkHost is not specified, it will try auto-discovery (through some zero-conf protocol) and join the existing ZK
## If no existing ZK is found, spin up a local one
# Start Solr on another node; it will discover the existing one(s) without any host:port at startup
## If there are too few ZK servers, it will start another one and refresh the ZK list on all other nodes
## If there are enough ZK servers already, it will simply join

It should also be possible to auto-start ZK on another node if one master has failed.

Allow running embedded zookeeper 1 for 1 dynamically with solr nodes
--------------------------------------------------------------------

                Key: SOLR-3167
                URL: https://issues.apache.org/jira/browse/SOLR-3167
            Project: Solr
         Issue Type: Improvement
           Reporter: Mark Miller
           Assignee: Mark Miller

Right now you have to decide which nodes run ZooKeeper up front - each node must know the list of all the servers in the ensemble. Growing or shrinking the list of nodes requires a rolling restart. https://issues.apache.org/jira/browse/ZOOKEEPER-1355 (Add zk.updateServerList(newServerList)) might be able to help us here. Perhaps the overseer could make a call to each replica when the list changes and use the update-server-list call.
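The startup decision Jan proposes can be sketched as a small decision function. All names here are hypothetical (nothing below is SolrCloud API); it only encodes the branching from the comment: an explicit -DzkHost wins, the first node spins up a local embedded ZooKeeper, and later nodes either grow the ensemble or simply join it.

```java
// Sketch of the proposed "auto-everything" bootstrap flow (hypothetical names):
// given whether zkHost was passed, how many ZK servers were discovered, and the
// desired ensemble size, decide what this starting node should do.
public class ZkBootstrap {
    enum Action { USE_EXPLICIT_ZKHOST, START_EMBEDDED, START_EMBEDDED_AND_JOIN, JOIN_DISCOVERED }

    static Action decide(String zkHostProp, int discoveredEnsembleSize, int desiredEnsembleSize) {
        if (zkHostProp != null) return Action.USE_EXPLICIT_ZKHOST;     // -DzkHost specified
        if (discoveredEnsembleSize == 0) return Action.START_EMBEDDED; // first node: local ZK
        // ensemble exists: start another ZK server if it is still too small
        return discoveredEnsembleSize < desiredEnsembleSize
                ? Action.START_EMBEDDED_AND_JOIN
                : Action.JOIN_DISCOVERED;
    }

    public static void main(String[] args) {
        System.out.println(decide(null, 0, 3)); // START_EMBEDDED
        System.out.println(decide(null, 1, 3)); // START_EMBEDDED_AND_JOIN
        System.out.println(decide(null, 3, 3)); // JOIN_DISCOVERED
    }
}
```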
[jira] [Commented] (SOLR-3167) Allow running embedded zookeeper 1 for 1 dynamically with solr nodes
[ https://issues.apache.org/jira/browse/SOLR-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419016#comment-13419016 ]

Raju commented on SOLR-3167:
----------------------------

hi
[jira] [Assigned] (LUCENE-4224) Simplify MultiValuedCase in TermsIncludingScoreQuery
[ https://issues.apache.org/jira/browse/LUCENE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen reassigned LUCENE-4224:
---------------------------------------------

    Assignee: Martijn van Groningen

Simplify MultiValuedCase in TermsIncludingScoreQuery
----------------------------------------------------

                Key: LUCENE-4224
                URL: https://issues.apache.org/jira/browse/LUCENE-4224
            Project: Lucene - Java
         Issue Type: Task
           Reporter: Robert Muir
           Assignee: Martijn van Groningen
        Attachments: LUCENE-4224.patch

While looking at LUCENE-4214, I was trying to wrap my head around what this is doing... I think the code specialization in the multivalued scorer doesn't buy us any additional speed? At least according to my benchmarks?
[jira] [Commented] (SOLR-1781) Replication index directories not always cleaned up
[ https://issues.apache.org/jira/browse/SOLR-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419032#comment-13419032 ]

Markus Jelsma commented on SOLR-1781:
-------------------------------------

Hi - is the core reloading still part of this? I get a lot of firstSearcher events on a test node now, and it won't come online. Going back to a July 18th build (before this patch) works fine. Other nodes won't come online with a build from the 19th (after this patch).

Replication index directories not always cleaned up
---------------------------------------------------

                Key: SOLR-1781
                URL: https://issues.apache.org/jira/browse/SOLR-1781
            Project: Solr
         Issue Type: Bug
         Components: replication (java), SolrCloud
   Affects Versions: 1.4
        Environment: Windows Server 2003 R2, Java 6b18
           Reporter: Terje Sten Bjerkseth
           Assignee: Mark Miller
            Fix For: 4.0, 5.0
        Attachments: 0001-Replication-does-not-always-clean-up-old-directories.patch, SOLR-1781.patch, SOLR-1781.patch

We had the same problem as someone described in http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201001.mbox/%3c222a518d-ddf5-4fc8-a02a-74d4f232b...@snooth.com%3e. A partial copy of that message:

We're using the new replication and it's working pretty well. There's one detail I'd like to get some more information about. As the replication works, it creates versions of the index in the data directory. Originally we had index/, but now there are dated versions such as index.20100127044500/, which are the replicated versions. Each copy is sized in the vicinity of 65G. With our current hard drive it's fine to have two around, but 3 gets a little dicey. Sometimes we're finding that the replication doesn't always clean up after itself. I would like to understand this better, or to not have this happen. It could be a configuration issue.
[jira] [Updated] (LUCENE-4109) BooleanQueries are not parsed correctly with the flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karsten R. updated LUCENE-4109:
-------------------------------

    Attachment: LUCENE-4109.patch

Patch for lucene/contrib against http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6

The patch adds the processor BooleanQuery2ModifierNodeProcessor. It also changes ParametricRangeQueryNodeProcessor as a hotfix for LUCENE-3338 (this change is not for 4.X because LUCENE-3338 is already fixed in 4.X). The patch passes all tests from QueryParserTestBase except {{{assertQueryEquals([\\* TO \*\],null,[\\* TO \\*]);}}} and the LUCENE-2566-related tests.

A patch for trunk will be coming soon.

BooleanQueries are not parsed correctly with the flexible query parser
----------------------------------------------------------------------

                Key: LUCENE-4109
                URL: https://issues.apache.org/jira/browse/LUCENE-4109
            Project: Lucene - Java
         Issue Type: Bug
         Components: modules/queryparser
   Affects Versions: 3.5, 3.6
           Reporter: Daniel Truemper
            Fix For: 4.0
        Attachments: LUCENE-4109.patch, test-patch.txt

Hi, I just found another bug in the flexible query parser (together with Robert Muir, yay!). The following query string works in the standard query parser:
{noformat}
(field:[1 TO *] AND field:[* TO 2]) AND field2:z
{noformat}
yields
{noformat}
+(+field:[1 TO *] +field:[* TO 2]) +field2:z
{noformat}
The flexible query parser, though, yields:
{noformat}
+(field:[1 TO *] field:[* TO 2]) +field2:z
{noformat}
A test patch is attached (from Robert, actually). I don't know if it affects versions earlier than 3.5.
[jira] [Updated] (LUCENE-3151) Make all of Analysis completely independent from Lucene Core
[ https://issues.apache.org/jira/browse/LUCENE-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated LUCENE-3151:
------------------------------------

    Attachment: LUCENE-3151.patch

Here's a first draft at this. The packaging looks more or less right, but I haven't fully tested it yet. The main downsides to this approach are:

# Minor loss of Javadoc due to references to things like IndexWriter, DoubleField, etc. I kept the references but removed the @link, which allowed me to drop the import statement.
# We need to somehow document that this jar is for standalone use only. It's probably a minor issue, but going forward people could get into classloader hell with this if they are mixing versions. Of course, that's always the case in Java, so caveat emptor.

Make all of Analysis completely independent from Lucene Core
------------------------------------------------------------

                Key: LUCENE-3151
                URL: https://issues.apache.org/jira/browse/LUCENE-3151
            Project: Lucene - Java
         Issue Type: Improvement
   Affects Versions: 4.0-ALPHA
           Reporter: Grant Ingersoll
            Fix For: 4.1
        Attachments: LUCENE-3151.patch, LUCENE-3151.patch

Lucene's analysis package, including the definitions of Attribute, TokenStream, etc., is quite useful outside of Lucene (for instance, Mahout uses it) for text processing. I'd like to move the definitions, or at least their packaging, to a separate JAR file so that one can consume them without needing Lucene core. My draft idea is to have a definition area that Lucene core depends on, with the rest of the analysis package then depending on that definition area. (I'm open to other ideas as well.)
[jira] [Commented] (LUCENE-3151) Make all of Analysis completely independent from Lucene Core
[ https://issues.apache.org/jira/browse/LUCENE-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419045#comment-13419045 ]

Grant Ingersoll commented on LUCENE-3151:
-----------------------------------------

I should add: to run this, for now, do {code}ant jar-analyzer-definition{code}. Still need to make sure it fully hooks into the rest of the build correctly, too.
Re: top level ant test shouldn't validate?
+1 on the original question... IntelliJ doesn't seem to have the problem; I run ant clean from the top level all the time, and my projects that depend on it seem to work fine. I vaguely remember in Eclipse having to do something like a project refresh to get things back in sync, but that may be unrelated.

On Thu, Jul 19, 2012 at 8:56 PM, Mark Miller markrmil...@gmail.com wrote:

Top-level ant clean breaks my IDE too! I don't know the fine points of this conversation, but it's super painful and I never call top-level ant clean anymore. I kept meaning to look into why it was killing me but never got to it.

Sent from my iPhone

On Jul 19, 2012, at 12:46 PM, Robert Muir rcm...@gmail.com wrote:

+1, we have caged the rat, we should be able to have a simple precommit check. Also, top-level 'ant clean' shouldn't call clean-jars. This *totally messes up* my IDE just because I like to run tests from the command line.

On Thu, Jul 19, 2012 at 12:40 PM, Steven A Rowe sar...@syr.edu wrote:

On 7/19/2012 at 12:35 PM, Michael McCandless wrote:

Any objections to fixing top-level ant test to simply run tests...? Maybe we can add a precommit target to run tests, validate, javadocs-lint, ...

+1
Steve

--
lucidimagination.com
[jira] [Created] (SOLR-3654) Add some tests using Tomcat as servlet container
Jan Høydahl created SOLR-3654:
---------------------------------

    Summary: Add some tests using Tomcat as servlet container
        Key: SOLR-3654
        URL: https://issues.apache.org/jira/browse/SOLR-3654
    Project: Solr
 Issue Type: Task
 Components: Build
Environment: Tomcat
   Reporter: Jan Høydahl
    Fix For: 4.0

All tests use Jetty; we should add some tests for at least one other servlet container (Tomcat). Ref discussion at http://search-lucene.com/m/6mo9Y1WZaWR1
[jira] [Updated] (LUCENE-4227) DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM
[ https://issues.apache.org/jira/browse/LUCENE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4227: --- Attachment: LUCENE-4227.patch New patch, fixing previous nocommits / downgrading to TODOs. I also removed the specialized scorers since they seem not to help much. All tests pass, but I still need to fix all tests that now avoid MemoryPF to also avoid DirectPF. Otherwise I think it's ready... DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM - Key: LUCENE-4227 URL: https://issues.apache.org/jira/browse/LUCENE-4227 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-4227.patch, LUCENE-4227.patch This postings format just wraps Lucene40 (on disk) but then at search time it loads (up front) all terms postings into RAM. You'd use this if you have insane amounts of RAM and want the fastest possible search performance. The postings are not compressed: docIds, positions are stored as straight int[]s. The terms are stored as a skip list (array of byte[]), but I packed all terms together into a single long byte[]: I had started as actual separate byte[] per term but the added pointer deref and loss of locality was a lot (~2X) slower for terms-dict intensive queries like FuzzyQuery. Low frequency postings (docFreq = 32 by default) store all docs, pos and offsets into a single int[]. High frequency postings store docs as int[], freqs as int[], and positions as int[][] parallel arrays. For skipping I just do a growing binary search. I also made specialized DirectTermScorer and DirectExactPhraseScorer for the high freq case that just pull the int[] and iterate themselves. All tests pass. -- This message is automatically generated by JIRA. 
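The "growing binary search" used for skipping in the issue description above can be sketched as a standalone method (hypothetical code, not the actual DirectPostingsFormat implementation): grow the search window exponentially from the current position, then binary search inside it, so the cost depends on the distance actually skipped rather than the array length.

```java
// Hypothetical sketch of a "growing" (galloping) binary search over a
// sorted int[] docID array, as described above. Not Lucene code.
public class GrowingBinarySearch {

    /** Returns the index of the first docID >= target at or after
     *  {@code from}, or {@code docs.length} if every docID is smaller. */
    public static int advance(int[] docs, int from, int target) {
        int lo = from;
        int step = 1;
        int hi = from + step;
        // Grow the window until it brackets the target or hits the end.
        while (hi < docs.length && docs[hi] < target) {
            lo = hi;
            step <<= 1;
            hi = from + step;
        }
        if (hi >= docs.length) {
            hi = docs.length - 1;
        }
        if (docs[hi] < target) {
            return docs.length; // postings exhausted
        }
        // Plain binary search within [lo, hi].
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```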
[jira] [Commented] (SOLR-3654) Add some tests using Tomcat as servlet container
[ https://issues.apache.org/jira/browse/SOLR-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419068#comment-13419068 ] Jan Høydahl commented on SOLR-3654: --- Have not done a lot of investigation but we could probably build this test using Cargo: http://cargo.codehaus.org/Ant+support Having got Tomcat test support, it should be trivial to add other supported containers as well [Geronimo, Glassfish, JBoss, Resin...]. Also, if all Jetty tests now use {{JettySolrRunner}}, we have a single point of entry to plug in container randomization further down the road. This could be controlled by options so that Jetty is default but nightly builds randomize container per run.
Re: [Discuss] Should Solr be an AppServer agnostic WAR or require Jetty?
I've created SOLR-3654 as a placeholder for adding tests using Tomcat (and possibly others). -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 16 July 2012, at 23:52, Chris Hostetter wrote: : Specifically: it would be a terrible idea to try and rush a change like this : in before Solr 4.0-FINAL ... : : That's just a silly premise - no one in this conversation even remotely : suggested we stop using webapps or wars for Solr 4.0-FINAL. Switching to : non webapp tech is 'probably' a bit of work. Right ... I didn't get the impression anyone who had spoken up so far was suggesting a change like this for Solr 4.0-FINAL. I just wanted to state that while i have very little opinion about *if* we should make a change like this, i have strong opinions about *when* we should try to make a change like this, if the discussion does go in that direction. -Hoss
[jira] [Commented] (LUCENE-4227) DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM
[ https://issues.apache.org/jira/browse/LUCENE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419071#comment-13419071 ] Robert Muir commented on LUCENE-4227: - Would it really be that much slower if it was slightly more reasonable, e.g. storing freqs in packed ints (with huper-duper fast options) instead of wasting so much on them?
[jira] [Updated] (SOLR-3636) edismax, synonyms and mm=100%
[ https://issues.apache.org/jira/browse/SOLR-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3636: -- Component/s: query parsers Fix Version/s: 5.0 4.0 edismax, synonyms and mm=100% - Key: SOLR-3636 URL: https://issues.apache.org/jira/browse/SOLR-3636 Project: Solr Issue Type: Bug Components: query parsers Reporter: Lance Norskog Priority: Minor Fix For: 4.0, 5.0 There is a problem with query-side synonyms, edismax and must-match=100%. edismax interprets must-match=100% as the number of terms found by edismax in the original query. These terms go through the query analyzer, and the synonym filter creates more terms, *but* the must-match term count is not incremented. Thus, given a synonym of {code} monkeyhouse => monkey house {code} the query {{q=big+monkeyhouse&mm=100%}} becomes (effectively) {{q=big+monkey+house&mm=2}}. This query finds documents matching only two out of three terms ({{big+monkey}}, {{monkey+house}}, {{big+house}}). This might also be a problem in dismax.
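The arithmetic behind the bug above can be shown with a toy sketch (hypothetical code, not edismax internals): mm=100% is resolved against the raw pre-analysis term count, while the executed query contains the synonym-expanded clauses.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration (hypothetical code, not edismax internals) of the
// mismatch described above: mm=100% is resolved against the raw term
// count, but the executed query contains the synonym-expanded clauses.
public class MmMismatch {
    /** mm=100% resolved against the pre-analysis query terms. */
    public static int resolveMm(List<String> rawTerms) {
        return rawTerms.size();
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("big", "monkeyhouse");
        // After the synonym filter (monkeyhouse => monkey house) runs:
        List<String> analyzed = Arrays.asList("big", "monkey", "house");

        int mm = resolveMm(raw);       // 2, from the 2 raw terms
        int clauses = analyzed.size(); // but 3 clauses are generated

        // Any 2 of {big, monkey, house} now satisfy mm, e.g. big+house,
        // even though the user asked for 100% of the terms to match.
        System.out.println("mm=" + mm + " of " + clauses + " clauses");
    }
}
```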
[jira] [Resolved] (SOLR-3646) /browse request handler fails in example if you don't specify a field in the query with no default specified via 'df' param
[ https://issues.apache.org/jira/browse/SOLR-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-3646. -- Resolution: Fixed 4x: 1363747 trunk: 1363751 /browse request handler fails in example if you don't specify a field in the query with no default specified via 'df' param - Key: SOLR-3646 URL: https://issues.apache.org/jira/browse/SOLR-3646 Project: Solr Issue Type: Bug Components: SearchComponents - other Affects Versions: 4.0, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 4.0, 5.0 Attachments: SOLR-3646.patch Original Estimate: 1h Remaining Estimate: 1h If you try using the stock /browse request handler and don't specify a field in the search, you get the following stack (partial): SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param at org.apache.solr.search.SolrQueryParser.checkNullField(SolrQueryParser.java:136) at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:154) at org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1063) at org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:350) . . .
[jira] [Assigned] (LUCENE-4109) BooleanQueries are not parsed correctly with the flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned LUCENE-4109: --- Assignee: Robert Muir BooleanQueries are not parsed correctly with the flexible query parser -- Key: LUCENE-4109 URL: https://issues.apache.org/jira/browse/LUCENE-4109 Project: Lucene - Java Issue Type: Bug Components: modules/queryparser Affects Versions: 3.5, 3.6 Reporter: Daniel Truemper Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4109.patch, test-patch.txt Hi, I just found another bug in the flexible query parser (together with Robert Muir, yay!). The following query string works in the standard query parser: {noformat} (field:[1 TO *] AND field:[* TO 2]) AND field2:z {noformat} yields {noformat} +(+field:[1 TO *] +field:[* TO 2]) +field2:z {noformat} The flexible query parser though yields: {noformat} +(field:[1 TO *] field:[* TO 2]) +field2:z {noformat} Test patch is attached (from Robert actually). I don't know if it affects earlier versions than 3.5.
[jira] [Commented] (SOLR-3292) /browse example fails to load on 3x: no field name specified in query and no default specified via 'df' param
[ https://issues.apache.org/jira/browse/SOLR-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419089#comment-13419089 ] Erick Erickson commented on SOLR-3292: -- I just fixed this in 4.x and trunk, can we close this one? /browse example fails to load on 3x: no field name specified in query and no default specified via 'df' param --- Key: SOLR-3292 URL: https://issues.apache.org/jira/browse/SOLR-3292 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Priority: Blocker Fix For: 3.6, 4.0, 5.0 1) java -jar start.jar using solr example on 3x branch circa r1306629 2) load http://localhost:8983/solr/browse 3) browser error: 400 no field name specified in query and no default specified via 'df' param 4) error in logs... {noformat} INFO: [] webapp=/solr path=/browse params={} hits=0 status=400 QTime=3 Mar 28, 2012 4:05:59 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param at org.apache.solr.search.SolrQueryParser.checkNullField(SolrQueryParser.java:158) at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:174) at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1429) at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1317) at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1245) at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1234) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206) at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:79) at org.apache.solr.search.QParser.getQuery(QParser.java:143) at org.apache.solr.request.SimpleFacets.getFacetQueryCounts(SimpleFacets.java:233) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:194) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72) at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376) {noformat}
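A common way to avoid the "no default specified via 'df' param" error above is to declare a default search field in the handler's defaults. A minimal sketch, assuming the schema has a catch-all field named `text` (the field name is an assumption for illustration):

```xml
<!-- solrconfig.xml: give /browse a default search field so queries
     without an explicit field do not trigger the 'df' error.
     The field name "text" is an assumption for illustration. -->
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">text</str>
  </lst>
</requestHandler>
```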
[jira] [Updated] (LUCENE-4109) BooleanQueries are not parsed correctly with the flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4109: Attachment: LUCENE-4109.patch Patch looks good to me! I also added Daniel's test from Buzzwords. Thanks for fixing this, and adding additional tests! Once the 3.6 branch is open I'll get it in.
[jira] [Commented] (SOLR-3653) Support Smart Simplified Chinese in Solr - include clean-up bigramming filter
[ https://issues.apache.org/jira/browse/SOLR-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419096#comment-13419096 ] Robert Muir commented on SOLR-3653: --- {quote} The Smart Simplified Chinese toolkit in lucene/analysis/smartcn has no Solr factories {quote} Actually there are factories in contrib/analysis-extras. {quote} and includes a fixup class to handle the occasional mistake made by the Smart Chinese implementation. {quote} I am not sure on this: if someone wants to mix an n-gram technique with a word model, they can just use two fields? If they want to limit the n-gram field to only longer terms, they should use LengthFilter. Furthermore, I don't really understand the problem here. The word you are upset about (中华人民共和国) is in the smartcn dictionary. As I understand, this word basically means PRC. This is a single concept and makes sense as an indexing unit. Why do we care how long it is in characters? Support Smart Simplified Chinese in Solr - include clean-up bigramming filter - Key: SOLR-3653 URL: https://issues.apache.org/jira/browse/SOLR-3653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Lance Norskog Attachments: SOLR-3653.patch, SmartChineseType.pdf The Smart Simplified Chinese toolkit in lucene/analysis/smartcn has no Solr factories. Also, since it is a statistical algorithm, it is not perfect. This patch supplies factories and a schema.xml type for the existing Lucene Smart Chinese implementation, and includes a fixup class to handle the occasional mistake made by the Smart Chinese implementation. -- This message is automatically generated by JIRA. 
[jira] [Commented] (SOLR-3654) Add some tests using Tomcat as servlet container
[ https://issues.apache.org/jira/browse/SOLR-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419097#comment-13419097 ] Mark Miller commented on SOLR-3654: --- I'm 100% against this.
[jira] [Updated] (LUCENE-4224) Simplify MultiValuedCase in TermsIncludingScoreQuery
[ https://issues.apache.org/jira/browse/LUCENE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated LUCENE-4224: -- Attachment: LUCENE-4224.patch Attached a new patch. * Added a Scorer that scores in order. * The existing scorer now throws a UOE in the advance() method. Simplify MultiValuedCase in TermsIncludingScoreQuery Key: LUCENE-4224 URL: https://issues.apache.org/jira/browse/LUCENE-4224 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Martijn van Groningen Attachments: LUCENE-4224.patch, LUCENE-4224.patch While looking at LUCENE-4214, i was trying to wrap my head around what this is doing... I think the code specialization in the multivalued scorer doesn't buy us any additional speed? At least according to my benchmarks?
[jira] [Commented] (SOLR-1781) Replication index directories not always cleaned up
[ https://issues.apache.org/jira/browse/SOLR-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419099#comment-13419099 ] Mark Miller commented on SOLR-1781: --- No, no reload. Can you please elaborate on what not going online means? Can you share logs? Replication index directories not always cleaned up --- Key: SOLR-1781 URL: https://issues.apache.org/jira/browse/SOLR-1781 Project: Solr Issue Type: Bug Components: replication (java), SolrCloud Affects Versions: 1.4 Environment: Windows Server 2003 R2, Java 6b18 Reporter: Terje Sten Bjerkseth Assignee: Mark Miller Fix For: 4.0, 5.0 Attachments: 0001-Replication-does-not-always-clean-up-old-directories.patch, SOLR-1781.patch, SOLR-1781.patch We had the same problem as someone described in http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201001.mbox/%3c222a518d-ddf5-4fc8-a02a-74d4f232b...@snooth.com%3e. A partial copy of that message: We're using the new replication and it's working pretty well. There's one detail I'd like to get some more information about. As the replication works, it creates versions of the index in the data directory. Originally we had index/, but now there are dated versions such as index.20100127044500/, which are the replicated versions. Each copy is sized in the vicinity of 65G. With our current hard drive it's fine to have two around, but 3 gets a little dicey. Sometimes we're finding that the replication doesn't always clean up after itself. I would like to understand this better, or to not have this happen. It could be a configuration issue.
[jira] [Commented] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419105#comment-13419105 ] Jan Høydahl commented on SOLR-2366: --- Mandar, since this issue is Unresolved, the feature is not part of any version (yet); there are only patches attached, which may not apply cleanly if they are old. Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor Fix For: 4.1 Attachments: SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. (Original syntax proposal removed, see discussion for concrete syntax)
[jira] [Commented] (LUCENE-3151) Make all of Analysis completely independent from Lucene Core
[ https://issues.apache.org/jira/browse/LUCENE-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419106#comment-13419106 ] Robert Muir commented on LUCENE-3151: - Hey Grant: I know it sounds silly but can we split out the getOffsetGap API change into a separate issue? This would be nice to fix ASAP. I don't understand why it takes IndexableField or took Fieldable. All the other methods here like getPositionIncrementGap take a String fieldName. I think this one should too. I don't think it needs a boolean for tokenized either: returning 0 for NOT_ANALYZED fields. If you choose NOT_ANALYZED, that should mean the Analyzer is not invoked! If you want to do expert stuff to control the offset gaps between values for NOT_ANALYZED fields, then just analyze it instead, with a keyword tokenizer! Make all of Analysis completely independent from Lucene Core Key: LUCENE-3151 URL: https://issues.apache.org/jira/browse/LUCENE-3151 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0-ALPHA Reporter: Grant Ingersoll Fix For: 4.1 Attachments: LUCENE-3151.patch, LUCENE-3151.patch Lucene's analysis package, including the definitions of Attribute, TokenStream, etc. are quite useful outside of Lucene (for instance, Mahout uses them) for text processing. I'd like to move the definitions, or at least their packaging, to a separate JAR file so that one can consume them w/o needing Lucene core. My draft idea is to have a definition area that Lucene core is dependent on and the rest of the analysis package can then be dependent on the definition area. (I'm open to other ideas as well)
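The simplification proposed above can be sketched as a hypothetical Analyzer-like class (class name and default return values are assumptions for illustration, not the committed Lucene API): getOffsetGap keyed by field name only, mirroring getPositionIncrementGap, with no `tokenized` boolean because NOT_ANALYZED fields would never invoke the Analyzer at all.

```java
// Hypothetical sketch of the API shape proposed above (not Lucene code):
// getOffsetGap takes only the field name, like getPositionIncrementGap,
// and needs no 'tokenized' boolean because NOT_ANALYZED fields would
// never reach the Analyzer in the first place.
public class SketchAnalyzer {
    /** Position gap between multiple values of the same field. */
    public int getPositionIncrementGap(String fieldName) {
        return 0;
    }

    /** Proposed: offset gap keyed by field name alone. */
    public int getOffsetGap(String fieldName) {
        return 1;
    }
}
```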
Re: Lucene on iOS
Hi, This mailing list is only for the main Java-based Lucene library. Please ask your question to S4LuceneLibrary directly, which seems to be a completely independent port. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 19 July 2012, at 16:14, Tobias Buchholz wrote: Hello, I'm Tobias Buchholz and a student at HTW in Berlin. For my bachelor thesis I'm trying to improve the search algorithm of an iOS app, which offers some magazines with a lot of articles. To do that I used the S4LuceneLibrary by Michael Papp (https://github.com/mikekppp/S4LuceneLibrary), which is an iOS equivalent to the full-featured text search engine library of Apache Lucene. The problem is that the search is now very inconsistent: searching for some words takes very long, while for others it does not. This is a list of words I searched for, and the time each search took:

Berlin (34 hits) - 2.8 seconds
Tag (29 hits) - 11.8 seconds
Haus (3 hits) - 7.1 seconds
Straße (28 hits) - 15.7 seconds
Raumfahrt (5 hits) - 13.8 seconds
Astronomie (9 hits) - 6 seconds

So the results are quite different, but I thought it should take about the same time for every search phrase. Do you have an idea why that is? Thanks in advance! Best Regards, Tobias Buchholz
[jira] [Commented] (SOLR-1781) Replication index directories not always cleaned up
[ https://issues.apache.org/jira/browse/SOLR-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419114#comment-13419114 ] Markus Jelsma commented on SOLR-1781: - The node will never respond to HTTP requests, all ZK connections time out, very high resource consumption. I'll try to provide a log snippet soon. I tried running today's build several times but one specific node refuses to `come online`. Another node did well and runs today's build. I cannot attach a file to a resolved issue. Send over mail?
[jira] [Reopened] (SOLR-1781) Replication index directories not always cleaned up
[ https://issues.apache.org/jira/browse/SOLR-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reopened SOLR-1781: --- I'll reopen - email is fine as well.
[jira] [Commented] (LUCENE-4227) DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM
[ https://issues.apache.org/jira/browse/LUCENE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419119#comment-13419119 ] Michael McCandless commented on LUCENE-4227: {quote} Would it really be that much slower if it was slightly more reasonable, e.g. storing freqs in packed ints (with huper-duper fast options) instead of wasting so much on them? {quote} Probably not that much slower? I think that's a good idea! But I think we can explore this after committing? There are other things we can try too (e.g. collapse the skip list into a shared int[]: I think this one may give a perf gain; collapse positions, etc.).
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
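The "growing binary search" skipping described above can be sketched as exponential (galloping) search over a sorted in-memory docID array: gallop forward in doubling steps until the target is bracketed, then binary-search the bracketed window. This is an illustrative stand-in, not Lucene's actual DirectPostingsFormat code; the class and method names below are made up.

```java
public final class GallopingSearch {

    /** Returns the index of the first docID >= target, searching from index 'from';
     *  returns docs.length when every remaining docID is smaller. */
    public static int advance(int[] docs, int from, int target) {
        int n = docs.length;
        if (from >= n) return n;
        if (docs[from] >= target) return from;
        // Gallop: double the step until we overshoot the target or run off the end.
        int step = 1;
        int lo = from;                 // invariant: docs[lo] < target
        int hi = from + step;
        while (hi < n && docs[hi] < target) {
            lo = hi;
            step <<= 1;
            hi += step;
        }
        if (hi >= n) hi = n;
        // Binary search for the first element >= target in (lo, hi].
        int left = lo + 1, right = hi;
        while (left < right) {
            int mid = (left + right) >>> 1;
            if (docs[mid] < target) left = mid + 1;
            else right = mid;
        }
        return left;
    }

    public static void main(String[] args) {
        int[] docs = {2, 3, 5, 8, 13, 21, 34, 55};
        System.out.println(advance(docs, 0, 13)); // 4
        System.out.println(advance(docs, 0, 14)); // 5
        System.out.println(advance(docs, 0, 99)); // 8 (postings exhausted)
    }
}
```

The gallop keeps skips cheap when the target is nearby (the common case when two postings lists leapfrog each other) while still bounding the worst case by O(log n).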
[jira] [Commented] (LUCENE-4109) BooleanQueries are not parsed correctly with the flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419122#comment-13419122 ] Robert Muir commented on LUCENE-4109: - Hmm, with the patch some tests for TestMultiFieldQPHelper fail. I didn't look into it further, but we should figure out what's going on there if we can.
BooleanQueries are not parsed correctly with the flexible query parser -- Key: LUCENE-4109 URL: https://issues.apache.org/jira/browse/LUCENE-4109 Project: Lucene - Java Issue Type: Bug Components: modules/queryparser Affects Versions: 3.5, 3.6 Reporter: Daniel Truemper Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4109.patch, LUCENE-4109.patch, test-patch.txt
Hi, I just found another bug in the flexible query parser (together with Robert Muir, yay!). The following query string works in the standard query parser: {noformat} (field:[1 TO *] AND field:[* TO 2]) AND field2:z {noformat} yields {noformat} +(+field:[1 TO *] +field:[* TO 2]) +field2:z {noformat} The flexible query parser though yields: {noformat} +(field:[1 TO *] field:[* TO 2]) +field2:z {noformat} Test patch is attached (from Robert actually). I don't know if it affects earlier versions than 3.5.
[jira] [Commented] (LUCENE-4227) DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM
[ https://issues.apache.org/jira/browse/LUCENE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419126#comment-13419126 ] Robert Muir commented on LUCENE-4227: - Yeah, i don't think we need to solve it before committing. I do think maybe this class needs some more warnings, to me it seems it will use crazy amounts of RAM. I also am not sure I like the name Direct... is it crazy to suggest Instantiated?
[jira] [Commented] (SOLR-1781) Replication index directories not always cleaned up
[ https://issues.apache.org/jira/browse/SOLR-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419127#comment-13419127 ] Markus Jelsma commented on SOLR-1781: - Log sent. This node has two shards on it and executed 2x 512 warmup queries, which adds up. It won't talk to ZK (tail of the log). Restarting the node with a build from the 18th works fine. Did it three times today. Thanks
Replication index directories not always cleaned up --- Key: SOLR-1781 URL: https://issues.apache.org/jira/browse/SOLR-1781 Project: Solr Issue Type: Bug Components: replication (java), SolrCloud Affects Versions: 1.4 Environment: Windows Server 2003 R2, Java 6b18 Reporter: Terje Sten Bjerkseth Assignee: Mark Miller Fix For: 4.0, 5.0 Attachments: 0001-Replication-does-not-always-clean-up-old-directories.patch, SOLR-1781.patch, SOLR-1781.patch
We had the same problem as someone described in http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201001.mbox/%3c222a518d-ddf5-4fc8-a02a-74d4f232b...@snooth.com%3e. A partial copy of that message: We're using the new replication and it's working pretty well. There's one detail I'd like to get some more information about. As the replication works, it creates versions of the index in the data directory. Originally we had index/, but now there are dated versions such as index.20100127044500/, which are the replicated versions. Each copy is sized in the vicinity of 65G. With our current hard drive it's fine to have two around, but 3 gets a little dicey. Sometimes we're finding that the replication doesn't always clean up after itself. I would like to understand this better, or to not have this happen. It could be a configuration issue.
Re: Lucene on iOS
Yes okay, I was hoping there could be a similar problem in the Java Lucene library with inconsistent search times, so the solution for that could help me as well. On Fri, Jul 20, 2012 at 3:03 PM, Jan Høydahl jan@cominvent.com wrote: Hi, This mailing list is only for the main Java based Lucene library. Please ask your question to S4LuceneLibrary directly, which seems to be a completely independent port. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 19. juli 2012, at 16:14, Tobias Buchholz wrote: Hello, I'm Tobias Buchholz, a student at HTW in Berlin. For my bachelor thesis I'm trying to improve the search algorithm of an iOS app that offers some magazines with a lot of articles. To do that I used the S4LuceneLibrary by Micheal Papp (https://github.com/mikekppp/S4LuceneLibrary), which is an iOS equivalent of the full-featured text search engine library Apache Lucene. The problem is that the search is now very inconsistent: searching for some specific words takes very long, while for others it does not. This is a list of words I searched for, and the time each search took:
- Berlin (34 hits) - 2.8 seconds
- Tag (29 hits) - 11.8 seconds
- Haus (3 hits) - 7.1 seconds
- Straße (28 hits) - 15.7 seconds
- Raumfahrt (5 hits) - 13.8 seconds
- Astronomie (9 hits) - 6 seconds
So the results are quite different, but I thought it should take about the same time for every search phrase. Do you have an idea why that is? Thanks in advance! Best Regards, Tobias Buchholz
[jira] [Commented] (LUCENE-4227) DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM
[ https://issues.apache.org/jira/browse/LUCENE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419129#comment-13419129 ] Michael McCandless commented on LUCENE-4227: bq. I do think maybe this class needs some more warnings, to me it seems it will use crazy amounts of RAM. I'll add some scary warnings :) bq. I also am not sure I like the name Direct... is it crazy to suggest Instantiated? It is very much like the old Instantiated (though I think its terms dict is faster than Instantiated's)... but I didn't really like the name Instantiated... I had picked Direct because it directly represents the postings ... but maybe we can find a better name. I will update MIGRATE.txt to explain how Direct (or whatever we name it) is the closest match if you were previously using Instantiated...
[jira] [Commented] (LUCENE-4227) DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM
[ https://issues.apache.org/jira/browse/LUCENE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419131#comment-13419131 ] Robert Muir commented on LUCENE-4227: - {quote} It is very much like the old Instantiated (though I think its terms dict is faster than Instantiated's)... but I didn't really like the name Instantiated... I had picked Direct because it directly represents the postings ... but maybe we can find a better name. {quote} OK, I think what would be better is a good synonym for Uncompressed. I realized Direct is consistent with packed ints or whatever... but I don't think it should use this name either; it's not intuitive.
[jira] [Comment Edited] (LUCENE-4224) Simplify MultiValuedCase in TermsIncludingScoreQuery
[ https://issues.apache.org/jira/browse/LUCENE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419101#comment-13419101 ] Martijn van Groningen edited comment on LUCENE-4224 at 7/20/12 1:46 PM: Attached a new patch. * Added a Scorer that scores in order. * Existing scorer throws an UOE in the advance() method. was (Author: martijn.v.groningen): Attached a new patch. * Added a Scorer that scores in order. * Existing throw a UOE in the advance() method.
Simplify MultiValuedCase in TermsIncludingScoreQuery Key: LUCENE-4224 URL: https://issues.apache.org/jira/browse/LUCENE-4224 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Martijn van Groningen Attachments: LUCENE-4224.patch, LUCENE-4224.patch
While looking at LUCENE-4214, i was trying to wrap my head around what this is doing... I think the code specialization in the multivalued scorer doesn't buy us any additional speed? At least according to my benchmarks?
[jira] [Updated] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang updated LUCENE-3892: -- Attachment: LUCENE-3892-blockFor-with-packedints.patch An initial try with PackedInts in the current trunk version. I replaced all the int[] buffers with long[] buffers so we can use the API directly. I don't quite understand the Writer part, so we have to save each long value one by one. However, it is the Reader part we are concerned with:
{format}
Task            QPS base  StdDev base  QPS packed  StdDev packed     Pct diff
AndHighHigh        29.60         1.56       23.78           0.51  -25% - -13%
AndHighMed         74.68         3.92       53.15           2.31  -35% - -21%
Fuzzy1             88.23         1.21       87.13           1.41   -4% -   1%
Fuzzy2             30.09         0.45       29.47           0.47   -5% -   1%
IntNRQ             41.96         3.88       38.16           2.48  -22% -   6%
OrHighHigh         17.56         0.34       15.45           0.15  -14% -  -9%
OrHighMed          34.71         0.76       30.77           0.53  -14% -  -7%
PKLookup          111.00         1.90      110.52           1.59   -3% -   2%
Phrase              9.03         0.23        7.62           0.41  -22% -  -8%
Prefix3           123.56         8.42      110.94           5.43  -20% -   1%
Respell           102.37         1.11      101.79           1.38   -2% -   1%
SloppyPhrase        3.97         0.19        3.52           0.07  -17% -  -4%
SpanNear            8.24         0.18        7.22           0.25  -17% -  -7%
Term               45.16         3.15       37.47           2.32  -27% -  -5%
TermBGroup1M       17.19         1.09       15.86           0.77  -17% -   3%
TermBGroup1M1P     23.47         1.66       20.43           1.16  -23% -  -1%
TermGroup1M        19.20         1.14       17.73           0.84  -16% -   2%
Wildcard           42.75         3.27       36.75           1.96  -24% -  -1%
{format}
Maybe we should try PACKED_SINGLE_BLOCK for some special values of numBits, instead of using PACKED all the time? Thanks to Adrien, we have a more direct API in LUCENE-4239; I'm trying that now.
Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) - Key: LUCENE-3892 URL: https://issues.apache.org/jira/browse/LUCENE-3892 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.1 Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor.patch, LUCENE-3892-handle_open_files.patch, LUCENE-3892-pfor-compress-iterate-numbits.patch, LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for.patch, LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
On the flex branch we explored a number of possible intblock encodings, but for whatever reason never brought them to completion. There are still a number of issues opened with patches in different states. Initial results (based on prototype) were excellent (see http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html ). I think this would make a good GSoC project.
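The PACKED_SINGLE_BLOCK idea raised above (pick a layout where no value straddles a 64-bit block boundary, so reads need no cross-word stitching) can be sketched as follows. This is an illustrative stand-in, not the real PackedInts API; the class name is made up, and it assumes 1 <= bitsPerValue <= 32.

```java
// Sketch of single-block packing: each 64-bit long holds
// floor(64 / bitsPerValue) values, wasting the leftover high bits
// in exchange for simpler, branch-free reads.
public final class SingleBlockPacked {
    private final long[] blocks;
    private final int bitsPerValue;
    private final int valuesPerBlock;

    public SingleBlockPacked(int valueCount, int bitsPerValue) {
        this.bitsPerValue = bitsPerValue;
        this.valuesPerBlock = 64 / bitsPerValue;
        this.blocks = new long[(valueCount + valuesPerBlock - 1) / valuesPerBlock];
    }

    /** Stores the low bitsPerValue bits of value at the given index. */
    public void set(int index, long value) {
        int block = index / valuesPerBlock;
        int shift = (index % valuesPerBlock) * bitsPerValue;
        long mask = (1L << bitsPerValue) - 1;
        blocks[block] = (blocks[block] & ~(mask << shift)) | ((value & mask) << shift);
    }

    public long get(int index) {
        int block = index / valuesPerBlock;
        int shift = (index % valuesPerBlock) * bitsPerValue;
        return (blocks[block] >>> shift) & ((1L << bitsPerValue) - 1);
    }

    public static void main(String[] args) {
        SingleBlockPacked p = new SingleBlockPacked(100, 7); // 9 values per long
        for (int i = 0; i < 100; i++) p.set(i, i % 128);
        System.out.println(p.get(42)); // 42
    }
}
```

The trade-off versus a fully packed layout is a little wasted space (64 mod bitsPerValue bits per block) for cheaper decoding, which is why trying it "for some special values of numBits" is attractive.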
[jira] [Comment Edited] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419144#comment-13419144 ] Han Jiang edited comment on LUCENE-3892 at 7/20/12 1:52 PM: An initial try with PackedInts in the current trunk version. I replaced all the int[] buffers with long[] buffers so we can use the API directly. I don't quite understand the Writer part, so we have to save each long value one by one. However, it is the Reader part we are concerned with:
{noformat}
Task            QPS base  StdDev base  QPS packed  StdDev packed     Pct diff
AndHighHigh        29.60         1.56       23.78           0.51  -25% - -13%
AndHighMed         74.68         3.92       53.15           2.31  -35% - -21%
Fuzzy1             88.23         1.21       87.13           1.41   -4% -   1%
Fuzzy2             30.09         0.45       29.47           0.47   -5% -   1%
IntNRQ             41.96         3.88       38.16           2.48  -22% -   6%
OrHighHigh         17.56         0.34       15.45           0.15  -14% -  -9%
OrHighMed          34.71         0.76       30.77           0.53  -14% -  -7%
PKLookup          111.00         1.90      110.52           1.59   -3% -   2%
Phrase              9.03         0.23        7.62           0.41  -22% -  -8%
Prefix3           123.56         8.42      110.94           5.43  -20% -   1%
Respell           102.37         1.11      101.79           1.38   -2% -   1%
SloppyPhrase        3.97         0.19        3.52           0.07  -17% -  -4%
SpanNear            8.24         0.18        7.22           0.25  -17% -  -7%
Term               45.16         3.15       37.47           2.32  -27% -  -5%
TermBGroup1M       17.19         1.09       15.86           0.77  -17% -   3%
TermBGroup1M1P     23.47         1.66       20.43           1.16  -23% -  -1%
TermGroup1M        19.20         1.14       17.73           0.84  -16% -   2%
Wildcard           42.75         3.27       36.75           1.96  -24% -  -1%
{noformat}
Maybe we should try PACKED_SINGLE_BLOCK for some special values of numBits, instead of using PACKED all the time? Thanks to Adrien, we have a more direct API in LUCENE-4239; I'm trying that now.
[jira] [Created] (LUCENE-4240) Analyzer.getOffsetGap Improvements
Grant Ingersoll created LUCENE-4240: --- Summary: Analyzer.getOffsetGap Improvements Key: LUCENE-4240 URL: https://issues.apache.org/jira/browse/LUCENE-4240 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll From LUCENE-3151 (Robert Muir's comments): there is no need for the Analyzer to take in an IndexableField object. We can simplify this API: {quote} Hey Grant: I know it sounds silly but can we split out the getOffsetGap API change into a separate issue? This would be nice to fix ASAP. I don't understand why it takes IndexableField or took Fieldable. All the other methods here like getPositionIncrementGap take String fieldName. I think this one should too. I don't think it needs a boolean for tokenized either: returning 0 for NOT_ANALYZED fields. If you choose NOT_ANALYZED, that should mean the Analyzer is not invoked! If you want to do expert stuff to control the offset gaps between values for NOT_ANALYZED fields, then just analyze it instead, with keyword tokenizer! {quote}
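The signature change proposed above can be sketched as a hypothetical before/after; the FieldLike stand-in and the gap values below are made up for illustration and are not Lucene's real IndexableField or Analyzer code.

```java
public class OffsetGapSketch {
    // Hypothetical stand-in for the field object the old signature depended on.
    interface FieldLike { String name(); boolean tokenized(); }

    // Old shape: the gap depends on a whole field object and its tokenized flag.
    static int getOffsetGapOld(FieldLike field) {
        return field.tokenized() ? 1 : 0;
    }

    // Proposed shape: only the field name, consistent with
    // getPositionIncrementGap(String fieldName). No tokenized boolean is needed,
    // because a NOT_ANALYZED field never invokes the Analyzer at all.
    static int getOffsetGap(String fieldName) {
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(getOffsetGap("body")); // 1
    }
}
```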
[jira] [Commented] (LUCENE-3151) Make all of Analysis completely independent from Lucene Core
[ https://issues.apache.org/jira/browse/LUCENE-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419163#comment-13419163 ] Grant Ingersoll commented on LUCENE-3151: - LUCENE-4240
Make all of Analysis completely independent from Lucene Core Key: LUCENE-3151 URL: https://issues.apache.org/jira/browse/LUCENE-3151 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0-ALPHA Reporter: Grant Ingersoll Fix For: 4.1 Attachments: LUCENE-3151.patch, LUCENE-3151.patch
Lucene's analysis package, including the definitions of Attribute, TokenStream, etc. are quite useful outside of Lucene (for instance, Mahout uses them) for text processing. I'd like to move the definitions, or at least their packaging, to a separate JAR file so that one can consume them w/o needing Lucene core. My draft idea is to have a definition area that Lucene core is dependent on and the rest of the analysis package can then be dependent on the definition area. (I'm open to other ideas as well)
[jira] [Commented] (LUCENE-4240) Analyzer.getOffsetGap Improvements
[ https://issues.apache.org/jira/browse/LUCENE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419165#comment-13419165 ] Uwe Schindler commented on LUCENE-4240: --- +1, nice simplification. I was always wondering about this inconsistency. String field is enough.
[jira] [Updated] (LUCENE-4227) DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM
[ https://issues.apache.org/jira/browse/LUCENE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4227: --- Attachment: LUCENE-4227.patch New patch, adding scary warning & MIGRATE.txt entry, fixing javadoc errors, and adding lucene.experimental ... still haven't thought of another name yet ...
[jira] [Commented] (LUCENE-4227) DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM
[ https://issues.apache.org/jira/browse/LUCENE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419173#comment-13419173 ] Robert Muir commented on LUCENE-4227: - I don't have a better name either. Let's just commit it with this one and think about it for later!
[jira] [Commented] (LUCENE-4238) NRTCachingDirectory has concurrency bug(s).
[ https://issues.apache.org/jira/browse/LUCENE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419174#comment-13419174 ] Michael McCandless commented on LUCENE-4238: Hi Mark, which test/seed are you seeing this on?
NRTCachingDirectory has concurrency bug(s). --- Key: LUCENE-4238 URL: https://issues.apache.org/jira/browse/LUCENE-4238 Project: Lucene - Java Issue Type: Bug Components: core/store Reporter: Mark Miller Fix For: 4.0, 5.0
Re: top level ant test shouldn't validate?
That makes two of us. I'm gonna disable this as it just broke my IDE just now again and I'm over the edge! On Thu, Jul 19, 2012 at 8:56 PM, Mark Miller markrmil...@gmail.com wrote: Top level ant clean breaks my IDE too! I don't know the fine points of this conversation, but it's super painful and I never call top level ant clean anymore. I kept meaning to look into why it was killing me but never got to it. Sent from my iPhone On Jul 19, 2012, at 12:46 PM, Robert Muir rcm...@gmail.com wrote: +1, we have caged the rat, we should be able to have a simple precommit check. Also, top-level 'ant clean' shouldn't call clean-jars. This *totally messes up* my IDE just because I like to run tests from the command-line. On Thu, Jul 19, 2012 at 12:40 PM, Steven A Rowe sar...@syr.edu wrote: On 7/19/2012 at 12:35 PM, Michael McCandless wrote: Any objections to fixing top level ant test to simply run tests...? Maybe we can add a precommit target to run tests, validate, javadocs-lint, ... +1 Steve -- lucidimagination.com
[jira] [Resolved] (LUCENE-4227) DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM
[ https://issues.apache.org/jira/browse/LUCENE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-4227. Resolution: Fixed Fix Version/s: 5.0 4.0 DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM - Key: LUCENE-4227 URL: https://issues.apache.org/jira/browse/LUCENE-4227 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0, 5.0 Attachments: LUCENE-4227.patch, LUCENE-4227.patch, LUCENE-4227.patch This postings format just wraps Lucene40 (on disk) but then at search time it loads (up front) all terms postings into RAM. You'd use this if you have insane amounts of RAM and want the fastest possible search performance. The postings are not compressed: docIds, positions are stored as straight int[]s. The terms are stored as a skip list (array of byte[]), but I packed all terms together into a single long byte[]: I had started as actual separate byte[] per term but the added pointer deref and loss of locality was a lot (~2X) slower for terms-dict intensive queries like FuzzyQuery. Low frequency postings (docFreq = 32 by default) store all docs, pos and offsets into a single int[]. High frequency postings store docs as int[], freqs as int[], and positions as int[][] parallel arrays. For skipping I just do a growing binary search. I also made specialized DirectTermScorer and DirectExactPhraseScorer for the high freq case that just pull the int[] and iterate themselves. All tests pass. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
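The resolved issue above describes postings held in RAM as straight int[] arrays, with skipping done as a growing binary search. A minimal, self-contained sketch of that idea (this is an illustration only, not Lucene's actual DirectPostingsFormat; the class and method names here are invented):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy sketch: uncompressed in-RAM postings as plain int[] per term,
// with advance() implemented as a binary search over the docID array.
class DirectPostingsSketch {
    private final Map<String, int[]> postings = new HashMap<>();

    void add(String term, int... sortedDocIds) {
        postings.put(term, sortedDocIds);
    }

    // Returns the first docID >= target for the term, or -1 when exhausted;
    // this mirrors the "skipping as a binary search" idea from the issue.
    int advance(String term, int target) {
        int[] docs = postings.get(term);
        if (docs == null) return -1;
        int idx = Arrays.binarySearch(docs, target);
        if (idx < 0) idx = -idx - 1;  // decode insertion point
        return idx < docs.length ? docs[idx] : -1;
    }

    public static void main(String[] args) {
        DirectPostingsSketch p = new DirectPostingsSketch();
        p.add("lucene", 2, 5, 9, 40);
        if (p.advance("lucene", 6) != 9) throw new AssertionError();
        if (p.advance("lucene", 40) != 40) throw new AssertionError();
        if (p.advance("lucene", 41) != -1) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The trade-off named in the issue applies here too: no compression, so memory cost is high, but lookups touch simple arrays with good locality.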
[jira] [Updated] (LUCENE-4240) Analyzer.getOffsetGap Improvements
[ https://issues.apache.org/jira/browse/LUCENE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4240: Attachment: LUCENE-4240.patch initial patch Analyzer.getOffsetGap Improvements -- Key: LUCENE-4240 URL: https://issues.apache.org/jira/browse/LUCENE-4240 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Attachments: LUCENE-4240.patch From LUCENE-3151 (Robert Muir's comments): there is no need for the Analyzer to take in an IndexableField object. We can simplify this API: {quote} Hey Grant: I know it sounds silly but can we split out the getOffsetGap API change into a separate issue? This would be nice to fix ASAP. I dont understand why it takes IndexableField or took Fieldable. All the other methods here like getPositionIncrementGap take String fieldName. I think this one should too. I dont think it needs a boolean for tokenized either: returning a 0 for NOT_ANALYZED fields. If you choose NOT_ANALYZED, that should mean the Analyzer is not invoked! If you want to do expert stuff control the offset gaps between values for NOT_ANALYZED fields, then just analyze it instead, with keyword tokenizer! {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
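The proposal above is that getOffsetGap should take a String fieldName, like getPositionIncrementGap does. A toy illustration of what the gap means (invented class, not the actual Analyzer API): it pads the character offsets between consecutive values of a multi-valued field.

```java
// Illustrative only: shows the role of the offset gap when the values of one
// multi-valued field are laid out end to end. The proposed signature change is
// getOffsetGap(String fieldName) instead of a method taking an IndexableField.
class OffsetGapSketch {
    // Per-field gap inserted between consecutive values; 1 is just a demo value.
    int getOffsetGap(String fieldName) {
        return 1;
    }

    // Start offset of values[i] once all earlier values plus gaps are accounted for.
    int startOffset(String fieldName, String[] values, int i) {
        int offset = 0;
        for (int v = 0; v < i; v++) {
            offset += values[v].length() + getOffsetGap(fieldName);
        }
        return offset;
    }

    public static void main(String[] args) {
        OffsetGapSketch a = new OffsetGapSketch();
        String[] values = {"ab", "cde"};
        if (a.startOffset("body", values, 0) != 0) throw new AssertionError();
        // second value starts at length("ab") + gap = 3
        if (a.startOffset("body", values, 1) != 3) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Nothing in this computation needs the field object itself, which is the point of the simplification: the field name alone selects the gap.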
[jira] [Commented] (LUCENE-4240) Analyzer.getOffsetGap Improvements
[ https://issues.apache.org/jira/browse/LUCENE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419186#comment-13419186 ] Michael McCandless commented on LUCENE-4240: +1 Analyzer.getOffsetGap Improvements -- Key: LUCENE-4240 URL: https://issues.apache.org/jira/browse/LUCENE-4240 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Attachments: LUCENE-4240.patch From LUCENE-3151 (Robert Muir's comments): there is no need for the Analyzer to take in an IndexableField object. We can simplify this API: {quote} Hey Grant: I know it sounds silly but can we split out the getOffsetGap API change into a separate issue? This would be nice to fix ASAP. I dont understand why it takes IndexableField or took Fieldable. All the other methods here like getPositionIncrementGap take String fieldName. I think this one should too. I dont think it needs a boolean for tokenized either: returning a 0 for NOT_ANALYZED fields. If you choose NOT_ANALYZED, that should mean the Analyzer is not invoked! If you want to do expert stuff control the offset gaps between values for NOT_ANALYZED fields, then just analyze it instead, with keyword tokenizer! {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4240) Analyzer.getOffsetGap Improvements
[ https://issues.apache.org/jira/browse/LUCENE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419191#comment-13419191 ] Chris Male commented on LUCENE-4240: +1 Analyzer.getOffsetGap Improvements -- Key: LUCENE-4240 URL: https://issues.apache.org/jira/browse/LUCENE-4240 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Attachments: LUCENE-4240.patch From LUCENE-3151 (Robert Muir's comments): there is no need for the Analyzer to take in an IndexableField object. We can simplify this API: {quote} Hey Grant: I know it sounds silly but can we split out the getOffsetGap API change into a separate issue? This would be nice to fix ASAP. I dont understand why it takes IndexableField or took Fieldable. All the other methods here like getPositionIncrementGap take String fieldName. I think this one should too. I dont think it needs a boolean for tokenized either: returning a 0 for NOT_ANALYZED fields. If you choose NOT_ANALYZED, that should mean the Analyzer is not invoked! If you want to do expert stuff control the offset gaps between values for NOT_ANALYZED fields, then just analyze it instead, with keyword tokenizer! {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches
[ https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Harwood updated LUCENE-4069: - Attachment: 4069Failure.zip Attached a log of thread activity showing how TestIndexWriterCommit.testCommitThreadSafety() is failing. At this stage I can't tell if this is a failing in MockDirectoryWrapper or the test or the BloomPF class but it is related to files being removed unexpectedly. Segment-level Bloom filters for a 2 x speed up on rare term searches Key: LUCENE-4069 URL: https://issues.apache.org/jira/browse/LUCENE-4069 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.6, 4.0-ALPHA Reporter: Mark Harwood Priority: Minor Fix For: 4.0 Attachments: 4069Failure.zip, BloomFilterPostingsBranch4x.patch, LUCENE-4069-tryDeleteDocument.patch, LUCENE-4203.patch, MHBloomFilterOn3.6Branch.patch, PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, PrimaryKeyPerfTest40.java An addition to each segment which stores a Bloom filter for selected fields in order to give fast-fail to term searches, helping avoid wasted disk access. Best suited for low-frequency fields e.g. primary keys on big indexes with many segments but also speeds up general searching in my tests. Overview slideshow here: http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU Patch based on 3.6 codebase attached. There are no 3.6 API changes currently - to play just add a field with _blm on the end of the name to invoke special indexing/querying capability. Clearly a new Field or schema declaration(!) would need adding to APIs to configure the service properly. Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat -- This message is automatically generated by JIRA. 
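The fast-fail idea behind LUCENE-4069 can be sketched with a toy Bloom filter (invented names; the real BloomPF lives in the patch, and a HashSet stands in here for the on-disk terms dictionary): a definite "not here" answer lets a term lookup return without touching the dictionary at all.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Toy segment-level Bloom filter: "no" answers are certain and skip the
// expensive terms-dictionary lookup; "maybe" answers fall through to it.
class BloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final Set<String> termsDict = new HashSet<>();  // stand-in for on-disk terms

    BloomSketch(int numBits) {
        this.numBits = numBits;
        this.bits = new BitSet(numBits);
    }

    private int h1(String t) { return Math.floorMod(t.hashCode(), numBits); }
    private int h2(String t) { return Math.floorMod(t.hashCode() * 31 + 17, numBits); }

    void add(String term) {
        termsDict.add(term);
        bits.set(h1(term));
        bits.set(h2(term));
    }

    boolean mightContain(String term) {
        return bits.get(h1(term)) && bits.get(h2(term));
    }

    boolean seek(String term) {
        if (!mightContain(term)) return false;  // fast-fail: no dictionary access
        return termsDict.contains(term);        // expensive path, rarely a false positive
    }

    public static void main(String[] args) {
        BloomSketch b = new BloomSketch(1024);
        b.add("primarykey-0001");
        if (!b.seek("primarykey-0001")) throw new AssertionError();
        if (b.seek("primarykey-9999")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

This is why the issue calls out low-frequency fields like primary keys on many-segment indexes: most per-segment lookups miss, and each miss becomes a couple of bit tests instead of a disk seek.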
[jira] [Comment Edited] (LUCENE-4109) BooleanQueries are not parsed correctly with the flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419042#comment-13419042 ] Karsten R. edited comment on LUCENE-4109 at 7/20/12 3:30 PM: - Patch for lucene/contrib against http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6 The patch adds the Processor BooleanQuery2ModifierNodeProcessor. The patch also changes ParametricRangeQueryNodeProcessor as hotfix for LUCENE-3338 (this change is not for 4.X because LUCENE-3338 is already fixed in 4.X). The patch passes most tests from QueryParserTestBase e.g. except {{{assertQueryEquals([\\* TO \*\],null,[\\* TO \\*]);}}} and LUCENE-2566 related tests. Patch for trunk will coming soon. was (Author: karsten-solr): Patch for lucene/contrib against http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6 The patch adds the Processor BooleanQuery2ModifierNodeProcessor. The patch also changes ParametricRangeQueryNodeProcessor as hotfix for LUCENE-3338 (this change is not for 4.X because LUCENE-3338 is already fixed in 4.X). The patch passes all tests from QueryParserTestBase except {{{assertQueryEquals([\\* TO \*\],null,[\\* TO \\*]);}}} and LUCENE-2566 related tests. Patch for trunk will coming soon. BooleanQueries are not parsed correctly with the flexible query parser -- Key: LUCENE-4109 URL: https://issues.apache.org/jira/browse/LUCENE-4109 Project: Lucene - Java Issue Type: Bug Components: modules/queryparser Affects Versions: 3.5, 3.6 Reporter: Daniel Truemper Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4109.patch, LUCENE-4109.patch, test-patch.txt Hi, I just found another bug in the flexible query parser (together with Robert Muir, yay!). 
The following query string works in the standard query parser: {noformat} (field:[1 TO *] AND field:[* TO 2]) AND field2:z {noformat} yields {noformat} +(+field:[1 TO *] +field:[* TO 2]) +field2:z {noformat} The flexible query parser though yields: {noformat} +(field:[1 TO *] field:[* TO 2]) +field2:z {noformat} Test patch is attached (from Robert actually). I don't know if it affects earlier versions than 3.5. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4240) Analyzer.getOffsetGap Improvements
[ https://issues.apache.org/jira/browse/LUCENE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4240. - Resolution: Fixed Fix Version/s: 5.0 4.0 Analyzer.getOffsetGap Improvements -- Key: LUCENE-4240 URL: https://issues.apache.org/jira/browse/LUCENE-4240 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Fix For: 4.0, 5.0 Attachments: LUCENE-4240.patch From LUCENE-3151 (Robert Muir's comments): there is no need for the Analyzer to take in an IndexableField object. We can simplify this API: {quote} Hey Grant: I know it sounds silly but can we split out the getOffsetGap API change into a separate issue? This would be nice to fix ASAP. I dont understand why it takes IndexableField or took Fieldable. All the other methods here like getPositionIncrementGap take String fieldName. I think this one should too. I dont think it needs a boolean for tokenized either: returning a 0 for NOT_ANALYZED fields. If you choose NOT_ANALYZED, that should mean the Analyzer is not invoked! If you want to do expert stuff control the offset gaps between values for NOT_ANALYZED fields, then just analyze it instead, with keyword tokenizer! {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4239) Provide access to PackedInts' low-level blocks - values conversion methods
[ https://issues.apache.org/jira/browse/LUCENE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419283#comment-13419283 ] Han Jiang commented on LUCENE-4239: --- Thank you Adrien! We'll work easier with this Decoder/Encoder interface. However, This patch isn't passing ant-compile under latest trunk, seems that encoder/decoder methods for Packed64SingleBlockBulkOperation32 are missing? Anyway, we're not using docId up to 32 bits currently, I'll test the performance later. Since we have to handle IndexInput/Output at upper level, we prefer to use direct int[] rather than IntBuffer. Actually, we had a patch making PackedIntsDecompress handle int array instead: https://issues.apache.org/jira/secure/attachment/12532888/LUCENE-3892_for_int%5B%5D.patch (the file name was ForDecompressImpl.java). Performance test shows little difference between these two versions, but as int[] is clear and simple, I think that should be what we hope to use. So... maybe you can provide us methods like: encode(int[] values, long[] blocks, int iterations), decode(long[] blocks, int[] values, int iterations)? Provide access to PackedInts' low-level blocks - values conversion methods Key: LUCENE-4239 URL: https://issues.apache.org/jira/browse/LUCENE-4239 Project: Lucene - Java Issue Type: Improvement Components: core/other Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 4.0 Attachments: LUCENE-4239.patch In LUCENE-4161 we started to make the {{PackedInts}} API more flexible so that codecs could use it whenever they need to (un)pack integers. 
There are two posting formats in progress (For and PFor, LUCENE-3892) that perform a lot of integer (un)packing but the current API still has limits : - it only works with long[] arrays, whereas these codecs need to manipulate int[] arrays, - the packed reader iterators work great for unpacking long sequences of integers, but they would probably cause a lot of overhead to decode lots of short integer sequences such as the ones that can be generated by For and PFor. I've been looking at the For/PFor branch and it has a {{PackedIntsDecompress}} class (http://svn.apache.org/repos/asf/lucene/dev/branches/pforcodec_3892/lucene/core/src/java/org/apache/lucene/codecs/pfor/PackedIntsDecompress.java) which is very similar to {{oal.util.packed.BulkOperation}} (package-private), so maybe we should find a way to expose this class so that the For/PFor branch can directly use it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
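The method shapes requested in the comment above, roughly encode(int[] values, long[] blocks, ...) and decode(long[] blocks, int[] values, ...), amount to fixed-bit-width packing of ints into long blocks. A self-contained sketch of that conversion (illustrative only; signatures and class name are invented, not the real PackedInts/BulkOperation code):

```java
import java.util.Arrays;

// Toy fixed-width bit packing: 'bitsPerValue' bits per int, packed densely
// into long[] blocks, including values that straddle two longs.
class PackedSketch {
    static long[] encode(int[] values, int bitsPerValue) {
        long mask = (1L << bitsPerValue) - 1;
        long[] blocks = new long[(int) (((long) values.length * bitsPerValue + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            long v = values[i] & mask;
            long bitPos = (long) i * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            blocks[block] |= v << shift;
            if (shift + bitsPerValue > 64) {           // value straddles two longs
                blocks[block + 1] |= v >>> (64 - shift);
            }
        }
        return blocks;
    }

    static int[] decode(long[] blocks, int bitsPerValue, int count) {
        long mask = (1L << bitsPerValue) - 1;
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = blocks[block] >>> shift;
            if (shift + bitsPerValue > 64) {           // pull the spilled high bits
                v |= blocks[block + 1] << (64 - shift);
            }
            values[i] = (int) (v & mask);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] docDeltas = {3, 17, 31, 0, 9, 1023};
        int[] roundTrip = decode(encode(docDeltas, 10), 10, docDeltas.length);
        if (!Arrays.equals(docDeltas, roundTrip)) throw new AssertionError();
        System.out.println("ok");
    }
}
```

For/PFor decoders gain speed by hardcoding the loop body per bit width (no per-value shift arithmetic branching), which is what the generated BulkOperation classes do; the generic loop above only shows the data layout being discussed.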
[jira] [Commented] (LUCENE-4109) BooleanQueries are not parsed correctly with the flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419311#comment-13419311 ] Karsten R. commented on LUCENE-4109: Robert, I forgot to run all tests :-( The patch must also include MultiFieldQueryNodeProcessor ({{new OrQueryNode(children)}} instead of {{new BooleanQueryNode(children)}}) and PrecedenceQueryNodeProcessorPipeline ({{BooleanQuery2ModifierNodeProcessor.class}} instead of {{GroupQueryNodeProcessor.class}}). I will fix this on monday. btw. I hope {{((b:one +b:more) t:two)}} is equal to {{((b:one +b:more) (+t:two))}} BooleanQueries are not parsed correctly with the flexible query parser -- Key: LUCENE-4109 URL: https://issues.apache.org/jira/browse/LUCENE-4109 Project: Lucene - Java Issue Type: Bug Components: modules/queryparser Affects Versions: 3.5, 3.6 Reporter: Daniel Truemper Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4109.patch, LUCENE-4109.patch, test-patch.txt Hi, I just found another bug in the flexible query parser (together with Robert Muir, yay!). The following query string works in the standard query parser: {noformat} (field:[1 TO *] AND field:[* TO 2]) AND field2:z {noformat} yields {noformat} +(+field:[1 TO *] +field:[* TO 2]) +field2:z {noformat} The flexible query parser though yields: {noformat} +(field:[1 TO *] field:[* TO 2]) +field2:z {noformat} Test patch is attached (from Robert actually). I don't know if it affects earlier versions than 3.5. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3655) A starting replica can briefly appear active after Solr starts and before recovery begins.
Mark Miller created SOLR-3655: - Summary: A starting replica can briefly appear active after Solr starts and before recovery begins. Key: SOLR-3655 URL: https://issues.apache.org/jira/browse/SOLR-3655 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 4.0, 5.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3653) Support Smart Simplified Chinese in Solr - include clean-up bigramming filter
[ https://issues.apache.org/jira/browse/SOLR-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419326#comment-13419326 ] Lance Norskog commented on SOLR-3653: - bq. Actually there are factories in contrib/analysis-extras. You're right, I was thinking of a previous project. bq. I am not sure on this: if someone wants to mix an n-gram technique with a word model, they can just use two fields? If they want to limit the n-gram field to only longer terms, they should use LengthFilter. Is this the design? {code} Word-based field: SmartChineseWordTokenFilter - LengthFilter accept 1-3 letters Bigram-based field: SmartChineseWordTokenFilter - LengthFilter accept 4 or longer - Chinese-only bigrams {code} This works if the user searches simple words, like on a consumer site. In the legal document site, people block-copy 60-word document titles and expect to find the matching title first on the list. This requires a phrase search where 0 variations in position gives the exact title. If the two classes of terms are in two different fields, will that work? I did not think parsers did. Also, this design needs to allow for mixed-language text: year numbers, English words. Are the existing Lucene filters flexible enough to do this? bq. The word you are upset about (中华人民共和国) is in the smartcn dictionary. As I understand, this word basically means PRC. This is a single concept and makes sense as an indexing unit. Why do we care how long it is in characters? Because parts of it are also words, which should be searchable. Here are two more failed words: 个人所得税 (personal/individual income tax) and 社会保险 (National Congress, political body). I can imagine Congress would be in the dictionary, but personal income tax? If you search for income tax: 所得税 you will not find personal income tax. This points up a flaw: the bigram trick will not find this trigram. How do you know what's in the dictionary? The files are in a .mem format.
I can't find a main program for them. Support Smart Simplified Chinese in Solr - include clean-up bigramming filter - Key: SOLR-3653 URL: https://issues.apache.org/jira/browse/SOLR-3653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Lance Norskog Attachments: SOLR-3653.patch, SmartChineseType.pdf The Smart Simplified Chinese toolkit in lucene/analysis/smartcn has no Solr factories. Also, since it is a statistical algorithm, it is not perfect. This patch supplies factories and a schema.xml type for the existing Lucene Smart Chinese implementation, and includes a fixup class to handle the occasional mistake made by the Smart Chinese implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
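The two-field design quoted in the comment above (short terms in one field, long terms in another) can be mimicked with a toy length filter, shown here as a plain function over a token list (invented names; the real chain would use LengthFilter on two copies of the analyzer output):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy length-based token routing: the same stream feeds a "short word" field
// (1-3 characters) and a "long word" field (4+ characters). Illustrative only.
class LengthSplitSketch {
    static List<String> keepByLength(List<String> tokens, int min, int max) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            // count characters (code points), not UTF-16 units
            int len = t.codePointCount(0, t.length());
            if (len >= min && len <= max) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("中华人民共和国", "税", "所得税");
        System.out.println(keepByLength(tokens, 1, 3));                  // short-word field
        System.out.println(keepByLength(tokens, 4, Integer.MAX_VALUE));  // long-word field
    }
}
```

The open question in the thread, whether a cross-field phrase query can stitch the two fields back together for exact-title matching, is not answered by this routing step itself.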
[jira] [Commented] (SOLR-3653) Support Smart Simplified Chinese in Solr - include clean-up bigramming filter
[ https://issues.apache.org/jira/browse/SOLR-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419330#comment-13419330 ] Robert Muir commented on SOLR-3653: --- {quote} Because parts of it are also words, which should be searchable. {quote} Says who? There is no real word boundaries in this language. If you want to start indexing individual characters, just use StandardTokenizer. None of your examples are failures of this tokenizer. This is what it has in its dictionary! Support Smart Simplified Chinese in Solr - include clean-up bigramming filter - Key: SOLR-3653 URL: https://issues.apache.org/jira/browse/SOLR-3653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Lance Norskog Attachments: SOLR-3653.patch, SmartChineseType.pdf The Smart Simplified Chinese toolkit in lucene/analysis/smartcn has no Solr factories. Also, since it is a statistical algorithm, it is not perfect. This patch supplies factories and a schema.xml type for the existing Lucene Smart Chinese implementation, and includes a fixup class to handle the occasional mistake made by the Smart Chinese implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4239) Provide access to PackedInts' low-level blocks - values conversion methods
[ https://issues.apache.org/jira/browse/LUCENE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419283#comment-13419283 ] Han Jiang edited comment on LUCENE-4239 at 7/20/12 4:49 PM: Thank you Adrien! We'll work easier with this Decoder/Encoder interface. However, This patch isn't passing ant-compile under latest trunk, seems that encoder/decoder methods for Packed64SingleBlockBulkOperation32 are missing? Anyway, we're not using docId up to 32 bits currently, I'll test the performance later. * We're still using IntBuffer just because IndexInput/Ouput don't provide a read/writeInts() method :). Since we still have to handle IndexInput/Output at upper level, we prefer to use direct int[] rather than IntBuffer now. Actually, we had a patch making PackedIntsDecompress handle int[] instead, you can have a glance at it: http://pastebin.com/euvtBD8P. Performance test show little difference between these two versions, and we should choose a clean simple impl right? * As for PFor, we may have to encode another small block of ints with packed format when blockSize128 and blockSize%32 != 0. Current impl will use numBits=8,16,32 to simplify decoder. However, we may consider to use other numBits in near future, I'm afraid this will be a bottleneck when decoder is not hardcoded. So... as a second shot, maybe you can provide us methods like: encode(int[] values, long[] blocks, int iterations), decode(long[] blocks, int[] values, int iterations)? was (Author: billy): Thank you Adrien! We'll work easier with this Decoder/Encoder interface. However, This patch isn't passing ant-compile under latest trunk, seems that encoder/decoder methods for Packed64SingleBlockBulkOperation32 are missing? Anyway, we're not using docId up to 32 bits currently, I'll test the performance later. Since we have to handle IndexInput/Output at upper level, we prefer to use direct int[] rather than IntBuffer. 
Actually, we had a patch making PackedIntsDecompress handle int array instead: https://issues.apache.org/jira/secure/attachment/12532888/LUCENE-3892_for_int%5B%5D.patch (the file name was ForDecompressImpl.java). Performance test shows little difference between these two versions, but as int[] is clear and simple, I think that should be what we hope to use. So... maybe you can provide us methods like: encode(int[] values, long[] blocks, int iterations), decode(long[] blocks, int[] values, int iterations)? Provide access to PackedInts' low-level blocks - values conversion methods Key: LUCENE-4239 URL: https://issues.apache.org/jira/browse/LUCENE-4239 Project: Lucene - Java Issue Type: Improvement Components: core/other Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 4.0 Attachments: LUCENE-4239.patch In LUCENE-4161 we started to make the {{PackedInts}} API more flexible so that codecs could use it whenever they need to (un)pack integers. There are two posting formats in progress (For and PFor, LUCENE-3892) that perform a lot of integer (un)packing but the current API still has limits : - it only works with long[] arrays, whereas these codecs need to manipulate int[] arrays, - the packed reader iterators work great for unpacking long sequences of integers, but they would probably cause a lot of overhead to decode lots of short integer sequences such as the ones that can be generated by For and PFor. I've been looking at the For/PFor branch and it has a {{PackedIntsDecompress}} class (http://svn.apache.org/repos/asf/lucene/dev/branches/pforcodec_3892/lucene/core/src/java/org/apache/lucene/codecs/pfor/PackedIntsDecompress.java) which is very similar to {{oal.util.packed.BulkOperation}} (package-private), so maybe we should find a way to expose this class so that the For/PFor branch can directly use it. -- This message is automatically generated by JIRA. 
[jira] [Resolved] (LUCENE-4237) add ant task to generate optionally ALL javadocs
[ https://issues.apache.org/jira/browse/LUCENE-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4237. - Resolution: Fixed Fix Version/s: (was: 4.0-ALPHA) 5.0 4.0 add ant task to generate optionally ALL javadocs Key: LUCENE-4237 URL: https://issues.apache.org/jira/browse/LUCENE-4237 Project: Lucene - Java Issue Type: Improvement Components: general/javadocs Reporter: Bernd Fehling Priority: Minor Fix For: 4.0, 5.0 Attachments: LUCENE-4237.patch As of jira LUCENE-3977 the generation of javadocs has been cleaned up and is now set fix to 'noindex' to keep distributions small. An ant task should make this selectable to have the option for really building ALL javadocs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3655) A starting replica can briefly appear active after Solr starts and before recovery begins.
[ https://issues.apache.org/jira/browse/SOLR-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419353#comment-13419353 ] Mark Miller commented on SOLR-3655: --- Hmm.. it almost looks like I thought this mostly because of a UI bug - I think perhaps it's showing green for a moment when it should not. When I try and check the same thing through the ZK tree, it looks right. I did tighten things so that the leader for sure sees a down state before the replica registers its live node though. A starting replica can briefly appear active after Solr starts and before recovery begins. -- Key: SOLR-3655 URL: https://issues.apache.org/jira/browse/SOLR-3655 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 4.0, 5.0
[jira] [Created] (SOLR-3656) use same data dir in core reload
Yonik Seeley created SOLR-3656: -- Summary: use same data dir in core reload Key: SOLR-3656 URL: https://issues.apache.org/jira/browse/SOLR-3656 Project: Solr Issue Type: Bug Reporter: Yonik Seeley Priority: Minor When a core reload is issued, we should use the same data dir. This causes problems for things like our test framework that reload the core and end up with the data dir in a different place. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1363272 - /lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
Hi Martijn: thanks for looking into this! I think I have a better fix for these: the problem is actually in the AssertingAtomicReaders that AssertingDirectoryReader wraps its subreaders with. So I added the invisible-cache-key hack there, and removed it completely from LuceneTestCase. I tested this with the hudson seeds that failed (at their appropriate revisions) and it seems to work fine. I also ran tests for queries/grouping/join with -Dnightly=true, -Dtests.multiplier=5, etc. a few times and it all works. I'd really like to have AssertingDirectoryReader being used again. If there are problems we can just back out the change. On Thu, Jul 19, 2012 at 5:48 AM, m...@apache.org wrote: Author: mvg Date: Thu Jul 19 09:48:04 2012 New Revision: 1363272 URL: http://svn.apache.org/viewvc?rev=1363272&view=rev Log: Fix of rare FC insanity during tests that have occurred in grouping joining tests. Modified: lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java?rev=1363272&r1=1363271&r2=1363272&view=diff
==============================================================================
--- lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java (original)
+++ lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java Thu Jul 19 09:48:04 2012
@@ -1048,7 +1048,7 @@ public abstract class LuceneTestCase ext
       if (r instanceof AtomicReader) {
         r = new FCInvisibleMultiReader(new AssertingAtomicReader((AtomicReader)r));
       } else if (r instanceof DirectoryReader) {
-        r = new FCInvisibleMultiReader(new AssertingDirectoryReader((DirectoryReader)r));
+        r = new FCInvisibleMultiReader((DirectoryReader)r);
       }
       break;
     default:
--
lucidimagination.com
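The fix described above depends on the wrapper exposing the wrapped reader's core cache key, so FieldCache sanity checks see one key instead of two. The "invisible cache key" idea can be sketched in plain Java (the types below are hypothetical stand-ins, not the actual Lucene classes):

```java
// Toy illustration of the invisible-cache-key hack: a delegating
// wrapper forwards its cache key to the reader it wraps, so any cache
// keyed on that object treats wrapper and delegate as the same entry.
public class CacheKeyDemo {
  interface Reader {
    Object getCoreCacheKey();
  }

  static class BaseReader implements Reader {
    public Object getCoreCacheKey() {
      return this; // default: the reader instance itself is the key
    }
  }

  static class AssertingReaderWrapper implements Reader {
    private final Reader in;
    AssertingReaderWrapper(Reader in) { this.in = in; }
    public Object getCoreCacheKey() {
      return in.getCoreCacheKey(); // delegate: stay invisible to caches
    }
  }

  public static void main(String[] args) {
    Reader base = new BaseReader();
    Reader wrapped = new AssertingReaderWrapper(base);
    // the wrapper does not introduce a second cache key
    if (wrapped.getCoreCacheKey() != base.getCoreCacheKey()) {
      throw new AssertionError("wrapper leaked its own cache key");
    }
    System.out.println("same core cache key: true");
  }
}
```

Without the delegation, the cache would see two distinct keys for the same underlying segment data, which is exactly the FC "insanity" the tests flag.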
[jira] [Updated] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang updated LUCENE-3892: -- Attachment: LUCENE-3892-blockFor-with-packedints-decoder.patch Patch with the decoder interface, mentioned in LUCENE-4239. I'm afraid that the for loop of readLong() hurts the performance. Here is the comparison against last patch:
{noformat}
Task             QPS base  StdDev base  QPS comp  StdDev comp    Pct diff
AndHighHigh         21.89         0.64     22.14         0.43   -3% -  6%
AndHighMed          52.23         2.34     52.94         1.74   -6% -  9%
Fuzzy1              86.61         1.63     87.29         3.14   -4% -  6%
Fuzzy2              30.54         0.54     30.95         1.18   -4% -  7%
IntNRQ              38.00         1.23     38.14         1.04   -5% -  6%
OrHighHigh          16.37         0.21     16.68         0.79   -4% -  8%
OrHighMed           39.59         0.69     40.34         2.16   -5% -  9%
PKLookup           111.51         1.34    112.78         1.37   -1% -  3%
Phrase               4.54         0.12      4.52         0.13   -5% -  5%
Prefix3            107.85         2.51    109.13         2.10   -3% -  5%
Respell            123.21         2.18    125.15         5.01   -4% -  7%
SloppyPhrase         6.51         0.11      6.44         0.29   -7% -  5%
SpanNear             5.36         0.16      5.31         0.14   -6% -  4%
Term                42.49         1.66     44.10         1.86   -4% - 12%
TermBGroup1M        17.86         0.80     17.82         0.51   -7% -  7%
TermBGroup1M1P      21.08         0.55     21.10         0.62   -5% -  5%
TermGroup1M         19.57         0.82     19.57         0.64   -7% -  7%
Wildcard            43.99         1.21     44.80         1.10   -3% -  7%
{noformat}
Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
- Key: LUCENE-3892 URL: https://issues.apache.org/jira/browse/LUCENE-3892 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.1 Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor.patch, LUCENE-3892-handle_open_files.patch, LUCENE-3892-pfor-compress-iterate-numbits.patch, LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for.patch, LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch On the flex branch we explored a number of possible intblock encodings, but for whatever reason never brought them to completion. There are still a number of issues opened with patches in different states. Initial results (based on prototype) were excellent (see http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html ). I think this would make a good GSoC project.
[jira] [Created] (SOLR-3657) error message only refers to source field when problem parsing value for dest field of copyField
Hoss Man created SOLR-3657: -- Summary: error message only refers to source field when problem parsing value for dest field of copyField Key: SOLR-3657 URL: https://issues.apache.org/jira/browse/SOLR-3657 Project: Solr Issue Type: Bug Reporter: Hoss Man When a client submits a document with a value that is copyFielded into a dest field where the value is not suitable (ie: something that is not a number copied into a numeric field), the error message only refers to the original source field name, not the dest field name. Ideally it should mention both fields.
[jira] [Commented] (SOLR-3657) error message only refers to source field when problem parsing value for dest field of copyField
[ https://issues.apache.org/jira/browse/SOLR-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419371#comment-13419371 ] Hoss Man commented on SOLR-3657: Info from solr-user...
{noformat}
schema.xml:
<types>
  ...
  <fieldtype name="text_not_empty" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.LengthFilterFactory" min="1" max="20"/>
    </analyzer>
  </fieldtype>
</types>
<fields>
  ...
  <field name="estimated_hours" type="tfloat" indexed="true" stored="true" required="false"/>
  <field name="s_estimated_hours" type="text_not_empty" indexed="false" stored="false"/>
</fields>
<copyField source="s_estimated_hours" dest="estimated_hours"/>
...
WARNUNG: Error creating document : SolrInputDocument[{id=id(1.0)={2930}, s_estimated_hours=s_estimated_hours(1.0)={}}]
org.apache.solr.common.SolrException: ERROR: [doc=2930] Error adding field 's_estimated_hours'=''
  at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
  at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
  at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
  at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:66)
  at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:723)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
  at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
  at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.NumberFormatException: empty String
  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:992)
  at java.lang.Float.parseFloat(Float.java:422)
  at org.apache.solr.schema.TrieField.createField(TrieField.java:410)
  at org.apache.solr.schema.FieldType.createFields(FieldType.java:289)
  at org.apache.solr.schema.SchemaField.createFields(SchemaField.java:107)
  at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:312)
  ... 11 more
{noformat}
My response...
{quote}
I believe this is intentional, but I can understand how it might be confusing. I think the point here is that since the field submitted by the client was named s_estimated_hours, that's the field used in the error reported back to the client when something goes wrong with the copyField -- if the error message referred to estimated_hours, the client may not have any idea why/where that field came from. But I can certainly understand the confusion; I've opened SOLR-3657 to try and improve on this. Ideally the error message should make it clear that the value from the source field was copied to the dest field, which then encountered the error.
{quote}
error message only refers to source field when problem parsing value for dest field of copyField Key: SOLR-3657 URL: https://issues.apache.org/jira/browse/SOLR-3657 Project: Solr Issue Type: Bug Reporter: Hoss Man When a client submits a document with a value that is copyFielded into a dest field where the value is not suitable (ie: something that is not a number copied into a numeric field), the error message only refers to the original source field name, not the dest field name. Ideally it should mention both fields.
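The improvement Hoss describes - an error that names both ends of the copyField - could look something like this. The helper and its exact wording are hypothetical, not Solr's actual DocumentBuilder code:

```java
// Sketch of the message SOLR-3657 asks for: when a copyField dest
// rejects a value, report both the source field (which the client
// actually sent) and the dest field (where parsing failed).
public class CopyFieldErrorDemo {
  // Hypothetical helper; Solr's real error construction differs.
  static String copyFieldError(String sourceField, String destField,
                               String value, Throwable cause) {
    return "ERROR: value '" + value + "' from source field '" + sourceField
        + "' was copied to dest field '" + destField
        + "', which failed to parse it: " + cause;
  }

  public static void main(String[] args) {
    // The failing case from the solr-user report: an empty string
    // copied into a tfloat field.
    String msg = copyFieldError("s_estimated_hours", "estimated_hours", "",
        new NumberFormatException("empty String"));
    System.out.println(msg);
  }
}
```

A message in this shape answers both questions at once: which client-supplied field triggered the failure, and which schema field actually choked on the value.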
[jira] [Updated] (SOLR-3656) use same data dir in core reload
[ https://issues.apache.org/jira/browse/SOLR-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-3656: --- Attachment: SOLR-3656.patch Simple patch that just passes the directory of the current core when creating the new core. All tests pass. use same data dir in core reload Key: SOLR-3656 URL: https://issues.apache.org/jira/browse/SOLR-3656 Project: Solr Issue Type: Bug Reporter: Yonik Seeley Priority: Minor Attachments: SOLR-3656.patch When a core reload is issued, we should use the same data dir. This causes problems for things like our test framework that reload the core and end up with the data dir in a different place.
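The idea in the patch - carry the running core's data directory over to its replacement instead of re-resolving it - can be sketched in isolation. The class and field names below are hypothetical, not Solr's real SolrCore API:

```java
// Standalone sketch of SOLR-3656's fix: on reload, hand the new core
// the directory the old core was already using, rather than resolving
// the data dir again (which may land somewhere else, as the test
// framework observed).
public class ReloadDemo {
  static class Core {
    final String name;
    final String dataDir;
    Core(String name, String dataDir) {
      this.name = name;
      this.dataDir = dataDir;
    }
    Core reload() {
      // pass the *current* data dir to the replacement instance
      return new Core(this.name, this.dataDir);
    }
  }

  public static void main(String[] args) {
    Core core = new Core("collection1", "/var/solr/collection1/data");
    Core reloaded = core.reload();
    if (!reloaded.dataDir.equals(core.dataDir)) {
      throw new AssertionError("reload moved the data dir");
    }
    System.out.println("reloaded core keeps data dir: " + reloaded.dataDir);
  }
}
```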
[jira] [Commented] (SOLR-1781) Replication index directories not always cleaned up
[ https://issues.apache.org/jira/browse/SOLR-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419380#comment-13419380 ] Mark Miller commented on SOLR-1781: --- Hmm...i can't replicate this issue so far. Another change around then was updating to ZooKeeper 3.3.5 (bug fix update). I wouldnt expect that to be an issue - but are you just upgrading one node and not all of them? Replication index directories not always cleaned up --- Key: SOLR-1781 URL: https://issues.apache.org/jira/browse/SOLR-1781 Project: Solr Issue Type: Bug Components: replication (java), SolrCloud Affects Versions: 1.4 Environment: Windows Server 2003 R2, Java 6b18 Reporter: Terje Sten Bjerkseth Assignee: Mark Miller Fix For: 4.0, 5.0 Attachments: 0001-Replication-does-not-always-clean-up-old-directories.patch, SOLR-1781.patch, SOLR-1781.patch We had the same problem as someone described in http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201001.mbox/%3c222a518d-ddf5-4fc8-a02a-74d4f232b...@snooth.com%3e. A partial copy of that message: We're using the new replication and it's working pretty well. There's one detail I'd like to get some more information about. As the replication works, it creates versions of the index in the data directory. Originally we had index/, but now there are dated versions such as index.20100127044500/, which are the replicated versions. Each copy is sized in the vicinity of 65G. With our current hard drive it's fine to have two around, but 3 gets a little dicey. Sometimes we're finding that the replication doesn't always clean up after itself. I would like to understand this better, or to not have this happen. It could be a configuration issue. -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419381#comment-13419381 ] Robert Muir commented on LUCENE-3892: - {quote} I'm afraid that the for loop of readLong() hurts the performance. Here is the comparison against last patch: {quote} I think so too. I think in each enum, up front you want a pre-allocated byte[] (maximum size possible for the block), and you do ByteBuffer.wrap(x).asLongBuffer. after you read the header, call readBytes() and then just rewind()? So this is just like what you do now in the branch, except with LongBuffer instead of IntBuffer Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) - Key: LUCENE-3892 URL: https://issues.apache.org/jira/browse/LUCENE-3892 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.1 Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor.patch, LUCENE-3892-handle_open_files.patch, LUCENE-3892-pfor-compress-iterate-numbits.patch, LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for.patch, LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch On the flex branch we explored a number of possible intblock encodings, but for whatever reason never brought them to completion. There are still a number of issues opened with patches in different states. 
Initial results (based on prototype) were excellent (see http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html ). I think this would make a good GSoC project.
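Robert's suggestion - read the whole block's bytes once into a pre-allocated byte[], then view it as longs instead of looping over readLong() - can be sketched with plain java.nio. This is a standalone illustration, not the branch's actual enum code; the block size and sample values are made up:

```java
import java.nio.ByteBuffer;
import java.nio.LongBuffer;

public class BulkLongDecode {
  // Pre-allocated once per enum: the maximum block size in bytes,
  // plus a reusable LongBuffer view over the same backing array
  // (the ByteBuffer.wrap(x).asLongBuffer() from the comment above).
  static final int MAX_BLOCK_LONGS = 128;
  static final byte[] scratch = new byte[MAX_BLOCK_LONGS * 8];
  static final LongBuffer longView = ByteBuffer.wrap(scratch).asLongBuffer();

  public static void main(String[] args) {
    // Pretend readBytes() just filled `scratch` from the index input;
    // here we fake it by writing four big-endian longs.
    ByteBuffer writer = ByteBuffer.wrap(scratch);
    for (int i = 0; i < 4; i++) {
      writer.putLong(i * 7L);
    }
    // Instead of a per-value loop of readLong() calls, rewind the
    // view and bulk-copy the longs out of the already-read bytes.
    longView.rewind();
    long[] block = new long[4];
    longView.get(block);
    for (int i = 0; i < 4; i++) {
      if (block[i] != i * 7L) {
        throw new AssertionError("decode mismatch at " + i);
      }
    }
    System.out.println("decoded " + block.length + " longs");
  }
}
```

The view and the scratch array are allocated once, so per-block decoding is one readBytes() plus a rewind() and a bulk get, with no per-long method-call overhead.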
[jira] [Commented] (SOLR-1781) Replication index directories not always cleaned up
[ https://issues.apache.org/jira/browse/SOLR-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419385#comment-13419385 ] Markus Jelsma commented on SOLR-1781: - Strange indeed. I can/could replicate it on one machine consistently and not on others. Machines weren't upgraded at the same time to prevent cluster downtime. I'll check back monday, there are two other machines left to upgrade plus the bad node. Replication index directories not always cleaned up --- Key: SOLR-1781 URL: https://issues.apache.org/jira/browse/SOLR-1781 Project: Solr Issue Type: Bug Components: replication (java), SolrCloud Affects Versions: 1.4 Environment: Windows Server 2003 R2, Java 6b18 Reporter: Terje Sten Bjerkseth Assignee: Mark Miller Fix For: 4.0, 5.0 Attachments: 0001-Replication-does-not-always-clean-up-old-directories.patch, SOLR-1781.patch, SOLR-1781.patch We had the same problem as someone described in http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201001.mbox/%3c222a518d-ddf5-4fc8-a02a-74d4f232b...@snooth.com%3e. A partial copy of that message: We're using the new replication and it's working pretty well. There's one detail I'd like to get some more information about. As the replication works, it creates versions of the index in the data directory. Originally we had index/, but now there are dated versions such as index.20100127044500/, which are the replicated versions. Each copy is sized in the vicinity of 65G. With our current hard drive it's fine to have two around, but 3 gets a little dicey. Sometimes we're finding that the replication doesn't always clean up after itself. I would like to understand this better, or to not have this happen. It could be a configuration issue. -- This message is automatically generated by JIRA. 
[jira] [Updated] (SOLR-2115) DataImportHandler config file *must* be specified in defaults or status will be DataImportHandler started. Not Initialized. No commands can be run
[ https://issues.apache.org/jira/browse/SOLR-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2115: - Description: The DataImportHandler has two URL parameters for defining the data-config.xml file to be used for the command. 'config' is used in some places and 'dataConfig' is used in other places. 'config' does not work from an HTTP request. However, if it is in the defaults section of the DIH requestHandler definition, it works. If the 'config' parameter is used in an HTTP request, the DIH uses the default in the requestHandler anyway. This is the exception stack received by the client if there is no default. (This is the 3.X branch.)
<html> <head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/> <title>Error 500 </title> </head> <body><h2>HTTP ERROR: 500</h2><pre>null java.lang.NullPointerException
  at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:146)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
  ..etc..
was: The DataImportHandler has two URL parameters for defining the data-config.xml file to be used for the command. 'config' is used in some places and 'dataConfig' is used in other places. 'config' does not work from an HTTP request. However, if it is in the defaults section of the DIH requestHandler definition, it works. If the 'config' parameter is used in an HTTP request, the DIH uses the default in the requestHandler anyway. This is the exception stack received by the client if there is no default. (This is the 3.X branch.)
<html> <head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/> <title>Error 500 </title> </head> <body><h2>HTTP ERROR: 500</h2><pre>null java.lang.NullPointerException
  at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:146)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
  at org.mortbay.jetty.Server.handle(Server.java:285)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
  at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
  at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
</pre> <p>RequestURI=/solr/db/dataimport</p><p><i><small><a href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p> </body> </html>
[jira] [Commented] (LUCENE-4239) Provide access to PackedInts' low-level blocks - values conversion methods
[ https://issues.apache.org/jira/browse/LUCENE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419415#comment-13419415 ] Michael McCandless commented on LUCENE-4239: I think we should just commit this current patch onto the block PF branch (https://svn.apache.org/repos/asf/lucene/dev/branches/pforcodec_3892 )? Then we can iterate on it, from both ends... Provide access to PackedInts' low-level blocks - values conversion methods Key: LUCENE-4239 URL: https://issues.apache.org/jira/browse/LUCENE-4239 Project: Lucene - Java Issue Type: Improvement Components: core/other Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 4.0 Attachments: LUCENE-4239.patch In LUCENE-4161 we started to make the {{PackedInts}} API more flexible so that codecs could use it whenever they need to (un)pack integers. There are two posting formats in progress (For and PFor, LUCENE-3892) that perform a lot of integer (un)packing but the current API still has limits : - it only works with long[] arrays, whereas these codecs need to manipulate int[] arrays, - the packed reader iterators work great for unpacking long sequences of integers, but they would probably cause a lot of overhead to decode lots of short integer sequences such as the ones that can be generated by For and PFor. I've been looking at the For/PFor branch and it has a {{PackedIntsDecompress}} class (http://svn.apache.org/repos/asf/lucene/dev/branches/pforcodec_3892/lucene/core/src/java/org/apache/lucene/codecs/pfor/PackedIntsDecompress.java) which is very similar to {{oal.util.packed.BulkOperation}} (package-private), so maybe we should find a way to expose this class so that the For/PFor branch can directly use it. -- This message is automatically generated by JIRA. 
[jira] [Created] (LUCENE-4241) non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile
Hoss Man created LUCENE-4241: Summary: non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile Key: LUCENE-4241 URL: https://issues.apache.org/jira/browse/LUCENE-4241 Project: Lucene - Java Issue Type: Bug Reporter: Hoss Man Since getting my new laptop, I've noticed some sporadic failures from RecoveryZkTest, so last night I tried running 100 iterations against trunk (r1363555), and got 5 errors/failures... * 3 assertion failures from NRTCachingDirectory.deleteFile * 1 node recovery assertion from AbstractDistributedZkTestCase.waitForRecoveriesToFinish caused by OOM * 1 searcher leak assertion: opens=1658 closes=1652 (possibly lingering effects from OOM?) see comments/attachments for details
[jira] [Updated] (LUCENE-4241) non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile
[ https://issues.apache.org/jira/browse/LUCENE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-4241: - Attachment: just-failures.txt RecoveryZkTest.testDistribSearch-100-tests-failures.txt.tgz Full tests-failures.txt (compressed) and a summary file containing just the failure stack traces (no log output) non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile - Key: LUCENE-4241 URL: https://issues.apache.org/jira/browse/LUCENE-4241 Project: Lucene - Java Issue Type: Bug Reporter: Hoss Man Attachments: RecoveryZkTest.testDistribSearch-100-tests-failures.txt.tgz, just-failures.txt Since getting my new laptop, I've noticed some sporadic failures from RecoveryZkTest, so last night I tried running 100 iterations against trunk (r1363555), and got 5 errors/failures... * 3 assertion failures from NRTCachingDirectory.deleteFile * 1 node recovery assertion from AbstractDistributedZkTestCase.waitForRecoveriesToFinish caused by OOM * 1 searcher leak assertion: opens=1658 closes=1652 (possibly lingering effects from OOM?) see comments/attachments for details
[jira] [Updated] (SOLR-2115) DataImportHandler config file *must* be specified in defaults or status will be DataImportHandler started. Not Initialized. No commands can be run
[ https://issues.apache.org/jira/browse/SOLR-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2115: - Attachment: SOLR-2115.patch With this patch... - DIH attempts to reload the configuration every time a new import is started. This is slightly more overhead, but negligible compared with the time an import takes as a whole. - The config is not loaded on startup and there is no need to have a "defaults" section or have the config declared in solrconfig.xml at all. Instead, users have the option to specify the config file on the request with the config parameter. - The dataConfig parameter, which lets users include the entire configuration as a request parameter, is now always supported (previously this was only supported in debug mode). - The reload-config command is still supported, which is useful for validating a new configuration file, or if you want to specify a file, load it, and not have it reloaded again on import. - Datasources can still be specified in solrconfig.xml. As before, these must be specified in the defaults section of the handler in solrconfig.xml. However, these are not parsed until the main configuration is loaded. - If there is an XML mistake in the configuration, a much more user-friendly message is given in XML format, not raw format as before. Users can fix the problem and reload-config. DataImportHandler config file *must* be specified in defaults or status will be DataImportHandler started. Not Initialized. No commands can be run -- Key: SOLR-2115 URL: https://issues.apache.org/jira/browse/SOLR-2115 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.4.1, 1.4.2, 3.1, 4.0-ALPHA Reporter: Lance Norskog Assignee: James Dyer Priority: Minor Fix For: 4.0 Attachments: SOLR-2115.patch The DataImportHandler has two URL parameters for defining the data-config.xml file to be used for the command. 'config' is used in some places and 'dataConfig' is used in other places. 'config' does not work from an HTTP request. However, if it is in the defaults section of the DIH requestHandler definition, it works. If the 'config' parameter is used in an HTTP request, the DIH uses the default in the requestHandler anyway. This is the exception stack received by the client if there is no default. (This is the 3.X branch.)
<html> <head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/> <title>Error 500 </title> </head> <body><h2>HTTP ERROR: 500</h2><pre>null java.lang.NullPointerException
  at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:146)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
  ..etc..
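For reference, the two ways of pointing DIH at its config file that the patch reconciles - a defaults entry in solrconfig.xml versus a per-request parameter - look roughly like this (the handler name and file path are illustrative):

```xml
<!-- Option 1: fix the config file in solrconfig.xml via "defaults" -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

<!-- Option 2 (with the patch applied): omit the default and pass the
     file per request, e.g.
     /dataimport?command=full-import&amp;config=data-config.xml -->
```

Before the patch, only Option 1 worked reliably; the config parameter on the HTTP request was silently ignored in favor of the default.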
[jira] [Commented] (SOLR-1781) Replication index directories not always cleaned up
[ https://issues.apache.org/jira/browse/SOLR-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419427#comment-13419427 ] Mark Miller commented on SOLR-1781: --- bq. Machines weren't upgraded at the same time to prevent cluster downtime. Yeah, makes sense, just wasn't sure how you went about it. I'd expect a bugfix release of zookeeper to work no problem with the previous nodes, but it's the other variable I think. They recommend upgrading with rolling restarts, so it shouldn't be the problem... Replication index directories not always cleaned up --- Key: SOLR-1781 URL: https://issues.apache.org/jira/browse/SOLR-1781 Project: Solr Issue Type: Bug Components: replication (java), SolrCloud Affects Versions: 1.4 Environment: Windows Server 2003 R2, Java 6b18 Reporter: Terje Sten Bjerkseth Assignee: Mark Miller Fix For: 4.0, 5.0 Attachments: 0001-Replication-does-not-always-clean-up-old-directories.patch, SOLR-1781.patch, SOLR-1781.patch We had the same problem as someone described in http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201001.mbox/%3c222a518d-ddf5-4fc8-a02a-74d4f232b...@snooth.com%3e. A partial copy of that message: We're using the new replication and it's working pretty well. There's one detail I'd like to get some more information about. As the replication works, it creates versions of the index in the data directory. Originally we had index/, but now there are dated versions such as index.20100127044500/, which are the replicated versions. Each copy is sized in the vicinity of 65G. With our current hard drive it's fine to have two around, but 3 gets a little dicey. Sometimes we're finding that the replication doesn't always clean up after itself. I would like to understand this better, or to not have this happen. It could be a configuration issue. -- This message is automatically generated by JIRA. 
[jira] [Commented] (SOLR-2482) DataImportHandler; reload-config; response in case of failure further requests
[ https://issues.apache.org/jira/browse/SOLR-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419428#comment-13419428 ] James Dyer commented on SOLR-2482: -- See SOLR-2115 for a patch that solves both issues.

DataImportHandler; reload-config; response in case of failure further requests Key: SOLR-2482 URL: https://issues.apache.org/jira/browse/SOLR-2482 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler, web gui Reporter: Stefan Matheis (steffkes) Priority: Minor Attachments: reload-config-error.html

Reloading while the config-file is valid is completely fine, but if the config is broken the response is plain HTML, containing the full stacktrace (see attachment). Further requests contain a {{status}} element with ??DataImportHandler started. Not Initialized. No commands can be run??, but respond with an HTTP status 200 OK :/ Would be nice if:
* the response in case of error could also be XML-formatted
* it contained the exception message (in my case ??The end-tag for element type entity must end with a '>' delimiter.??) in a separate field
* a better/correct HTTP status were used for the latter requests; I would suggest {{503 Service Unavailable}}
So we would be able to display the error message to the user when the config gets broken, and for the further requests we could rely on the HTTP status and have no need to check the content of the XML response.
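A sketch of the response shape the issue asks for: an XML body carrying the parse error plus HTTP 503 instead of an HTML stacktrace with 200 OK. The class and helper names are made up for illustration and are not actual Solr APIs.

```java
// Illustrative only: builds the kind of XML error payload SOLR-2482 requests,
// paired with a 503 status for requests made while the DIH config is broken.
public class DihErrorResponse {
    static String toXml(int status, String message) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<response>\n"
             + "  <int name=\"status\">" + status + "</int>\n"
             + "  <str name=\"error\">" + escape(message) + "</str>\n"
             + "</response>";
    }

    // minimal XML escaping so the exception text cannot break the document
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // 503 Service Unavailable, as suggested in the issue for follow-up requests
        System.out.println(toXml(503,
            "The end-tag for element type entity must end with a '>' delimiter."));
    }
}
```

With the status code carried out-of-band (and in the body), clients can react to the failure without parsing HTML.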
[jira] [Commented] (LUCENE-4241) non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile
[ https://issues.apache.org/jira/browse/LUCENE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419429#comment-13419429 ] Mark Miller commented on LUCENE-4241: - Probably related to https://issues.apache.org/jira/browse/LUCENE-4238. I didn't see it in that test, but I've seen it. I have a small test case that demonstrates one of the problems.

non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile - Key: LUCENE-4241 URL: https://issues.apache.org/jira/browse/LUCENE-4241 Project: Lucene - Java Issue Type: Bug Reporter: Hoss Man Attachments: RecoveryZkTest.testDistribSearch-100-tests-failures.txt.tgz, just-failures.txt

Since getting my new laptop, I've noticed some sporadic failures from RecoveryZkTest, so last night I tried running 100 iterations against trunk (r1363555), and got 5 errors/failures...
* 3 assertion failures from NRTCachingDirectory.deleteFile
* 1 node recovery assertion from AbstractDistributedZkTestCase.waitForRecoveriesToFinish caused by OOM
* 1 searcher leak assertion: opens=1658 closes=1652 (possibly lingering effects from OOM?)
see comments/attachments for details
[jira] [Commented] (LUCENE-4239) Provide access to PackedInts' low-level blocks - values conversion methods
[ https://issues.apache.org/jira/browse/LUCENE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419430#comment-13419430 ] Han Jiang commented on LUCENE-4239: --- bq. I think we should just commit this current patch onto the block PF branch ... +1, but shall we wait Adrien to fix the missing methods first? Provide access to PackedInts' low-level blocks - values conversion methods Key: LUCENE-4239 URL: https://issues.apache.org/jira/browse/LUCENE-4239 Project: Lucene - Java Issue Type: Improvement Components: core/other Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 4.0 Attachments: LUCENE-4239.patch In LUCENE-4161 we started to make the {{PackedInts}} API more flexible so that codecs could use it whenever they need to (un)pack integers. There are two posting formats in progress (For and PFor, LUCENE-3892) that perform a lot of integer (un)packing but the current API still has limits : - it only works with long[] arrays, whereas these codecs need to manipulate int[] arrays, - the packed reader iterators work great for unpacking long sequences of integers, but they would probably cause a lot of overhead to decode lots of short integer sequences such as the ones that can be generated by For and PFor. I've been looking at the For/PFor branch and it has a {{PackedIntsDecompress}} class (http://svn.apache.org/repos/asf/lucene/dev/branches/pforcodec_3892/lucene/core/src/java/org/apache/lucene/codecs/pfor/PackedIntsDecompress.java) which is very similar to {{oal.util.packed.BulkOperation}} (package-private), so maybe we should find a way to expose this class so that the For/PFor branch can directly use it. -- This message is automatically generated by JIRA. 
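The (un)packing the LUCENE-4239 discussion is about can be sketched generically. This is not the actual oal.util.packed.BulkOperation or PackedIntsDecompress code, just a self-contained illustration of packing fixed-bit-width values into long[] blocks and reading them back, the operation For/PFor perform on every postings block.

```java
import java.util.Arrays;

// Toy bit-packing sketch (little-endian within each 64-bit block); illustrative only.
public class PackDemo {
    // pack non-negative values of `bits` bits each into a long[]
    static long[] pack(int[] values, int bits) {
        long[] blocks = new long[(values.length * bits + 63) / 64];
        int bitPos = 0;
        for (int v : values) {
            int block = bitPos >>> 6, offset = bitPos & 63;
            blocks[block] |= ((long) v) << offset;          // low part (overflow bits drop off)
            if (offset + bits > 64) {
                blocks[block + 1] |= ((long) v) >>> (64 - offset); // spill high part
            }
            bitPos += bits;
        }
        return blocks;
    }

    static int[] unpack(long[] blocks, int count, int bits) {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            int block = bitPos >>> 6, offset = bitPos & 63;
            long v = blocks[block] >>> offset;
            if (offset + bits > 64) {
                v |= blocks[block + 1] << (64 - offset);    // rejoin the spilled high part
            }
            out[i] = (int) (v & mask);
            bitPos += bits;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docDeltas = {3, 7, 1, 15, 2, 9, 4, 11};       // e.g. doc-id deltas
        long[] packed = pack(docDeltas, 5);                 // 5 bits per value
        int[] restored = unpack(packed, docDeltas.length, 5);
        System.out.println(Arrays.equals(docDeltas, restored)); // prints "true"
    }
}
```

The API friction described in the issue is visible even here: the natural working type for postings is int[], while the packed storage is long[], which is why the codecs want conversion methods rather than going through long[] round-trips.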
[jira] [Commented] (LUCENE-4241) non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile
[ https://issues.apache.org/jira/browse/LUCENE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419438#comment-13419438 ] Robert Muir commented on LUCENE-4241: - Don't you think the problem is likely that Solr's replication doesn't use the Directory API, instead working on Files directly? It's accessing NRTCachingDir's delegate FSDir, then modifying files in that directory all underneath NRTCachingDir.

non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile - Key: LUCENE-4241 URL: https://issues.apache.org/jira/browse/LUCENE-4241 Project: Lucene - Java Issue Type: Bug Reporter: Hoss Man Attachments: RecoveryZkTest.testDistribSearch-100-tests-failures.txt.tgz, just-failures.txt

Since getting my new laptop, I've noticed some sporadic failures from RecoveryZkTest, so last night I tried running 100 iterations against trunk (r1363555), and got 5 errors/failures...
* 3 assertion failures from NRTCachingDirectory.deleteFile
* 1 node recovery assertion from AbstractDistributedZkTestCase.waitForRecoveriesToFinish caused by OOM
* 1 searcher leak assertion: opens=1658 closes=1652 (possibly lingering effects from OOM?)
see comments/attachments for details
[jira] [Updated] (LUCENE-4238) NRTCachingDirectory has concurrency bug(s).
[ https://issues.apache.org/jira/browse/LUCENE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-4238: Attachment: LUCENE-4238.patch Here is an ugly rough test I started playing with yesterday - it will trigger the first exception quite often for me.

NRTCachingDirectory has concurrency bug(s). --- Key: LUCENE-4238 URL: https://issues.apache.org/jira/browse/LUCENE-4238 Project: Lucene - Java Issue Type: Bug Components: core/store Reporter: Mark Miller Fix For: 4.0, 5.0 Attachments: LUCENE-4238.patch
[jira] [Commented] (LUCENE-4238) NRTCachingDirectory has concurrency bug(s).
[ https://issues.apache.org/jira/browse/LUCENE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419440#comment-13419440 ] Mark Miller commented on LUCENE-4238: - Looks like Hossman has also seen this same issue with RecoveryZkTest. Considering it doesn't seem to fail with that exception on all our jenkins machines, it may not be easy to see it there.

NRTCachingDirectory has concurrency bug(s). --- Key: LUCENE-4238 URL: https://issues.apache.org/jira/browse/LUCENE-4238 Project: Lucene - Java Issue Type: Bug Components: core/store Reporter: Mark Miller Fix For: 4.0, 5.0 Attachments: LUCENE-4238.patch
[jira] [Commented] (LUCENE-4238) NRTCachingDirectory has concurrency bug(s).
[ https://issues.apache.org/jira/browse/LUCENE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419443#comment-13419443 ] Mark Miller commented on LUCENE-4238: - Somehow it seems not too difficult to get a file both cached and in the underlying dir - and delete and an assert really don't like that. NRTCachingDirectory has concurrency bug(s). --- Key: LUCENE-4238 URL: https://issues.apache.org/jira/browse/LUCENE-4238 Project: Lucene - Java Issue Type: Bug Components: core/store Reporter: Mark Miller Fix For: 4.0, 5.0 Attachments: LUCENE-4238.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
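The condition described above - a file visible in both the cache and the underlying dir at once - is easy to model. This is a toy sketch, not the real NRTCachingDirectory (the class and method names are invented); it only shows the invariant being asserted and why the create/uncache pair must be atomic with respect to each other:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model of the invariant: a file name lives either in the RAM cache
// or in the delegate directory, never in both at the same time.
public class CacheInvariantDemo {
    private final Set<String> cache = ConcurrentHashMap.newKeySet();
    private final Set<String> delegate = ConcurrentHashMap.newKeySet();

    // Without a common lock, a create() racing an uncache() can briefly leave
    // the name visible in both sets -- the condition the deleteFile assert trips on.
    synchronized void create(String name) { delegate.remove(name); cache.add(name); }
    synchronized void uncache(String name) { if (cache.remove(name)) delegate.add(name); }
    synchronized boolean invariantHolds(String name) {
        return !(cache.contains(name) && delegate.contains(name));
    }

    public static void main(String[] args) throws Exception {
        CacheInvariantDemo dir = new CacheInvariantDemo();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 10_000; i++) {
            pool.execute(() -> dir.create("_0.frq"));
            pool.execute(() -> dir.uncache("_0.frq"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(dir.invariantHolds("_0.frq"));
    }
}
```

Whether coarse synchronization is the right fix for the real class is exactly what the thread is unsure about (Mark notes that synchronizing every method did not help in his test); this sketch only makes the invariant itself concrete.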
[jira] [Updated] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang updated LUCENE-3892: -- Attachment: LUCENE-3892-blockFor-with-packedints-decoder.patch

base: PackedInts.getReaderNoHeader().get(long[]), file io is handled by PackedInts.
comp: PackedInts.getDecoder().decode(LongBuffer,LongBuffer), use byte[] to hold the compressed block, and ByteBuffer.wrap().asLongBuffer as a wrapper.

Well, not as expected.

{noformat}
Task              QPS base  StdDev base  QPS comp  StdDev comp    Pct diff
AndHighHigh          23.78         1.06     23.38         0.42   -7% -  4%
AndHighMed           52.06         3.28     50.82         1.21  -10% -  6%
Fuzzy1               88.56         0.59     88.98         2.38   -2% -  3%
Fuzzy2               28.80         0.36     28.97         0.83   -3% -  4%
IntNRQ               41.92         1.67     41.34         0.50   -6% -  3%
OrHighHigh           15.85         0.45     15.89         0.39   -4% -  5%
OrHighMed            20.38         0.61     20.50         0.62   -5% -  6%
PKLookup            110.72         2.19    111.74         2.53   -3% -  5%
Phrase                7.51         0.12      7.05         0.18   -9% - -2%
Prefix3             106.27         2.65    105.37         1.13   -4% -  2%
Respell             112.03         0.81    112.79         2.71   -2% -  3%
SloppyPhrase         15.43         0.48     14.92         0.27   -7% -  1%
SpanNear              3.52         0.10      3.41         0.06   -7% -  1%
Term                 39.19         1.34     39.04         0.81   -5% -  5%
TermBGroup1M         18.45         0.68     18.33         0.56   -7% -  6%
TermBGroup1M1P       22.78         0.90     22.26         0.56   -8% -  4%
TermGroup1M          19.50         0.73     19.42         0.63   -7% -  6%
Wildcard             29.56         1.13     29.18         0.28   -5% -  3%
{noformat}

Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
- Key: LUCENE-3892 URL: https://issues.apache.org/jira/browse/LUCENE-3892 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.1 Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor.patch, LUCENE-3892-handle_open_files.patch, LUCENE-3892-pfor-compress-iterate-numbits.patch, LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for.patch, LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch On the flex branch we explored a number of possible intblock encodings, but for whatever reason never brought them to completion. There are still a number of issues opened with patches in different states. Initial results (based on prototype) were excellent (see http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html ). I think this would make a good GSoC project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419444#comment-13419444 ] Han Jiang edited comment on LUCENE-3892 at 7/20/12 6:59 PM:

So I changed the patch to readBytes():
base: PackedInts.getReaderNoHeader().get(long[]), file io is handled by PackedInts.
comp: PackedInts.getDecoder().decode(LongBuffer,LongBuffer), use byte[] to hold the compressed block, and ByteBuffer.wrap().asLongBuffer as a wrapper.

Well, not as expected.

{noformat}
Task              QPS base  StdDev base  QPS comp  StdDev comp    Pct diff
AndHighHigh          23.78         1.06     23.38         0.42   -7% -  4%
AndHighMed           52.06         3.28     50.82         1.21  -10% -  6%
Fuzzy1               88.56         0.59     88.98         2.38   -2% -  3%
Fuzzy2               28.80         0.36     28.97         0.83   -3% -  4%
IntNRQ               41.92         1.67     41.34         0.50   -6% -  3%
OrHighHigh           15.85         0.45     15.89         0.39   -4% -  5%
OrHighMed            20.38         0.61     20.50         0.62   -5% -  6%
PKLookup            110.72         2.19    111.74         2.53   -3% -  5%
Phrase                7.51         0.12      7.05         0.18   -9% - -2%
Prefix3             106.27         2.65    105.37         1.13   -4% -  2%
Respell             112.03         0.81    112.79         2.71   -2% -  3%
SloppyPhrase         15.43         0.48     14.92         0.27   -7% -  1%
SpanNear              3.52         0.10      3.41         0.06   -7% -  1%
Term                 39.19         1.34     39.04         0.81   -5% -  5%
TermBGroup1M         18.45         0.68     18.33         0.56   -7% -  6%
TermBGroup1M1P       22.78         0.90     22.26         0.56   -8% -  4%
TermGroup1M          19.50         0.73     19.42         0.63   -7% -  6%
Wildcard             29.56         1.13     29.18         0.28   -5% -  3%
{noformat}

was (Author: billy):
base: PackedInts.getReaderNoHeader().get(long[]), file io is handled by PackedInts.
comp: PackedInts.getDecoder().decode(LongBuffer,LongBuffer), use byte[] to hold the compressed block, and ByteBuffer.wrap().asLongBuffer as a wrapper.

Well, not as expected.
{noformat}
Task              QPS base  StdDev base  QPS comp  StdDev comp    Pct diff
AndHighHigh          23.78         1.06     23.38         0.42   -7% -  4%
AndHighMed           52.06         3.28     50.82         1.21  -10% -  6%
Fuzzy1               88.56         0.59     88.98         2.38   -2% -  3%
Fuzzy2               28.80         0.36     28.97         0.83   -3% -  4%
IntNRQ               41.92         1.67     41.34         0.50   -6% -  3%
OrHighHigh           15.85         0.45     15.89         0.39   -4% -  5%
OrHighMed            20.38         0.61     20.50         0.62   -5% -  6%
PKLookup            110.72         2.19    111.74         2.53   -3% -  5%
Phrase                7.51         0.12      7.05         0.18   -9% - -2%
Prefix3             106.27         2.65    105.37         1.13   -4% -  2%
Respell             112.03         0.81    112.79         2.71   -2% -  3%
SloppyPhrase         15.43         0.48     14.92         0.27   -7% -  1%
SpanNear              3.52         0.10      3.41         0.06   -7% -  1%
Term                 39.19         1.34     39.04         0.81   -5% -  5%
TermBGroup1M         18.45         0.68     18.33         0.56   -7% -  6%
TermBGroup1M1P       22.78         0.90     22.26         0.56   -8% -  4%
TermGroup1M          19.50         0.73     19.42         0.63   -7% -  6%
Wildcard             29.56         1.13     29.18         0.28   -5% -  3%
{noformat}

Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) - Key: LUCENE-3892 URL: https://issues.apache.org/jira/browse/LUCENE-3892 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.1 Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-forpfor-with-javadoc.patch, LUCENE-3892-forpfor-with-javadoc.patch,
[jira] [Commented] (LUCENE-4241) non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile
[ https://issues.apache.org/jira/browse/LUCENE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419447#comment-13419447 ] Mark Miller commented on LUCENE-4241: - I don't know what's causing it... But the assert that is generally tripped on delete is tripped because it's trying to uncache a file and finds it in the delegate - it's trying to assert the file is not in both - and I can cause that condition fairly easily in a simple multi-threaded NRTCachingDirectory test. I'm also not sure that removing files underneath this dir would cause that situation - it would seem not.

non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile - Key: LUCENE-4241 URL: https://issues.apache.org/jira/browse/LUCENE-4241 Project: Lucene - Java Issue Type: Bug Reporter: Hoss Man Attachments: RecoveryZkTest.testDistribSearch-100-tests-failures.txt.tgz, just-failures.txt

Since getting my new laptop, I've noticed some sporadic failures from RecoveryZkTest, so last night I tried running 100 iterations against trunk (r1363555), and got 5 errors/failures...
* 3 assertion failures from NRTCachingDirectory.deleteFile
* 1 node recovery assertion from AbstractDistributedZkTestCase.waitForRecoveriesToFinish caused by OOM
* 1 searcher leak assertion: opens=1658 closes=1652 (possibly lingering effects from OOM?)
see comments/attachments for details
[jira] [Commented] (LUCENE-4241) non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile
[ https://issues.apache.org/jira/browse/LUCENE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419451#comment-13419451 ] Robert Muir commented on LUCENE-4241: - OK, I know I ran into hellacious problems trying to get all tests using MockDirectoryWrapper; I had to disable it in TestReplicationHandler for this reason, because it adds files outside of the Directory API, calling sync itself (but MockDirectoryWrapper doesn't know about this). This also makes it impossible to re-use MDW's facilities for testing disk full etc. (e.g. SOLR-3023).

non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile - Key: LUCENE-4241 URL: https://issues.apache.org/jira/browse/LUCENE-4241 Project: Lucene - Java Issue Type: Bug Reporter: Hoss Man Attachments: RecoveryZkTest.testDistribSearch-100-tests-failures.txt.tgz, just-failures.txt

Since getting my new laptop, I've noticed some sporadic failures from RecoveryZkTest, so last night I tried running 100 iterations against trunk (r1363555), and got 5 errors/failures...
* 3 assertion failures from NRTCachingDirectory.deleteFile
* 1 node recovery assertion from AbstractDistributedZkTestCase.waitForRecoveriesToFinish caused by OOM
* 1 searcher leak assertion: opens=1658 closes=1652 (possibly lingering effects from OOM?)
see comments/attachments for details
[jira] [Commented] (SOLR-3292) /browse example fails to load on 3x: no field name specified in query and no default specified via 'df' param
[ https://issues.apache.org/jira/browse/SOLR-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419452#comment-13419452 ] Jan Høydahl commented on SOLR-3292: --- Why was this not caught by any tests? Should we add one? /browse example fails to load on 3x: no field name specified in query and no default specified via 'df' param --- Key: SOLR-3292 URL: https://issues.apache.org/jira/browse/SOLR-3292 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Priority: Blocker Fix For: 3.6, 4.0, 5.0 1) java -jar start.jar using solr example on 3x branch circa r1306629 2) load http://localhost:8983/solr/browse 3) browser error: 400 no field name specified in query and no default specified via 'df' param 4) error in logs... {noformat} INFO: [] webapp=/solr path=/browse params={} hits=0 status=400 QTime=3 Mar 28, 2012 4:05:59 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param at org.apache.solr.search.SolrQueryParser.checkNullField(SolrQueryParser.java:158) at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:174) at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1429) at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1317) at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1245) at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1234) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206) at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:79) at org.apache.solr.search.QParser.getQuery(QParser.java:143) at org.apache.solr.request.SimpleFacets.getFacetQueryCounts(SimpleFacets.java:233) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:194) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72) at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
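For context on the error itself: the "no default specified via 'df' param" message goes away when the handler definition supplies a df default. A sketch only, assuming the example schema's text field and the stock /browse handler shape in solrconfig.xml:

```
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- default search field used when a query clause names no field -->
    <str name="df">text</str>
  </lst>
</requestHandler>
```

The bug here was that the shipped 3x example config lacked such a default for the fields its /browse templates query.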
[jira] [Commented] (SOLR-3654) Add some tests using Tomcat as servlet container
[ https://issues.apache.org/jira/browse/SOLR-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419457#comment-13419457 ] Steven Rowe commented on SOLR-3654: --- bq. I'm 100% against this. Why? Add some tests using Tomcat as servlet container Key: SOLR-3654 URL: https://issues.apache.org/jira/browse/SOLR-3654 Project: Solr Issue Type: Task Components: Build Environment: Tomcat Reporter: Jan Høydahl Labels: Tomcat Fix For: 4.0 All tests use Jetty, we should add some tests for at least one other servlet container (Tomcat). Ref discussion at http://search-lucene.com/m/6mo9Y1WZaWR1 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4241) non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile
[ https://issues.apache.org/jira/browse/LUCENE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419459#comment-13419459 ] Mark Miller commented on LUCENE-4241: - Yeah, I'm not ruling it out either - I try to be very careful before filing Lucene bugs based on replication stuff and SolrCloud especially (with all its Jetty killing and whatnot). But I seemed to be able to cause the same situation with an isolated test (I'm not tripping the same assert, but causing a different exception in close because of the same invariant). So I'm somewhat sure it's a real issue with the dir, but since I don't have a fix, I don't know for sure. I tried just over-syncing by synchronizing every method in that dir, but no luck :)

non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile - Key: LUCENE-4241 URL: https://issues.apache.org/jira/browse/LUCENE-4241 Project: Lucene - Java Issue Type: Bug Reporter: Hoss Man Attachments: RecoveryZkTest.testDistribSearch-100-tests-failures.txt.tgz, just-failures.txt

Since getting my new laptop, I've noticed some sporadic failures from RecoveryZkTest, so last night I tried running 100 iterations against trunk (r1363555), and got 5 errors/failures...
* 3 assertion failures from NRTCachingDirectory.deleteFile
* 1 node recovery assertion from AbstractDistributedZkTestCase.waitForRecoveriesToFinish caused by OOM
* 1 searcher leak assertion: opens=1658 closes=1652 (possibly lingering effects from OOM?)
see comments/attachments for details
[jira] [Commented] (SOLR-3640) Can't seem to click on any of the core admin buttons anymore
[ https://issues.apache.org/jira/browse/SOLR-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419464#comment-13419464 ] Antony Stubbs commented on SOLR-3640: - Ah - sorry guys. Should have tried it out in the others. It appears to render and perform actions correctly in Firefox and Safari. I.e. it's not even a WebKit-level issue - it only doesn't seem to work in Chrome.

Can't seem to click on any of the core admin buttons anymore Key: SOLR-3640 URL: https://issues.apache.org/jira/browse/SOLR-3640 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0-ALPHA Reporter: Antony Stubbs Priority: Critical Attachments: Screen Shot 2012-07-18 at 3.05.10 PM.png, screenshot-1.jpg

Trying to click on any of the buttons apparently has no effect. They also have no icons next to them anymore and appear down the left.
[jira] [Commented] (SOLR-3654) Add some tests using Tomcat as servlet container
[ https://issues.apache.org/jira/browse/SOLR-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419473#comment-13419473 ] Jan Høydahl commented on SOLR-3654: --- Mark, you have started committing changes making Solr Jetty-bound, without a prior discussion on dev. Many of our users depend on Solr working with their app-servers, especially the OEMs, so such a radical change of direction cannot be made without a thorough discussion and preferably a [VOTE]. Please continue to voice your view, and aid in constructive planning for how and when Solr could (if it should) become a standalone app rather than a WAR - also consulting the user community - but it is not constructive to block test quality progress in 4.0 as long as 4.0 is planned to be a WAR release as before.

Add some tests using Tomcat as servlet container Key: SOLR-3654 URL: https://issues.apache.org/jira/browse/SOLR-3654 Project: Solr Issue Type: Task Components: Build Environment: Tomcat Reporter: Jan Høydahl Labels: Tomcat Fix For: 4.0 All tests use Jetty, we should add some tests for at least one other servlet container (Tomcat). Ref discussion at http://search-lucene.com/m/6mo9Y1WZaWR1
[jira] [Commented] (SOLR-3623) analysis-extras lucene libraries are redundenly packaged (in war and in lucene-libs)
[ https://issues.apache.org/jira/browse/SOLR-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419475#comment-13419475 ] Hoss Man commented on SOLR-3623: Hmm .. ok, something wonky here I'm missing. I started by trying to do the following

{noformat}
svn mv solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/MorfologikFilterFactory.java solr/core/src/java/org/apache/solr/analysis/
svn mv solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/SmartChineseSentenceTokenizerFactory.java solr/core/src/java/org/apache/solr/analysis/
svn mv solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/SmartChineseWordTokenFilterFactory.java solr/core/src/java/org/apache/solr/analysis/
svn mv solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/StempelPolishStemFilterFactory.java solr/core/src/java/org/apache/solr/analysis/
svn mv solr/contrib/analysis-extras/src/test/org/apache/solr/analysis/TestMorfologikFilterFactory.java solr/core/src/test/org/apache/solr/analysis/
svn mv solr/contrib/analysis-extras/src/test/org/apache/solr/analysis/TestSmartChineseFactories.java solr/core/src/test/org/apache/solr/analysis/
cd solr/core
ant test -Dtests.class=\*.analysis.\*
{noformat}

...my understanding being that the morfologik jars and their lucene counterparts should already be in solr core, so these solr classes and tests should be able to move over w/o any other changes, right? But this is causing all sorts of compilation failures related to not finding packages/classes like morfologik.stemming.PolishStemmer, org.apache.lucene.analysis.cn.smart.\*, org.apache.lucene.analysis.stempel.\*, etc... So clearly I'm missing something here in how these dependent jars and classpaths are set up (I haven't looked at the build system closely since the ivy change), so I'll have to dig into this more later today.
(posting this now in the slim hope that sarowe or rmuir see it and say oh, yeah - the thing you are overlooking is...)

analysis-extras lucene libraries are redundantly packaged (in war and in lucene-libs)
Key: SOLR-3623
URL: https://issues.apache.org/jira/browse/SOLR-3623
Project: Solr
Issue Type: Bug
Components: Build
Reporter: Lance Norskog
Assignee: Hoss Man
Priority: Minor
Fix For: 4.0, 5.0

Various dependencies for contrib/analysis-extras are packaged in contrib/analysis-extras/lucene-libs (along with instructions in contrib/analysis-extras/README.txt that users need to include them explicitly) even though these jars are already hardcoded into the solr war file.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
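[Editor's note: the redundancy SOLR-3623 describes - the same jar names shipped both inside the war and under lucene-libs - can be illustrated with a small standalone sketch. The directory layout and jar names below are hypothetical stand-ins, not the actual Solr build output.]

```shell
#!/bin/bash
# Simulate two packaging locations containing an overlapping jar
# (stand-ins for WEB-INF/lib inside the war and contrib/.../lucene-libs).
mkdir -p /tmp/solr3623/war-libs /tmp/solr3623/lucene-libs
touch /tmp/solr3623/war-libs/lucene-stempel-4.0.jar
touch /tmp/solr3623/war-libs/lucene-smartcn-4.0.jar
touch /tmp/solr3623/lucene-libs/lucene-stempel-4.0.jar

# comm -12 prints only lines common to both sorted lists,
# i.e. the jars that are packaged redundantly in both places.
comm -12 \
  <(ls /tmp/solr3623/war-libs | sort) \
  <(ls /tmp/solr3623/lucene-libs | sort)
```

Against a real checkout one would list `WEB-INF/lib` out of the built war (e.g. via `unzip -l`) instead of a plain directory.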
[jira] [Commented] (SOLR-3640) Can't seem to click on any of the core admin buttons anymore
[ https://issues.apache.org/jira/browse/SOLR-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419477#comment-13419477 ] Antony Stubbs commented on SOLR-3640:

There were also other parts of the UI that, I realise now, I wasn't seeing correctly - the dashboard, and the java properties views.

Can't seem to click on any of the core admin buttons anymore
Key: SOLR-3640
URL: https://issues.apache.org/jira/browse/SOLR-3640
Project: Solr
Issue Type: Bug
Components: web gui
Affects Versions: 4.0-ALPHA
Reporter: Antony Stubbs
Priority: Critical
Attachments: Screen Shot 2012-07-18 at 3.05.10 PM.png, screenshot-1.jpg

Trying to click on any of the buttons apparently has no effect. They also have no icons next to them anymore and appear down the left.
Re: [jira] [Commented] (LUCENE-4241) non-reproducible failures from RecoveryZkTest - mostly NRTCachingDirectory.deleteFile
: Yeah, I'm not ruling it out either - I try to be very careful before
: filing lucene bugs based on replication stuff and solrcloud especially

FWIW: i didn't even mean to file this as a LUCENE bug. I clicked Create Bug on a SOLR page and Jira just outsmarted me because of its crazy cookie stuff and having multiple tabs open - i just didn't bother to Move it when i realized the connection with LUCENE-4238. I make no assumptions about where the underlying problem really is.

-Hoss