[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement
[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812260#comment-13812260 ]

Shai Erera commented on LUCENE-5316:

How about making up a hierarchical category, e.g. {{charCount/0-100K/0-10K/0-1K/0-100/0-10}}? If there are candidates in all ranges, that's 100K nodes. We could also hack a hierarchical dimension made up of A-Z/A-Z/A-Z... and randomly assign categories at different levels to documents.

But NO_PARENTS is not the only way to exercise the API. When we compute the top-K for a big flat dimension, we currently traverse all of its children to find which of them have count > 0. So the big flat dimensions also make thousands of calls to ChildrenIterator.next().

> Taxonomy tree traversing improvement
> ------------------------------------
>
> Key: LUCENE-5316
> URL: https://issues.apache.org/jira/browse/LUCENE-5316
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Gilad Barkai
> Priority: Minor
> Attachments: LUCENE-5316.patch
>
> Taxonomy traversal is done today using the {{ParallelTaxonomyArrays}}: two taxonomy-sized {{int}} arrays which hold, for each ordinal, its youngest child (array #1) and its older sibling (array #2).
> This is a compact way of holding the tree information in memory, but it's not perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes the internal implementation
> * Using these arrays for tree traversal is not straightforward
> * Loses reference locality while traversing (the arrays are accessed at increasing entries only, but those entries may be distant from one another)
> * In NRT, a reopen is always (not worst case) done in O(taxonomy-size)
> This issue is about making the traversal easier and the code more readable, and opening it up for future improvements (i.e. memory footprint and NRT cost) without changing any of the internals. A later issue (or issues) could be opened to address the gaps once this one is done.

--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
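The children/siblings encoding discussed in this thread can be sketched with plain int arrays standing in for {{ParallelTaxonomyArrays}} (the class and taxonomy below are illustrative, not the Lucene API):

```java
// Sketch of child traversal over the two parallel arrays described above:
// youngestChild[ord] holds the last-added child of ord (or -1),
// olderSibling[ord] holds the previous sibling of ord (or -1).
import java.util.ArrayList;
import java.util.List;

public class TaxonomyArraysSketch {
    // Tiny taxonomy: 0=root, 1=date, 2=date/2013, 3=date/2013/10, 4=author
    static final int[] youngestChild = {4, 2, 3, -1, -1};
    static final int[] olderSibling = {-1, -1, -1, -1, 1};

    // Collect the children of an ordinal: start at its youngest child and
    // walk the sibling chain. This is the per-node traversal that a top-K
    // computation over a big flat dimension repeats for every child.
    static List<Integer> children(int ord) {
        List<Integer> out = new ArrayList<>();
        for (int child = youngestChild[ord]; child != -1; child = olderSibling[child]) {
            out.add(child);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(children(0)); // root's children, youngest first: [4, 1]
        System.out.println(children(1)); // [2]
    }
}
```

Note how the walk jumps between non-adjacent array entries, which is the reference-locality problem the issue description calls out.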
[jira] [Updated] (LUCENE-5323) Add sizeInBytes to Suggester.Lookup
[ https://issues.apache.org/jira/browse/LUCENE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Areek Zillur updated LUCENE-5323:
---------------------------------
Attachment: LUCENE-5323.patch

Initial patch implementing sizeInBytes() for Suggest.Lookup.

> Add sizeInBytes to Suggester.Lookup
> -----------------------------------
>
> Key: LUCENE-5323
> URL: https://issues.apache.org/jira/browse/LUCENE-5323
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Areek Zillur
> Attachments: LUCENE-5323.patch
>
> It would be nice to have a sizeInBytes() method added to the Suggester.Lookup interface. This would allow users to estimate the size of the in-memory data structure created by the various suggester implementations.
[jira] [Created] (LUCENE-5323) Add sizeInBytes to Suggester.Lookup
Areek Zillur created LUCENE-5323:

Summary: Add sizeInBytes to Suggester.Lookup
Key: LUCENE-5323
URL: https://issues.apache.org/jira/browse/LUCENE-5323
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Reporter: Areek Zillur

It would be nice to have a sizeInBytes() method added to the Suggester.Lookup interface. This would allow users to estimate the size of the in-memory data structure created by the various suggester implementations.
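The kind of estimate the issue asks for can be sketched with a toy in-memory lookup that sums the shallow footprint of its backing arrays. Everything below (the class, the header constant) is hypothetical, not the actual Lucene patch:

```java
// Illustrative only: a toy lookup reporting an estimated heap footprint,
// in the spirit of the proposed Lookup.sizeInBytes(). The array-header
// constant is a typical-JVM assumption, not a measured value.
public class ToyLookup {
    private static final int NUM_BYTES_ARRAY_HEADER = 16; // assumption

    private final long[] weights;  // 8 bytes per entry
    private final byte[] keyBytes; // 1 byte per entry

    public ToyLookup(long[] weights, byte[] keyBytes) {
        this.weights = weights;
        this.keyBytes = keyBytes;
    }

    /** Rough estimate of the heap used by this lookup's backing arrays. */
    public long sizeInBytes() {
        return NUM_BYTES_ARRAY_HEADER + 8L * weights.length
             + NUM_BYTES_ARRAY_HEADER + keyBytes.length;
    }

    public static void main(String[] args) {
        ToyLookup l = new ToyLookup(new long[10], new byte[100]);
        System.out.println(l.sizeInBytes()); // 16 + 80 + 16 + 100 = 212
    }
}
```

In Lucene itself, implementations would more likely lean on {{RamUsageEstimator}} than hand-counted constants like these.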
[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement
[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812214#comment-13812214 ]

Michael McCandless commented on LUCENE-5316:

OK, I will re-test, using NO_PARENTS only for the hierarchical field. The problem is, the Wikipedia docs have only one such field, date (Y/M/D), and it's low cardinality. Actually, Wikipedia does have a VERY big taxonomy, but I've never succeeded in extracting it...

So net/net, I will re-test, but I feel this can easily give a false sense of security, since my test data does not have a "big" single-valued hierarchical field...
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #495: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/495/

1 tests failed.

FAILED: org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.testDistribSearch

Error Message:
No live SolrServers available to handle this request:[http://127.0.0.1:13780/collection1, http://127.0.0.1:52756/collection1, http://127.0.0.1:65012/collection1, http://127.0.0.1:32173/collection1]

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[http://127.0.0.1:13780/collection1, http://127.0.0.1:52756/collection1, http://127.0.0.1:65012/collection1, http://127.0.0.1:32173/collection1]
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:464)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
        at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:268)
        at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:640)
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
        at org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.doTest(ChaosMonkeyNothingIsSafeTest.java:200)

Build Log:
[...truncated 37396 lines...]
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812124#comment-13812124 ]

ASF subversion and git services commented on LUCENE-5189:

Commit 1538258 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1538258 ]
LUCENE-5189: add @Deprecated annotation to SegmentInfo.attributes

> Numeric DocValues Updates
> -------------------------
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/index
> Reporter: Shai Erera
> Assignee: Shai Erera
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch
>
> In LUCENE-4258 we started to work on incremental field updates; however, the amount of changes is immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-DV-field updates only. There are a couple of reasons for that:
> * NumericDV fields should be easier to update if, e.g., we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to update, yet it requires many changes to core code which will also be useful for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the data types in Lucene at once... we can do that gradually.
> I have a working patch already, which I'll upload next, explaining the changes.
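The first bullet's "write all the values of the updated field for the whole segment" idea can be sketched with plain arrays: an update produces a complete new per-segment column with the updated documents overridden, analogous to how livedocs are rewritten. This is an illustration only, not the actual Lucene implementation:

```java
// Sketch: resolve a numeric-DV update by rewriting the full per-segment
// column. currentValues is indexed by docID within the segment; updates
// maps docID -> new value. Illustrative names, not Lucene's.
import java.util.Arrays;
import java.util.Map;

public class NumericUpdateSketch {
    static long[] applyUpdates(long[] currentValues, Map<Integer, Long> updates) {
        long[] merged = currentValues.clone(); // untouched docs keep old values
        for (Map.Entry<Integer, Long> e : updates.entrySet()) {
            merged[e.getKey()] = e.getValue(); // updated docs get the new value
        }
        return merged;
    }

    public static void main(String[] args) {
        long[] col = {10, 20, 30, 40};
        long[] merged = applyUpdates(col, Map.of(1, 99L));
        System.out.println(Arrays.toString(merged)); // [10, 99, 30, 40]
    }
}
```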
[jira] [Commented] (SOLR-5311) Avoid registering replicas which are removed
[ https://issues.apache.org/jira/browse/SOLR-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812119#comment-13812119 ]

ASF subversion and git services commented on SOLR-5311:

Commit 1538255 from [~noble.paul] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1538255 ]
SOLR-5311: trying to stop test failures

> Avoid registering replicas which are removed
> --------------------------------------------
>
> Key: SOLR-5311
> URL: https://issues.apache.org/jira/browse/SOLR-5311
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Noble Paul
> Assignee: Noble Paul
> Fix For: 4.6, 5.0
>
> Attachments: SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch
>
> If a replica is removed from the clusterstate and then comes back up, it should not be allowed to register. Each core, when it comes up, checks whether it was already registered and, if so, whether it is still in the clusterstate. If not, it throws an error and unregisters. If such a request reaches the overseer, the overseer should ignore that core.
[jira] [Commented] (SOLR-5311) Avoid registering replicas which are removed
[ https://issues.apache.org/jira/browse/SOLR-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812118#comment-13812118 ]

ASF subversion and git services commented on SOLR-5311:

Commit 1538254 from [~noble.paul] in branch 'dev/trunk' [ https://svn.apache.org/r1538254 ]
SOLR-5311: trying to stop test failures
[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement
[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812112#comment-13812112 ]

Shai Erera commented on LUCENE-5316:

Using NO_PARENTS is not that simple a decision, since the counts of the parents will be wrong if more than one category of that dimension is added to a document. If it's a flat dimension and you don't care about the dimension's count, that may be fine. But if it's a hierarchical dimension, the counts of the inner taxonomy nodes will be wrong in that case.

While indexing as NO_PARENTS does exercise the API more, I think it's wrong to test it here. NO_PARENTS should be used only for hierarchical dimensions, in order to save space in the category list and eventually (hopefully) speed things up, since fewer bytes are read and decoded during search. But for flat dimensions, it adds the rollupValues cost. If we made the search code smart enough to detect a flat dimension, we'd save that cost (no need to roll up), but in general I think you should tweak OrdinalPolicy to NO_PARENTS only for hierarchical dimensions.

I wonder what the perf numbers will be if you use NO_PARENTS only for the hierarchical dims - that's what we recommend users do, so I think that's what we should benchmark. I'll review the patch later.
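The rollup cost discussed above can be sketched with plain arrays: under NO_PARENTS only leaf ordinals are counted at search time, and ancestor counts are rolled up afterwards. This is a self-contained illustration, not the Lucene facet module's code:

```java
// Roll leaf counts up to ancestors using a parent[] array. Children have
// higher ordinals than their parents, so a single reverse pass suffices.
import java.util.Arrays;

public class RollupSketch {
    static int[] rollup(int[] parent, int[] counts) {
        int[] rolled = counts.clone();
        for (int ord = parent.length - 1; ord > 0; ord--) {
            rolled[parent[ord]] += rolled[ord]; // add child's total to its parent
        }
        return rolled;
    }

    public static void main(String[] args) {
        // 0=root, 1=date, 2=date/2013, 3=date/2013/10, 4=date/2013/11
        int[] parent = {-1, 0, 1, 2, 2};
        int[] counts = {0, 0, 0, 5, 7}; // only leaves counted (NO_PARENTS)
        System.out.println(Arrays.toString(rollup(parent, counts))); // [12, 12, 12, 5, 7]
    }
}
```

This also shows why multi-valued dimensions break under NO_PARENTS, as Shai notes: a document counted in two leaves of the same dimension contributes twice to their shared ancestor.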
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_45) - Build # 3419 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3419/
Java: 64bit/jdk1.7.0_45 -XX:+UseCompressedOops -XX:+UseSerialGC

1 tests failed.

FAILED: org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT

Error Message:
expected:<3> but was:<2>

Stack Trace:
java.lang.AssertionError: expected:<3> but was:<2>
        at __randomizedtesting.SeedInfo.seed([93AADFEF7FFAFCCB:262CBE68C03B4E3F]:0)
        at org.junit.Assert.fail(Assert.java:93)
        at org.junit.Assert.failNotEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:128)
        at org.junit.Assert.assertEquals(Assert.java:472)
        at org.junit.Assert.assertEquals(Assert.java:456)
        at org.apache.solr.core.TestNonNRTOpen.assertNotNRT(TestNonNRTOpen.java:133)
        at org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT(TestNonNRTOpen.java:94)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$
[jira] [Resolved] (LUCENE-5321) Remove Facet42DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera resolved LUCENE-5321.
--------------------------------
Resolution: Fixed
Fix Version/s: 5.0
               4.6
Assignee: Shai Erera
Lucene Fields: New,Patch Available (was: New)

Committed to trunk and 4x. If it turns out FacetCodec is too tricky to write, we can either add it back under facet/ or maybe under demo/. For the moment, I think it's not that important to keep and maintain it.

> Remove Facet42DocValuesFormat
> -----------------------------
>
> Key: LUCENE-5321
> URL: https://issues.apache.org/jira/browse/LUCENE-5321
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Shai Erera
> Assignee: Shai Erera
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5321.patch
>
> The new DirectDocValuesFormat is nearly identical to Facet42DVF, except that it stores the addresses in a direct int[] rather than PackedInts. On LUCENE-5296 we measured the performance of DirectDVF vs Facet42DVF: it improves perf for some queries and has a negligible effect on others, and RAM consumption isn't much worse. We should remove Facet42DVF and use DirectDVF instead.
> I also want to rename Facet46Codec to FacetCodec. There's no need to refactor the class whenever the default codec changes (e.g. from 45 to 46), since it doesn't care about the actual Codec version underneath; it only overrides the DVF used for the facet fields. FacetCodec should take the DVF from the app (so e.g. the facet/ module doesn't depend on codecs/) and be exposed more as a utility Codec rather than a real, versioned Codec.
[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812106#comment-13812106 ]

ASF subversion and git services commented on LUCENE-5321:

Commit 1538249 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1538249 ]
LUCENE-5321: remove Facet42DocValuesFormat and FacetCodec
[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812101#comment-13812101 ]

ASF subversion and git services commented on LUCENE-5321:

Commit 1538245 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1538245 ]
LUCENE-5321: remove Facet42DocValuesFormat and FacetCodec
[jira] [Commented] (SOLR-5418) Background merge after field removed from solr.xml causes error
[ https://issues.apache.org/jira/browse/SOLR-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812094#comment-13812094 ]

Erick Erickson commented on SOLR-5418:

Thanks Robert! Mostly I was making sure I didn't lose anything.

> Background merge after field removed from solr.xml causes error
> ---------------------------------------------------------------
>
> Key: SOLR-5418
> URL: https://issues.apache.org/jira/browse/SOLR-5418
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 4.5
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Attachments: SOLR-5418.patch, SOLR-5418.patch
>
> Problem from the user's list, cut/pasted below. Robert Muir hacked out a quick patch he posted on the dev list; I'll append it shortly.
>
> I am working on implementing Solr as the search backend for our web system. So far things have been going well, but today I made some schema changes and now things have broken.
> I updated the schema.xml file and reloaded the core (via the admin interface). No errors were reported in the logs.
> I then pushed 100 records to be indexed. A call to Commit afterwards seemed fine; however, my next call to Optimize caused the following errors:
>
> java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1]
> null:java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1]
>
> Unfortunately, googling for "background merge hit exception" came up with two things: a corrupt index or not enough free space. The host machine that's hosting Solr has 227 out of 229GB free (according to df -h), so that's not it.
> I then ran CheckIndex on the index, and got the following results: http://apaste.info/gmGU
> As someone who is new to Solr and Lucene, as far as I can tell this means my index is fine. So I am at a loss. I'm fairly sure that I could delete my data directory and rebuild it, but I am more interested in finding out why it is having issues, what the best way to fix it is, and what the best way to prevent it from happening when this goes into production is.
> Does anyone have any advice that may help?
>
> I helped Matthew find the logs and he posted this stack trace:
>
> 1691103929 [http-bio-8080-exec-3] INFO org.apache.solr.update.UpdateHandler - start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 1691104153 [http-bio-8080-exec-3] INFO org.apache.solr.update.processor.LogUpdateProcessor - [dbqItems] webapp=/solr path=/update params={optimize=true&_=1382999386564&wt=json&waitFlush=true} {} 0 224
> 1691104154 [http-bio-8080-exec-3] ERROR org.apache.solr.core.SolrCore - java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _39 [maxNumSegments=1]
>         at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
>         at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
>         at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:530)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
>         at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219)
>         at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
>         at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>         a
[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812089#comment-13812089 ]

Shai Erera commented on LUCENE-5321:

I'll add the test back with the assumeTrue. I'm not sure where to document this FacetCodec example... it doesn't seem to belong in any of the classes' javadocs, and the package.html files aren't too verbose (they point to blogs). So maybe we can just write a blog post about it, though really, this isn't too complicated to figure out.

I'll attach a patch shortly; I want to make sure this test + assume really work!
[jira] [Updated] (SOLR-5418) Background merge after field removed from solr.xml causes error
[ https://issues.apache.org/jira/browse/SOLR-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-5418: -- Attachment: SOLR-5418.patch Here is the patch from my svn checkout. I sent it to the list really quick just to get an opinion on it. I don't remember why the current checks were added. I guess I can see a line of reasoning that its good for the user to know they are dragging unused shit around in their index. And I can agree with that, but I don't think its the job of the codec to fail a background merge to communicate such a thing to the user. Such a warning/failure could be implemented in other ways, for example a warmer in the example for "firstSearcher" event that looks at fieldinfos and warns or fails "YOU ARE DRAGGING AROUND BOGUS STUFF IN YOUR INDEX" if it finds things that don't match to the schema or something like that: and it would be easy for users to enable/disable. > Background merge after field removed from solr.xml causes error > --- > > Key: SOLR-5418 > URL: https://issues.apache.org/jira/browse/SOLR-5418 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.5 >Reporter: Erick Erickson >Assignee: Erick Erickson > Attachments: SOLR-5418.patch, SOLR-5418.patch > > > Problem from the user's list, cut/pasted below. Robert Muir hacked out a > quick patch he pasted on the dev list, I'll append it shortly. > I am working at implementing solr to work as the search backend for our web > system. So far things have been going well, but today I made some schema > changes and now things have broken. > I updated the schema.xml file and reloaded the core (via the admin > interface). No errors were reported in the logs. > I then pushed 100 records to be indexed. 
A call to Commit afterwards > seemed fine, however my next call for Optimize caused the following errors: > java.io.IOException: background merge hit exception: > _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 > [maxNumSegments=1] > null:java.io.IOException: background merge hit exception: > _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 > [maxNumSegments=1] > Unfortunately, googling for "background merge hit exception" came up > with 2 things: a corrupt index or not enough free space. The host > machine that's hosting Solr has 227 out of 229GB free (according to df > -h), so that's not it. > I then ran CheckIndex on the index, and got the following results: > http://apaste.info/gmGU > As someone who is new to Solr and Lucene, as far as I can tell this > means my index is fine. So I am coming up at a loss. I'm somewhat sure > that I could probably delete my data directory and rebuild it, but I am > more interested in finding out why it is having issues, what is the > best way to fix it, and what is the best way to prevent it from > happening when this goes into production. > Does anyone have any advice that may help?
> I helped Matthew find the logs and he posted this stack trace: > 1691103929 [http-bio-8080-exec-3] INFO org.apache.solr.update.UpdateHandler > – start > commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} > 1691104153 [http-bio-8080-exec-3] INFO > org.apache.solr.update.processor.LogUpdateProcessor – [dbqItems] > webapp=/solr path=/update > params={optimize=true&_=1382999386564&wt=json&waitFlush=true} {} 0 224 > 1691104154 [http-bio-8080-exec-3] ERROR org.apache.solr.core.SolrCore – > java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 > _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _39 [maxNumSegments=1] > at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) > at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:530) > at > org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219) > at > org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157) > at > org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) > at > org.ap
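Robert Muir's suggestion above, a "firstSearcher"-style warmer that inspects the index's FieldInfos and warns about fields no longer in the schema, can be sketched as a plain set comparison. This is only an illustration: the class name, the `findBogusFields` helper, and the sample field names are made up, and a real implementation would hook into Solr's event-listener machinery and read the field names from the searcher rather than from hard-coded sets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BogusFieldCheck {

    /**
     * Returns the index fields that no longer match any schema field.
     * In real Solr code, indexFields would come from the searcher's
     * FieldInfos and schemaFields from the IndexSchema.
     */
    static List<String> findBogusFields(Set<String> indexFields, Set<String> schemaFields) {
        List<String> bogus = new ArrayList<>();
        for (String f : indexFields) {
            if (!schemaFields.contains(f)) {
                bogus.add(f);
            }
        }
        return bogus;
    }

    public static void main(String[] args) {
        // Hypothetical example: "removed_field" was deleted from schema.xml
        // but still exists in already-written segments.
        Set<String> indexFields = new HashSet<>(Arrays.asList("id", "title", "removed_field"));
        Set<String> schemaFields = new HashSet<>(Arrays.asList("id", "title"));
        List<String> bogus = findBogusFields(indexFields, schemaFields);
        if (!bogus.isEmpty()) {
            // Warn (or fail) here instead of failing a background merge;
            // this makes the check easy for users to enable/disable.
            System.err.println("WARNING: index contains fields not in the schema: " + bogus);
        }
    }
}
```

The point of the design is that a warning like this is a policy decision the application can toggle, whereas a codec-level merge failure is not.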
[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812084#comment-13812084 ] Michael McCandless commented on LUCENE-5321: Also, maybe somewhere in javadocs we could show how the app could do what Facet46Codec is doing today? Ie, how to gather up all facet fields and then override getDocValuesFormatForField w/ the default codec?
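A javadoc example along the lines Mike suggests might look like the sketch below. The real override lives in an anonymous subclass of the default codec (shown only inside a comment, since it needs the Lucene jars on the classpath); the runnable part models just the per-field routing decision. The `$facets` field name, the `"Direct"`/`"default"` return values, and the `formatForField` helper are all hypothetical, chosen only to make the dispatch visible.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FacetFormatRouting {

    /**
     * Sketch of the per-field dispatch an app would put inside an anonymous
     * codec subclass. In Lucene 4.x-era code this could look roughly like
     * (comment only, requires the Lucene jars):
     *
     *   Codec codec = new Lucene46Codec() {
     *     &#64;Override
     *     public DocValuesFormat getDocValuesFormatForField(String field) {
     *       return facetFields.contains(field)
     *           ? facetsFormat                   // e.g. DocValuesFormat.forName("Direct")
     *           : super.getDocValuesFormatForField(field);
     *     }
     *   };
     *
     * The method below models just that decision so it can run standalone.
     */
    static String formatForField(Set<String> facetFields, String field) {
        return facetFields.contains(field) ? "Direct" : "default";
    }

    public static void main(String[] args) {
        // Gather up the facet fields (hypothetical name), route them to the
        // facet-friendly format, and let everything else use the default.
        Set<String> facetFields = new HashSet<>(Arrays.asList("$facets"));
        System.out.println(formatForField(facetFields, "$facets")); // prints "Direct"
        System.out.println(formatForField(facetFields, "body"));    // prints "default"
    }
}
```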
[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812082#comment-13812082 ] Michael McCandless commented on LUCENE-5321: +1, but maybe we can keep that test case if we just change it to an assumeTrue(_TestUtil.fieldSupportsHugeBinaryValues)?
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812065#comment-13812065 ] Robert Muir commented on LUCENE-4956: - Just so anyone reading the thread knows: the clause Benson mentioned is not an advertising clause: {quote} Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote the sale, use or other dealings in these Data Files or Software without prior written authorization of the copyright holder. {quote} The BSD advertising clause reads like this: {quote} All advertising materials mentioning features or use of this software must display the following acknowledgement: This product includes software developed by the . {quote} These are very different. > the korean analyzer that has a korean morphological analyzer and dictionaries > - > > Key: LUCENE-4956 > URL: https://issues.apache.org/jira/browse/LUCENE-4956 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Affects Versions: 4.2 >Reporter: SooMyung Lee >Assignee: Christian Moen > Labels: newbie > Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, > lucene-4956.patch, lucene4956.patch > > > The Korean language has specific characteristics. When developing a search service > with Lucene & Solr in Korean, there are some problems in searching and > indexing. The Korean analyzer solves these problems with a Korean morphological > analyzer. It consists of a Korean morphological analyzer, dictionaries, a > Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene > and Solr. If you develop a search service with Lucene in Korean, the Korean > analyzer is the best choice.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812059#comment-13812059 ] Benson Margulies commented on LUCENE-4956: -- OK, I see, the email thread about Unicode data in general does certainly cover this. Sometimes the workings of Legal are pretty perplexing.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812056#comment-13812056 ] Robert Muir commented on LUCENE-4956: - {quote} Rob, I got shat on at great length over this for merely test data over at the WS project. I had to make the build pull the data over the network to get certain directors off of my back. I'm trying to spare you the experience. That's all. {quote} Then perhaps you should push back hard when people don't know what they are talking about, like I do. As I said, the question about using Unicode data tables has already been directly answered. {quote} As a low-intensity member of the UTC, I would also expect there to be only one license. However, I compare: {quote} I am also one. This means nothing. {quote} They look pretty different to me. Go figure? {quote} There is only one license, from the terms of use page http://www.unicode.org/copyright.html. That is what I include. Whoever created your "other license" decided to omit some of the information, which I did not.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812053#comment-13812053 ] Benson Margulies commented on LUCENE-4956: -- Rob, I got shat on at great length over this for merely test data over at the WS project. I had to make the build pull the data over the network to get certain directors off of my back. I'm trying to spare you the experience. That's all. As a low-intensity member of the UTC, I would also expect there to be only one license. However, I compare: {noformat} # Copyright (c) 1991-2011 Unicode, Inc. All Rights reserved. # # This file is provided as-is by Unicode, Inc. (The Unicode Consortium). No # claims are made as to fitness for any particular purpose. No warranties of # any kind are expressed or implied. The recipient agrees to determine # applicability of information provided. If this file has been provided on # magnetic media by Unicode, Inc., the sole remedy for any claim will be # exchange of defective media within 90 days of receipt. # # Unicode, Inc. hereby grants the right to freely use the information # supplied in this file in the creation of products supporting the # Unicode Standard, and to make copies of this file in any form for # internal or external distribution as long as this notice remains # attached. {noformat} with {noformat} ! Copyright (c) 1991-2013 Unicode, Inc. ! All rights reserved. ! Distributed under the Terms of Use in http://www.unicode.org/copyright.html. ! ! Permission is hereby granted, free of charge, to any person obtaining a copy ! of the Unicode data files and any associated documentation (the "Data Files") ! or Unicode software and any associated documentation (the "Software") to deal ! in the Data Files or Software without restriction, including without limitation ! the rights to use, copy, modify, merge, publish, distribute, and/or sell copies ! 
of the Data Files or Software, and to permit persons to whom the Data Files or ! Software are furnished to do so, provided that (a) the above copyright notice(s) ! and this permission notice appear with all copies of the Data Files or Software, ! (b) both the above copyright notice(s) and this permission notice appear in ! associated documentation, and (c) there is clear notice in each modified Data ! File or in the Software as well as in the documentation associated with the Data ! File(s) or Software that the data or software has been modified. ! ! THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, ! EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, ! FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO ! EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ! ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES ! WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF ! CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION ! WITH THE USE OR PERFORMANCE OF THE DATA FILES OR SOFTWARE. ! ! Except as contained in this notice, the name of a copyright holder shall not be ! used in advertising or otherwise to promote the sale, use or other dealings in ! these Data Files or Software without prior written authorization of the copyright holder. {noformat} They look pretty different to me. Go figure?
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812052#comment-13812052 ] Robert Muir commented on LUCENE-4956: - The exact question about using unicode data tables has been answered explicitly already: http://mail-archives.apache.org/mod_mbox/www-legal-discuss/200903.mbox/%3c3d4032300903030415w4831f6e4u65c12881cbb86...@mail.gmail.com%3E I don't think it needs any further discussion
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812051#comment-13812051 ] Robert Muir commented on LUCENE-4956: - This is the Unicode license that all of their data and code comes from. There is only one. Please, don't waste my time here; if you want to waste the legal team's time, that's ok :)
[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812050#comment-13812050 ] Benson Margulies edited comment on LUCENE-4956 at 11/2/13 4:11 PM: --- That jira concerns a different license. The license on the file pointed-to there has no advertising clause that I can spot. Which isn't to say that legal would have a problem with this, just that I don't think that the JIRA in question tells us. was (Author: bmargulies): That jira concerns a different license. The license on the file pointed-to there has no advertising clause that I can spot.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812050#comment-13812050 ] Benson Margulies commented on LUCENE-4956: -- That jira concerns a different license. The license on the file pointed-to there has no advertising clause that I can spot.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812047#comment-13812047 ] Robert Muir commented on LUCENE-4956: - {quote} So that Unicode license is possibly an issue. {quote} No, it's not. https://issues.apache.org/jira/browse/LEGAL-108
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812044#comment-13812044 ] Benson Margulies commented on LUCENE-4956: -- My point is that it might have a bit too much legal notice. Generally, when someone grants a license, the headers all move up to some global NOTICE file, and the file is left with just an Apache license. I also noted the following: ! Except as contained in this notice, the name of a copyright holder shall not be ! used in advertising or otherwise to promote the sale, use or other dealings in ! these Data Files or Software without prior written authorization of the copyright holder. and then noticed that http://www.apache.org/legal/resolved.html says that it approves of * BSD (without advertising clause). So that Unicode license is possibly an issue. Right now I'm using the git clone, but I just did a pull, and the pathname is lucene/analysis/arirang/src/data/mapHanja.dic
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812039#comment-13812039 ] Robert Muir commented on LUCENE-4956: - please point to specific files in svn that you have concerns about. I recreated this file myself from clearly attributed sources, from scratch. It has *MORE THAN ENOUGH* legal notice.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812034#comment-13812034 ] Benson Margulies commented on LUCENE-4956: -- Looks like mapHanja.dic needs some adjustment of the legal notice? Or was this going to be replaced?
[jira] [Commented] (LUCENE-5311) Make it possible to train / run classification over multiple fields
[ https://issues.apache.org/jira/browse/LUCENE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812031#comment-13812031 ] ASF subversion and git services commented on LUCENE-5311: - Commit 1538205 from [~teofili] in branch 'dev/trunk' [ https://svn.apache.org/r1538205 ] LUCENE-5311 - added support for training using multiple content fields for knn and naive bayes > Make it possible to train / run classification over multiple fields > --- > > Key: LUCENE-5311 > URL: https://issues.apache.org/jira/browse/LUCENE-5311 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/classification >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > > It'd be nice to be able to use multiple fields instead of just one for > training / running each classifier.
[jira] [Updated] (SOLR-5302) Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-5302: - Attachment: SOLR-5302.patch Please apply my updated version of the patch or make the same changes before making a new one, or I'll have to re-do some work. NOTE: This is against trunk! Working with pre-commit, changes I had to make: A couple of files were indented with tabs. Since they're new files, I just reformatted them. The forbidden-API checks failed on several files, mostly requiring either Scanners to have "UTF-8" specified or String.toLowerCase to have Locale.ROOT, and such-like. I did most of this on the plane ride home, and I must admit it's annoying to have precommit fail because I don't have internet connectivity; there _must_ be a build flag somewhere. These files have missing javadocs: [exec] missing: org.apache.solr.analytics.accumulator [exec] missing: org.apache.solr.analytics.accumulator.facet [exec] missing: org.apache.solr.analytics.expression [exec] missing: org.apache.solr.analytics.plugin [exec] missing: org.apache.solr.analytics.request [exec] missing: org.apache.solr.analytics.statistics [exec] missing: org.apache.solr.analytics.util [exec] missing: org.apache.solr.analytics.util.valuesource [exec] [exec] Missing javadocs were found! Tests failing, and a JVM crash to boot. FieldFacetExtrasTest fails with "unknown field int_id". There's nothing in schema-docValues.xml that would map to that field; did it get changed? Is this a difference between trunk and 4x?
- org.apache.solr.analytics.NoFacetTest (suite) [junit4] - org.apache.solr.analytics.facet.FieldFacetExtrasTest (suite) [junit4] - org.apache.solr.analytics.expression.ExpressionTest (suite) [junit4] - org.apache.solr.analytics.AbstractAnalyticsStatsTest.initializationError [junit4] - org.apache.solr.analytics.util.valuesource.FunctionTest (suite) [junit4] - org.apache.solr.analytics.facet.AbstractAnalyticsFacetTest.initializationError [junit4] - org.apache.solr.analytics.facet.FieldFacetTest (suite) [junit4] - org.apache.solr.analytics.facet.QueryFacetTest.queryTest [junit4] - org.apache.solr.analytics.facet.RangeFacetTest (suite) > Analytics Component > --- > > Key: SOLR-5302 > URL: https://issues.apache.org/jira/browse/SOLR-5302 > Project: Solr > Issue Type: New Feature >Reporter: Steven Bower >Assignee: Erick Erickson > Attachments: SOLR-5302.patch, SOLR-5302.patch, Search Analytics > Component.pdf, Statistical Expressions.pdf, solr_analytics-2013.10.04-2.patch > > > This ticket is to track a "replacement" for the StatsComponent. 
The > AnalyticsComponent supports the following features: > * All functionality of StatsComponent (SOLR-4499) > * Field Faceting (SOLR-3435) > ** Support for limit > ** Sorting (bucket name or any stat in the bucket) > ** Support for offset > * Range Faceting > ** Supports all options of standard range faceting > * Query Faceting (SOLR-2925) > * Ability to use overall/field facet statistics as input to range/query > faceting (i.e. calc min/max date and then facet over that range) > * Support for more complex aggregate/mapping operations (SOLR-1622) > ** Aggregations: min, max, sum, sum-of-squares, count, missing, stddev, mean, > median, percentiles > ** Operations: negation, abs, add, multiply, divide, power, log, date math, > string reversal, string concat > ** Easily pluggable framework to add additional operations > * New / cleaner output format > Outstanding Issues: > * Multi-value field support for stats (supported for faceting) > * Multi-shard support (may not be possible for some operations, e.g. median)
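On the multi-shard caveat above: distributive stats (sum, count, min, max) merge exactly from per-shard partial results, while median does not, which is why it is called out as possibly unsupported. A quick illustration (the shard data is made up):

```java
import java.util.Arrays;
import java.util.stream.DoubleStream;

public class ShardMergeDemo {

    // Plain median: sort a copy, take the middle element (or midpoint of two).
    static double median(double[] v) {
        double[] s = v.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        double[] shard1 = {1, 2, 3};
        double[] shard2 = {4, 100};
        double[] all = {1, 2, 3, 4, 100};

        // sum merges exactly across shards: sum(all) == sum(s1) + sum(s2)
        double mergedSum = DoubleStream.of(shard1).sum() + DoubleStream.of(shard2).sum();
        System.out.println(mergedSum == DoubleStream.of(all).sum()); // true

        // median does not: the median of per-shard medians is 27.0,
        // but the true global median is 3.0.
        double medianOfMedians = median(new double[]{median(shard1), median(shard2)});
        System.out.println(median(all));     // 3.0
        System.out.println(medianOfMedians); // 27.0
    }
}
```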
[jira] [Updated] (LUCENE-5321) Remove Facet42DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5321: --- Attachment: LUCENE-5321.patch I ended up removing everything under o.a.l.facet.codecs/, including Facet46Codec. It seemed redundant, as all it does is use the app's DVF with the facet fields that are returned by FacetIndexingParams.getAllCategoryListParams(). It's a waste of time and resources to maintain such a Codec. I also removed some tests which tested Facet42DVF. > Remove Facet42DocValuesFormat > - > > Key: LUCENE-5321 > URL: https://issues.apache.org/jira/browse/LUCENE-5321 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Reporter: Shai Erera > Attachments: LUCENE-5321.patch > > > The new DirectDocValuesFormat is nearly identical to Facet42DVF, except that > it stores the addresses in a direct int[] rather than PackedInts. On > LUCENE-5296 we measured the performance of DirectDVF vs Facet42DVF: it > improves perf for some queries and has a negligible effect for others, and > RAM consumption isn't much worse. We should remove Facet42DVF and use > DirectDVF instead. > I also want to rename Facet46Codec to FacetCodec. There's no need to refactor > the class whenever the default codec changes (e.g. from 45 to 46) since it > doesn't care about the actual Codec version underneath; it only overrides the > DVF used for the facet fields. FacetCodec should take the DVF from the app > (so e.g. the facet/ module doesn't depend on codecs/) and be exposed more as > a utility Codec rather than a real, versioned, Codec.
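The proposed FacetCodec boils down to per-field DocValuesFormat routing: facet fields get the app-supplied format, everything else gets the default. A dependency-free sketch of that dispatch (the stub record below stands in for Lucene's DocValuesFormat, and the field name "$facets" is illustrative; the real codec would override Codec.getDocValuesFormatForField):

```java
import java.util.Set;

public class FacetCodecSketch {

    // Stand-in for org.apache.lucene.codecs.DocValuesFormat
    record DocValuesFormat(String name) {}

    static final DocValuesFormat DEFAULT_DVF = new DocValuesFormat("default");
    static final DocValuesFormat APP_DVF = new DocValuesFormat("Direct");

    private final Set<String> facetFields;
    private final DocValuesFormat facetDVF;

    FacetCodecSketch(Set<String> facetFields, DocValuesFormat facetDVF) {
        this.facetFields = facetFields;
        this.facetDVF = facetDVF;
    }

    // Mirrors the shape of Codec.getDocValuesFormatForField(String):
    // facet fields are routed to the app-supplied DVF, all other
    // fields fall through to the default format.
    DocValuesFormat getDocValuesFormatForField(String field) {
        return facetFields.contains(field) ? facetDVF : DEFAULT_DVF;
    }

    public static void main(String[] args) {
        FacetCodecSketch codec = new FacetCodecSketch(Set.of("$facets"), APP_DVF);
        System.out.println(codec.getDocValuesFormatForField("$facets").name()); // Direct
        System.out.println(codec.getDocValuesFormatForField("title").name());   // default
    }
}
```

Because the app hands in the DVF, the facet module never has to name a concrete codecs/ class, which is exactly the decoupling the comment argues for.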
[jira] [Assigned] (SOLR-5418) Background merge after field removed from solr.xml causes error
[ https://issues.apache.org/jira/browse/SOLR-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-5418: Assignee: Erick Erickson > Background merge after field removed from solr.xml causes error > --- > > Key: SOLR-5418 > URL: https://issues.apache.org/jira/browse/SOLR-5418 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.5 >Reporter: Erick Erickson >Assignee: Erick Erickson > Attachments: SOLR-5418.patch > > > Problem from the user's list, cut/pasted below. Robert Muir hacked out a > quick patch he pasted on the dev list, I'll append it shortly. > I am working at implementing solr to work as the search backend for our web > system. So far things have been going well, but today I made some schema > changes and now things have broken. > I updated the schema.xml file and reloaded the core (via the admin > interface). No errors were reported in the logs. > I then pushed 100 records to be indexed. A call to Commit afterwards > seemed fine, however my next call for Optimize caused the following errors: > java.io.IOException: background merge hit exception: > _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 > [maxNumSegments=1] > null:java.io.IOException: background merge hit exception: > _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 > [maxNumSegments=1] > Unfortunately, googling for background merge hit exception came up > with 2 things: a corrupt index or not enough free space. The host > machine that's hosting solr has 227 out of 229GB free (according to df > -h), so that's not it. > I then ran CheckIndex on the index, and got the following results: > http://apaste.info/gmGU > As someone who is new to solr and lucene, as far as I can tell this > means my index is fine. So I am at a loss. 
I'm somewhat sure > that I could probably delete my data directory and rebuild it but I am > more interested in finding out why it is having issues, what is the > best way to fix it, and what is the best way to prevent it from > happening when this goes into production. > Does anyone have any advice that may help? > I helped Matthew find the logs and he posted this stack trace: > 1691103929 [http-bio-8080-exec-3] INFO org.apache.solr.update.UpdateHandler > – start > commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} > 1691104153 [http-bio-8080-exec-3] INFO > org.apache.solr.update.processor.LogUpdateProcessor > – [dbqItems] > webapp=/solr path=/update > params={optimize=true&_=1382999386564&wt=json&waitFlush=true} {} 0 224 > 1691104154 [http-bio-8080-exec-3] ERROR org.apache.solr.core.SolrCore – > java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 > _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _39 [maxNumSegments=1] > at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) > at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:530) > at > org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219) > at > org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157) > at > org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) > at > 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) > at > org.a
[jira] [Updated] (SOLR-5418) Background merge after field removed from solr.xml causes error
[ https://issues.apache.org/jira/browse/SOLR-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-5418: - Attachment: SOLR-5418.patch Patch constructed from cut-n-paste of Robert's quick code change on the dev list. Warning: it's cut/paste, not generated from svn. > Background merge after field removed from solr.xml causes error > --- > > Key: SOLR-5418 > URL: https://issues.apache.org/jira/browse/SOLR-5418 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.5 >Reporter: Erick Erickson > Attachments: SOLR-5418.patch > > > Problem from the user's list, cut/pasted below. Robert Muir hacked out a > quick patch he pasted on the dev list, I'll append it shortly. > I am working at implementing solr to work as the search backend for our web > system. So far things have been going well, but today I made some schema > changes and now things have broken. > I updated the schema.xml file and reloaded the core (via the admin > interface). No errors were reported in the logs. > I then pushed 100 records to be indexed. A call to Commit afterwards > seemed fine, however my next call for Optimize caused the following errors: > java.io.IOException: background merge hit exception: > _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 > [maxNumSegments=1] > null:java.io.IOException: background merge hit exception: > _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 > [maxNumSegments=1] > Unfortunately, googling for background merge hit exception came up > with 2 things: a corrupt index or not enough free space. The host > machine that's hosting solr has 227 out of 229GB free (according to df > -h), so that's not it. > I then ran CheckIndex on the index, and got the following results: > http://apaste.info/gmGU > As someone who is new to solr and lucene, as far as I can tell this > means my index is fine. So I am at a loss. 
I'm somewhat sure > that I could probably delete my data directory and rebuild it but I am > more interested in finding out why it is having issues, what is the > best way to fix it, and what is the best way to prevent it from > happening when this goes into production. > Does anyone have any advice that may help? > I helped Matthew find the logs and he posted this stack trace: > 1691103929 [http-bio-8080-exec-3] INFO org.apache.solr.update.UpdateHandler > – start > commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} > 1691104153 [http-bio-8080-exec-3] INFO > org.apache.solr.update.processor.LogUpdateProcessor > – [dbqItems] > webapp=/solr path=/update > params={optimize=true&_=1382999386564&wt=json&waitFlush=true} {} 0 224 > 1691104154 [http-bio-8080-exec-3] ERROR org.apache.solr.core.SolrCore – > java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 > _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _39 [maxNumSegments=1] > at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) > at > org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:530) > at > org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95) > at > org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240) > at > org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219) > at > org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157) > at > org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) > at > 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) > at
[jira] [Created] (SOLR-5418) Background merge after field removed from solr.xml causes error
Erick Erickson created SOLR-5418: Summary: Background merge after field removed from solr.xml causes error Key: SOLR-5418 URL: https://issues.apache.org/jira/browse/SOLR-5418 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 4.5 Reporter: Erick Erickson Problem from the user's list, cut/pasted below. Robert Muir hacked out a quick patch he pasted on the dev list, I'll append it shortly. I am working at implementing solr to work as the search backend for our web system. So far things have been going well, but today I made some schema changes and now things have broken. I updated the schema.xml file and reloaded the core (via the admin interface). No errors were reported in the logs. I then pushed 100 records to be indexed. A call to Commit afterwards seemed fine, however my next call for Optimize caused the following errors: java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1] null:java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1] Unfortunately, googling for background merge hit exception came up with 2 things: a corrupt index or not enough free space. The host machine that's hosting solr has 227 out of 229GB free (according to df -h), so that's not it. I then ran CheckIndex on the index, and got the following results: http://apaste.info/gmGU As someone who is new to solr and lucene, as far as I can tell this means my index is fine. So I am at a loss. I'm somewhat sure that I could probably delete my data directory and rebuild it but I am more interested in finding out why it is having issues, what is the best way to fix it, and what is the best way to prevent it from happening when this goes into production. Does anyone have any advice that may help? 
I helped Matthew find the logs and he posted this stack trace: 1691103929 [http-bio-8080-exec-3] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 1691104153 [http-bio-8080-exec-3] INFO org.apache.solr.update.processor.LogUpdateProcessor – [dbqItems] webapp=/solr path=/update params={optimize=true&_=1382999386564&wt=json&waitFlush=true} {} 0 224 1691104154 [http-bio-8080-exec-3] ERROR org.apache.solr.core.SolrCore – java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _39 [maxNumSegments=1] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:530) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95) at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240) at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219) at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157) at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java
[jira] [Commented] (LUCENE-5320) Create SearcherTaxonomyManager over Directory
[ https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811952#comment-13811952 ] Michael McCandless commented on LUCENE-5320: +1 > Create SearcherTaxonomyManager over Directory > - > > Key: LUCENE-5320 > URL: https://issues.apache.org/jira/browse/LUCENE-5320 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/facet >Reporter: Shai Erera > > SearcherTaxonomyManager now only allows working in NRT mode. It could be > useful to have an STM which allows reopening a SearcherAndTaxonomy pair over > Directories, e.g. for replication. The problem is that if the thread that > calls maybeRefresh() is not the one that does the commit(), it could lead to > a pair that is not synchronized. > Perhaps at first we could have a simple version that works under some > assumptions, i.e. that the app does the commit + reopen in the same thread in > that order, so that it can be used by such apps + when replicating the > indexes, and later we can figure out how to generalize it to work even if > commit + reopen are done by separate threads/JVMs. > I'll see if SearcherTaxonomyManager can be extended to support it, or a new > STM is required.
[jira] [Comment Edited] (SOLR-5381) Split Clusterstate and scale
[ https://issues.apache.org/jira/browse/SOLR-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809096#comment-13809096 ] Noble Paul edited comment on SOLR-5381 at 11/2/13 7:51 AM: --- OK, here is the plan to split clusterstate on a per-collection basis h2. How to use this feature? Introduce a new option while creating a collection (external=true). This will keep the state of the collection in a separate node. Example: http://localhost:8983/solr/admin/collections?action=CREATE&name=xcoll&numShards=5&replicationFactor=2&external=true This will result in the following entry in clusterstate.json {code:JavaScript} { "xcoll" : {"ex":true} } {code} There will be another ZK entry which carries the actual collection information * /collections ** /xcoll *** /state.json {code:JavaScript} {"xcoll":{ "shards":{"shard1":{ "range":"8000-b332", "state":"active", "replicas":{ "core_node1":{ "state":"active", "base_url":"http://192.168.1.5:8983/solr", "core":"xcoll_shard1_replica1", "node_name":"192.168.1.5:8983_solr", "leader":"true"}}}}, "router":{"name":"compositeId"}}} {code} The main Overseer thread is responsible for creating collections and managing all the events for all the collections in clusterstate.json. clusterstate.json is modified only when a collection is created/deleted or when state updates happen to "non-external" collections. Each external collection is to have its own Overseer queue as follows. There will be a separate thread for each external collection. * /collections ** /xcoll *** /overseer **** /collection-queue-work **** /queue **** /queue-work h2. SolrJ enhancements SolrJ would only listen to clusterstate.json. 
When a request comes for a collection 'xcoll': * it would first check if such a collection exists * If yes, it first looks up the details in the local cache for that collection * If not found in the cache, it fetches the node /collections/xcoll/state.json and caches the information * Any query/update will be sent with an extra query param specifying the collection name, shard name, Role (Leader/Replica), and range (example \_target_=xcoll:shard1:L:8000-b332). A node would throw an error (INVALID_NODE) if it does not serve the collection/shard/Role/range combo. * If SolrJ gets an INVALID_NODE error, it would invalidate the cache and fetch fresh state information for that collection (and cache it again). h2. Changes to each Solr Node Each node would only listen to clusterstate.json and the states of the collections it is a member of. All collections present in clusterstate.json will be deemed collections it serves. If a request comes for a collection it does not serve, it first checks for the \_target_ param: * If the param is present and the node does not serve that collection/shard/Role/Range combo, an INVALID_NODE error is thrown ** If the validation succeeds, the request is served * If the param is not present and the node is a member of the collection, the request is served ** If the node is not a member of the collection, it uses SolrJ to proxy the request to the appropriate location Internally, the node really does not care about the state of external collections. If/when it is required, the information is fetched in real time from ZK, used, and thrown away. h2. Changes to admin GUI External collections are not shown graphically in the admin UI.
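The SolrJ flow described above is a classic cache-invalidate-and-retry protocol: serve from the cached collection state, and on an INVALID_NODE rejection refetch from ZK and retry. A toy simulation of that protocol (the class names, the version-number stand-in for collection state, and the fake ZK map are all illustrative, not the actual SolrJ API):

```java
import java.util.HashMap;
import java.util.Map;

public class CollectionStateClient {

    static class InvalidNodeException extends RuntimeException {}

    // Simulated ZK source of truth: collection name -> state version
    private final Map<String, Integer> zk = new HashMap<>();
    // Client-side cache of per-collection state
    private final Map<String, Integer> cache = new HashMap<>();
    int zkFetches = 0;

    CollectionStateClient() { zk.put("xcoll", 1); }

    private int fetchFromZk(String coll) {
        zkFetches++;
        return zk.get(coll);
    }

    // A node rejects requests tagged with stale state (the _target_ check).
    private String nodeServe(String coll, int version) {
        if (version != zk.get(coll)) throw new InvalidNodeException();
        return "ok:" + coll + ":v" + version;
    }

    String request(String coll) {
        int v = cache.computeIfAbsent(coll, this::fetchFromZk);
        try {
            return nodeServe(coll, v);
        } catch (InvalidNodeException e) {
            // INVALID_NODE: drop the cached entry, refetch, retry once
            cache.put(coll, fetchFromZk(coll));
            return nodeServe(coll, cache.get(coll));
        }
    }

    // State changes behind the client's back (e.g. a shard split)
    void splitShard() { zk.put("xcoll", 2); }

    public static void main(String[] args) {
        CollectionStateClient c = new CollectionStateClient();
        System.out.println(c.request("xcoll")); // ok:xcoll:v1 (one ZK fetch)
        System.out.println(c.request("xcoll")); // served from cache, no fetch
        c.splitShard();
        System.out.println(c.request("xcoll")); // retry path: ok:xcoll:v2
        System.out.println(c.zkFetches);        // 2
    }
}
```

The point of the design is the steady state: as long as nothing changes, every request is served off the local cache with zero ZK traffic, and ZK is consulted only when a node actively rejects stale state.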