[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

2013-11-02 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812260#comment-13812260
 ] 

Shai Erera commented on LUCENE-5316:


How about making up a hierarchical category, e.g. 
{{charCount/0-100K/0-10K/0-1K/0-100/0-10}}? If there are candidates in all 
ranges, that's 100K nodes. Alternatively, we can hack up a hierarchical 
dimension made of A-Z/A-Z/A-Z... and randomly assign categories at different 
levels to documents.

But NO_PARENTS is not the only way to exercise the API. When asked for the 
top-K on a big flat dimension, we currently traverse all of its children to 
find which of them have count > 0. So big flat dimensions also make thousands 
of calls to ChildrenIterator.next().
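For context, the child-enumeration loop described above can be modeled with the two parallel taxonomy arrays. This is a simplified sketch: the class name, array names, and `INVALID` sentinel are illustrative stand-ins, not the actual `ParallelTaxonomyArrays`/`ChildrenIterator` API.

```java
// Simplified model of the youngest-child / older-sibling traversal used when
// computing top-K for a dimension. Names are illustrative, not Lucene's API.
public class ChildTraversal {
    static final int INVALID = -1;

    // children[ord] = youngest (last-added) child of ord, or INVALID
    // siblings[ord] = next older sibling of ord, or INVALID
    static int countChildrenWithNonZeroCount(int dimOrd, int[] children,
                                             int[] siblings, int[] counts) {
        int found = 0;
        // This is the loop that issues one next() per child -- thousands of
        // iterations for a big flat dimension.
        for (int child = children[dimOrd]; child != INVALID; child = siblings[child]) {
            if (counts[child] > 0) {
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Tiny taxonomy: ord 0 is the dimension root with children 1, 2, 3.
        int[] children = {3, INVALID, INVALID, INVALID}; // youngest child of 0 is 3
        int[] siblings = {INVALID, INVALID, 1, 2};       // sibling chain: 3 -> 2 -> 1
        int[] counts   = {0, 5, 0, 2};
        System.out.println(countChildrenWithNonZeroCount(0, children, siblings, counts));
    }
}
```

Note how every step chases a pointer into a potentially distant array slot, which is the locality problem the issue description calls out.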

> Taxonomy tree traversing improvement
> 
>
> Key: LUCENE-5316
> URL: https://issues.apache.org/jira/browse/LUCENE-5316
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Gilad Barkai
>Priority: Minor
> Attachments: LUCENE-5316.patch
>
>
> The taxonomy traversing is done today utilizing the 
> {{ParallelTaxonomyArrays}}: in particular, two taxonomy-sized {{int}} arrays 
> which hold, for each ordinal, its youngest child (array #1) and its older 
> sibling (array #2).
> This is a compact way of holding the tree information in memory, but it's not 
> perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes the internal implementation
> * Utilizing these arrays for tree traversing is not straightforward
> * Reference locality is lost while traversing (the arrays are accessed at 
> increasing indexes only, but consecutive accesses may be distant from one another)
> * In NRT, a reopen always (not just in the worst case) costs O(taxonomy size)
> This issue is about making the traversal easier and the code more readable, 
> and opening it up for future improvements (e.g. memory footprint and NRT cost) - 
> without changing any of the internals. 
> A later issue (or issues) could be opened to address these gaps once this one is done.
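The "8 bytes per ordinal" figure above follows directly from the two parallel `int` arrays; a quick sanity check of the arithmetic (the method name and the 10M-ordinal example are illustrative, not from the issue):

```java
// Quick arithmetic behind the "8 bytes per ordinal" claim: two parallel
// int arrays (youngest-child and older-sibling), 4 bytes per int each.
public class TaxonomyMemory {
    static long parallelArraysBytes(long numOrdinals) {
        long childrenArray = 4 * numOrdinals; // int[] of youngest children
        long siblingsArray = 4 * numOrdinals; // int[] of older siblings
        return childrenArray + siblingsArray; // 8 bytes per ordinal
    }

    public static void main(String[] args) {
        // A hypothetical 10M-category taxonomy keeps ~80 MB resident
        // just for the traversal arrays.
        System.out.println(parallelArraysBytes(10_000_000));
    }
}
```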



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5323) Add sizeInBytes to Suggester.Lookup

2013-11-02 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated LUCENE-5323:
-

Attachment: LUCENE-5323.patch

Initial patch implementing sizeInBytes() for Suggester.Lookup.

> Add sizeInBytes to Suggester.Lookup 
> 
>
> Key: LUCENE-5323
> URL: https://issues.apache.org/jira/browse/LUCENE-5323
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Areek Zillur
> Attachments: LUCENE-5323.patch
>
>
> It would be nice to have a sizeInBytes() method added to the Suggester.Lookup 
> interface. This would allow users to estimate the size of the in-memory data 
> structures created by the various suggester implementations.






[jira] [Created] (LUCENE-5323) Add sizeInBytes to Suggester.Lookup

2013-11-02 Thread Areek Zillur (JIRA)
Areek Zillur created LUCENE-5323:


 Summary: Add sizeInBytes to Suggester.Lookup 
 Key: LUCENE-5323
 URL: https://issues.apache.org/jira/browse/LUCENE-5323
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Areek Zillur


It would be nice to have a sizeInBytes() method added to the Suggester.Lookup 
interface. This would allow users to estimate the size of the in-memory data 
structures created by the various suggester implementations.
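A minimal sketch of what such a method could look like on a Lookup-style class. Everything here is an assumption for illustration: the class name, fields, and per-object size constants are invented, and this is not the attached patch or the real Suggester.Lookup API.

```java
// Illustrative sketch only: a Lookup-like class exposing sizeInBytes().
// Field layout and size constants are assumptions for demonstration,
// not the actual Suggester.Lookup interface or the LUCENE-5323 patch.
public class SketchLookup {
    private final long[] weights;        // stand-in for the suggester's in-memory data
    private final String[] surfaceForms;

    public SketchLookup(String[] surfaceForms, long[] weights) {
        this.surfaceForms = surfaceForms;
        this.weights = weights;
    }

    /** Rough estimate of the RAM held by this lookup's data structures. */
    public long sizeInBytes() {
        long size = 16;                  // assumed object-header overhead
        size += 8L * weights.length;     // one long per weight
        for (String s : surfaceForms) {
            size += 40 + 2L * s.length(); // assumed per-String overhead + UTF-16 chars
        }
        return size;
    }

    public static void main(String[] args) {
        SketchLookup lookup =
            new SketchLookup(new String[] {"foo", "bar"}, new long[] {3, 7});
        System.out.println(lookup.sizeInBytes());
    }
}
```

The value of such a method is that callers can compare the memory cost of different suggester implementations before choosing one.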






[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

2013-11-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812214#comment-13812214
 ] 

Michael McCandless commented on LUCENE-5316:


OK I will re-test, using NO_PARENTS only for the hierarchical field.  The 
problem is, the Wikipedia docs have only one such field, date (Y/M/D), and it's 
low cardinality.

Actually, Wikipedia does have a VERY big taxonomy but I've never succeeded in 
extracting it...

So net/net, I will re-test, but I feel this can easily give a false sense of 
security since my test data does not have a "big" single-valued hierarchical 
field...







[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #495: POMs out of sync

2013-11-02 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/495/

1 tests failed.
FAILED:  org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.testDistribSearch

Error Message:
No live SolrServers available to handle this 
request:[http://127.0.0.1:13780/collection1, 
http://127.0.0.1:52756/collection1, http://127.0.0.1:65012/collection1, 
http://127.0.0.1:32173/collection1]

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available 
to handle this request:[http://127.0.0.1:13780/collection1, 
http://127.0.0.1:52756/collection1, http://127.0.0.1:65012/collection1, 
http://127.0.0.1:32173/collection1]
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:464)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:268)
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:640)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at 
org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.doTest(ChaosMonkeyNothingIsSafeTest.java:200)




Build Log:
[...truncated 37396 lines...]




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812124#comment-13812124
 ] 

ASF subversion and git services commented on LUCENE-5189:
-

Commit 1538258 from [~shaie] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1538258 ]

LUCENE-5189: add @Deprecated annotation to SegmentInfo.attributes

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, 
> LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, 
> LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, 
> LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
> LUCENE-5189_process_events.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates; however, the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-DV-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update if, e.g., we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have a working patch already which I'll upload next, explaining the 
> changes.






[jira] [Commented] (SOLR-5311) Avoid registering replicas which are removed

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812119#comment-13812119
 ] 

ASF subversion and git services commented on SOLR-5311:
---

Commit 1538255 from [~noble.paul] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1538255 ]

SOLR-5311 trying to stop test failures

> Avoid registering replicas which are removed 
> -
>
> Key: SOLR-5311
> URL: https://issues.apache.org/jira/browse/SOLR-5311
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 4.6, 5.0
>
> Attachments: SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch, 
> SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch
>
>
> If a replica is removed from the clusterstate and then comes back up, it 
> should not be allowed to register. 
> Each core, when it comes up, checks whether it was already registered and, if 
> so, whether it is still there. If not, it throws an error and unregisters. If 
> such a request reaches the overseer, the overseer should ignore that core.






[jira] [Commented] (SOLR-5311) Avoid registering replicas which are removed

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812118#comment-13812118
 ] 

ASF subversion and git services commented on SOLR-5311:
---

Commit 1538254 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1538254 ]

SOLR-5311 trying to stop test failures







[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

2013-11-02 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812112#comment-13812112
 ] 

Shai Erera commented on LUCENE-5316:


Using NO_PARENTS is not such a simple decision, since the counts of the parents 
will be wrong if more than one category of that dimension is added to a 
document. If it's a flat dimension and you don't care about the dimension's 
count, that may be fine. But if it's a hierarchical dimension, the counts of 
the inner taxonomy nodes will be wrong in that case.

While indexing as NO_PARENTS does exercise the API more, I think it's wrong to 
test it here. NO_PARENTS should be used only for hierarchical dimensions, in 
order to save space in the category list and eventually (hopefully) speed 
things up, since fewer bytes are read and decoded during search. But for flat 
dimensions, it adds the rollupValues cost. If we made the search code smart 
enough to detect that a dimension is flat, we'd save that cost (no need to roll 
up), but in general I think you should tweak OrdinalPolicy to NO_PARENTS only 
for hierarchical dimensions. I wonder what the perf numbers will be if you use 
NO_PARENTS only for the hierarchical dims - that's what we recommend users 
use, so I think that's what we should benchmark.
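The rollup cost mentioned above can be modeled as summing child counts up into each parent at the end of a search, since with NO_PARENTS the parent ordinals were never written to the category list. This is a simplified stand-in, not the real rollupValues code; the array layout follows the youngest-child/older-sibling scheme from the issue description, and all names are illustrative.

```java
// Simplified model of rolling up counts for a NO_PARENTS dimension:
// parent counts are reconstructed by summing their children's counts,
// because the parent ordinals were never indexed.
public class Rollup {
    static final int INVALID = -1;

    // Recursively fold each ordinal's subtree total into its own count slot.
    static int rollup(int ord, int[] children, int[] siblings, int[] counts) {
        int sum = 0;
        for (int child = children[ord]; child != INVALID; child = siblings[child]) {
            sum += rollup(child, children, siblings, counts);
        }
        counts[ord] += sum;
        return counts[ord];
    }

    public static void main(String[] args) {
        // ord 0 = dimension root; ords 1 and 2 are its children; 3 is a child of 1.
        int[] children = {2, 3, INVALID, INVALID};
        int[] siblings = {INVALID, INVALID, 1, INVALID};
        int[] counts   = {0, 4, 2, 1}; // only directly-assigned ordinals counted so far
        rollup(0, children, siblings, counts);
        System.out.println(counts[0]); // root now holds the dimension total
    }
}
```

For a flat dimension this extra pass buys nothing (the root's children are the only level), which is why the comment above argues the rollup cost is pure overhead there.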

I'll review the patch later.







[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_45) - Build # 3419 - Still Failing!

2013-11-02 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3419/
Java: 64bit/jdk1.7.0_45 -XX:+UseCompressedOops -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT

Error Message:
expected:<3> but was:<2>

Stack Trace:
java.lang.AssertionError: expected:<3> but was:<2>
at 
__randomizedtesting.SeedInfo.seed([93AADFEF7FFAFCCB:262CBE68C03B4E3F]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.core.TestNonNRTOpen.assertNotNRT(TestNonNRTOpen.java:133)
at 
org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT(TestNonNRTOpen.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$

[jira] [Resolved] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-5321.


Resolution: Fixed
Fix Version/s: 5.0, 4.6
Assignee: Shai Erera
Lucene Fields: New, Patch Available  (was: New)

Committed to trunk and 4x. If it turns out FacetCodec is too tricky to write, 
we can either add it back under facet/ or maybe under demo/. For the moment, I 
think it's not that important to keep it and maintain it.

> Remove Facet42DocValuesFormat
> -
>
> Key: LUCENE-5321
> URL: https://issues.apache.org/jira/browse/LUCENE-5321
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5321.patch
>
>
> The new DirectDocValuesFormat is nearly identical to Facet42DVF, except that 
> it stores the addresses in a direct int[] rather than PackedInts. On 
> LUCENE-5296 we measured the performance of DirectDVF vs Facet42DVF: it 
> improves perf for some queries and has a negligible effect on others, and 
> RAM consumption isn't much worse. We should remove Facet42DVF and use 
> DirectDVF instead.
> I also want to rename Facet46Codec to FacetCodec. There's no need to refactor 
> the class whenever the default codec changes (e.g. from 45 to 46) since it 
> doesn't care about the actual Codec version underneath; it only overrides the 
> DVF used for the facet fields. FacetCodec should take the DVF from the app 
> (so e.g. the facet/ module doesn't depend on codecs/) and be exposed more as 
> a utility Codec rather than a real, versioned Codec.






[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812106#comment-13812106
 ] 

ASF subversion and git services commented on LUCENE-5321:
-

Commit 1538249 from [~shaie] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1538249 ]

LUCENE-5321: remove Facet42DocValuesFormat and FacetCodec







[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812101#comment-13812101
 ] 

ASF subversion and git services commented on LUCENE-5321:
-

Commit 1538245 from [~shaie] in branch 'dev/trunk'
[ https://svn.apache.org/r1538245 ]

LUCENE-5321: remove Facet42DocValuesFormat and FacetCodec







[jira] [Commented] (SOLR-5418) Background merge after field removed from solr.xml causes error

2013-11-02 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812094#comment-13812094
 ] 

Erick Erickson commented on SOLR-5418:
--

Thanks Robert! Mostly I was making sure I didn't lose anything





> Background merge after field removed from solr.xml causes error
> ---
>
> Key: SOLR-5418
> URL: https://issues.apache.org/jira/browse/SOLR-5418
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.5
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-5418.patch, SOLR-5418.patch
>
>
> Problem from the user's list, cut/pasted below. Robert Muir hacked out a 
> quick patch he pasted on the dev list, I'll append it shortly.
> I am working at implementing solr to work as the search backend for our web
> system.  So far things have been going well, but today I made some schema
> changes and now things have broken.
> I updated the schema.xml file and reloaded the core (via the admin
> interface).  No errors were reported in the logs.
> I then pushed 100 records to be indexed.  A call to Commit afterwards
> seemed fine, however my next call for Optimize caused the following errors:
> java.io.IOException: background merge hit exception:
> _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
> [maxNumSegments=1]
> null:java.io.IOException: background merge hit exception:
> _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
> [maxNumSegments=1]
> Unfortunately, googling for "background merge hit exception" came up
> with two things: a corrupt index or not enough free space. The host
> machine that's hosting Solr has 227 out of 229GB free (according to df
> -h), so that's not it.
> I then ran CheckIndex on the index, and got the following results:
> http://apaste.info/gmGU
> As someone who is new to solr and lucene, as far as I can tell this
> means my index is fine. So I am coming up at a loss. I'm somewhat sure
> that I could probably delete my data directory and rebuild it but I am
> more interested in finding out why is it having issues, what is the
> best way to fix it, and what is the best way to prevent it from
> happening when this goes into production.
> Does anyone have any advice that may help?
> I helped Matthew find the logs and he posted this stack trace:
> 1691103929 [http-bio-8080-exec-3] INFO  org.apache.solr.update.UpdateHandler  
> â start 
> commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 1691104153 [http-bio-8080-exec-3] INFO  
> org.apache.solr.update.processor.LogUpdateProcessor  â [dbqItems] 
> webapp=/solr path=/update 
> params={optimize=true&_=1382999386564&wt=json&waitFlush=true} {} 0 224
> 1691104154 [http-bio-8080-exec-3] ERROR org.apache.solr.core.SolrCore  â 
> java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 
> _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _39 [maxNumSegments=1]
> at 
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
> at 
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
> at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:530)
> at 
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
> at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
> at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240)
> at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219)
> at 
> org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
> at 
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> a

[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812089#comment-13812089
 ] 

Shai Erera commented on LUCENE-5321:


I'll add the test back with the assumeTrue. I'm not sure where to document this 
FacetCodec example ... it doesn't seem to belong in any of the classes' 
javadocs, and package.html aren't too verbose (point to blogs). So maybe we can 
just write a blog about it, though really, this isn't too complicated to figure 
out. I'll attach a patch shortly, want to make sure this test + assume really 
work!

> Remove Facet42DocValuesFormat
> -
>
> Key: LUCENE-5321
> URL: https://issues.apache.org/jira/browse/LUCENE-5321
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
> Attachments: LUCENE-5321.patch
>
>
> The new DirectDocValuesFormat is nearly identical to Facet42DVF, except that 
> it stores the addresses in a direct int[] rather than PackedInts. On 
> LUCENE-5296 we measured the performance of DirectDVF vs Facet42DVF: it 
> improves performance for some queries, has a negligible effect on others, and 
> its RAM consumption isn't much worse. We should remove Facet42DVF and use 
> DirectDVF instead.
> I also want to rename Facet46Codec to FacetCodec. There's no need to refactor 
> the class whenever the default codec changes (e.g. from 45 to 46), since it 
> doesn't care about the actual Codec version underneath; it only overrides the 
> DVF used for the facet fields. FacetCodec should take the DVF from the app 
> (so e.g. the facet/ module doesn't depend on codecs/) and be exposed more as 
> a utility Codec rather than a real, versioned, Codec.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5418) Background merge after field removed from solr.xml causes error

2013-11-02 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-5418:
--

Attachment: SOLR-5418.patch

Here is the patch from my svn checkout.

I sent it to the list really quick just to get an opinion on it. I don't 
remember why the current checks were added. I guess I can see a line of 
reasoning that it's good for the user to know they are dragging unused shit 
around in their index.

And I can agree with that, but I don't think it's the job of the codec to fail a 
background merge to communicate such a thing to the user.

Such a warning/failure could be implemented in other ways, for example a warmer 
in the example for "firstSearcher" event that looks at fieldinfos and warns or 
fails "YOU ARE DRAGGING AROUND BOGUS STUFF IN YOUR INDEX" if it finds things 
that don't match to the schema or something like that: and it would be easy for 
users to enable/disable.
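The check being suggested could be sketched like this. This is a plain-Java illustration of the idea, not the actual Solr listener API; the class name, method, and field names here are all made up. The real version would pull the index-side field names from FieldInfos inside a "firstSearcher" event listener and the schema-side names from the IndexSchema.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Standalone sketch (hypothetical names, not Solr API): diff the fields
// present in the index against the fields the schema still declares, and
// warn about the leftovers that are being dragged around.
class BogusFieldCheck {

    // Returns the index fields that no longer match any schema field.
    static Set<String> bogusFields(Set<String> indexFields, Set<String> schemaFields) {
        Set<String> bogus = new TreeSet<>(indexFields); // sorted for stable output
        bogus.removeAll(schemaFields);
        return bogus;
    }

    public static void main(String[] args) {
        // Made-up example: "old_price" was removed from the schema but its
        // data is still in the index.
        Set<String> index = new HashSet<>(Arrays.asList("id", "title", "old_price"));
        Set<String> schema = new HashSet<>(Arrays.asList("id", "title"));
        Set<String> bogus = bogusFields(index, schema);
        if (!bogus.isEmpty()) {
            System.out.println("WARNING: index contains fields not in schema: " + bogus);
        }
    }
}
```

Run as a warmer on the first searcher, this would warn (or fail, if configured that way) without involving the codec or a background merge at all.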

> Background merge after field removed from solr.xml causes error
> ---
>
> Key: SOLR-5418
> URL: https://issues.apache.org/jira/browse/SOLR-5418
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.5
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-5418.patch, SOLR-5418.patch
>
>
> Problem from the user's list, cut/pasted below. Robert Muir hacked out a 
> quick patch he pasted on the dev list; I'll append it shortly.
> I am working on implementing solr to work as the search backend for our web
> system.  So far things have been going well, but today I made some schema
> changes and now things have broken.
> I updated the schema.xml file and reloaded the core (via the admin
> interface).  No errors were reported in the logs.
> I then pushed 100 records to be indexed.  A call to Commit afterwards
> seemed fine; however, my next call for Optimize caused the following errors:
> java.io.IOException: background merge hit exception:
> _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
> [maxNumSegments=1]
> null:java.io.IOException: background merge hit exception:
> _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
> [maxNumSegments=1]
> Unfortunately, googling for "background merge hit exception" came up
> with two things: a corrupt index or not enough free space. The host
> machine that's hosting solr has 227 out of 229GB free (according to df
> -h), so that's not it.
> I then ran CheckIndex on the index, and got the following results:
> http://apaste.info/gmGU
> As someone who is new to solr and lucene, as far as I can tell this
> means my index is fine. So I am at a loss. I'm somewhat sure
> that I could probably delete my data directory and rebuild it, but I am
> more interested in finding out why it is having issues, what is the
> best way to fix it, and what is the best way to prevent it from
> happening when this goes into production.
> Does anyone have any advice that may help?
> I helped Matthew find the logs and he posted this stack trace:
> 1691103929 [http-bio-8080-exec-3] INFO  org.apache.solr.update.UpdateHandler  
> - start 
> commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 1691104153 [http-bio-8080-exec-3] INFO  
> org.apache.solr.update.processor.LogUpdateProcessor  - [dbqItems] 
> webapp=/solr path=/update 
> params={optimize=true&_=1382999386564&wt=json&waitFlush=true} {} 0 224
> 1691104154 [http-bio-8080-exec-3] ERROR org.apache.solr.core.SolrCore  - 
> java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 
> _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _39 [maxNumSegments=1]
> at 
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
> at 
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
> at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:530)
> at 
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
> at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
> at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240)
> at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219)
> at 
> org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
> at 
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> at 
> org.ap

[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812084#comment-13812084
 ] 

Michael McCandless commented on LUCENE-5321:


Also, maybe somewhere in the javadocs we could show how the app could do what 
Facet46Codec is doing today? I.e., how to gather up all facet fields and then 
override getDocValuesFormatForField on top of the default codec?
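Such a javadoc example could have roughly this shape. The sketch below is deliberately standalone (plain Java, not the real Lucene Codec classes); the class and field names are made up. It only demonstrates the per-field dispatch idea: the app registers its facet fields and a method mirroring getDocValuesFormatForField picks the facet DVF for those fields and falls through to the codec default for everything else.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for what Facet46Codec does today: per-field
// dispatch between the facet DocValuesFormat and the default one.
// Formats are represented as plain strings here for illustration only.
class PerFieldFormatChooser {
    private final Set<String> facetFields;  // fields the app registered as facet fields
    private final String facetFormat;       // stands in for the facet DVF
    private final String defaultFormat;     // stands in for the codec's default DVF

    PerFieldFormatChooser(Set<String> facetFields, String facetFormat, String defaultFormat) {
        this.facetFields = facetFields;
        this.facetFormat = facetFormat;
        this.defaultFormat = defaultFormat;
    }

    // Mirrors the shape of getDocValuesFormatForField(String field).
    String getDocValuesFormatForField(String field) {
        return facetFields.contains(field) ? facetFormat : defaultFormat;
    }

    public static void main(String[] args) {
        PerFieldFormatChooser codec = new PerFieldFormatChooser(
            new HashSet<>(Arrays.asList("$facets")), "Direct", "Lucene45");
        System.out.println(codec.getDocValuesFormatForField("$facets")); // facet DVF
        System.out.println(codec.getDocValuesFormatForField("title"));   // codec default
    }
}
```

In the real thing the app would subclass the current default codec and override the one method, which is exactly why FacetCodec doesn't need to be versioned itself.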







[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812082#comment-13812082
 ] 

Michael McCandless commented on LUCENE-5321:


+1, but maybe we can keep that test case if we just change it to an 
assumeTrue(_TestUtil.fieldSupportsHugeBinaryValues)?







[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812065#comment-13812065
 ] 

Robert Muir commented on LUCENE-4956:
-

Just so anyone reading the thread knows: the clause Benson mentioned is not an 
advertising clause:

{quote}
Except as contained in this notice, the name of a copyright holder shall not be 
used in advertising or otherwise to promote the sale, use or other dealings in 
these Data Files or Software without prior written authorization of the 
copyright holder.
{quote}

The BSD advertising clause reads like this:

{quote}
All advertising materials mentioning features or use of this software must 
display the following acknowledgement: This product includes software developed 
by the &lt;organization&gt;.
{quote}

These are very different.

> the korean analyzer that has a korean morphological analyzer and dictionaries
> -
>
> Key: LUCENE-4956
> URL: https://issues.apache.org/jira/browse/LUCENE-4956
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 4.2
>Reporter: SooMyung Lee
>Assignee: Christian Moen
>  Labels: newbie
> Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, 
> lucene-4956.patch, lucene4956.patch
>
>
> Korean language has specific characteristics. When developing a search 
> service with Lucene & Solr in Korean, there are some problems in searching 
> and indexing. The Korean analyzer solves these problems with a Korean 
> morphological analyzer. It consists of a Korean morphological analyzer, 
> dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is 
> made for Lucene and Solr. If you develop a search service with Lucene in 
> Korean, it is the best choice.






[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812059#comment-13812059
 ] 

Benson Margulies commented on LUCENE-4956:
--

OK, I see, the email thread about Unicode data in general does certainly cover 
this. Sometimes the workings of Legal are pretty perplexing.







[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812056#comment-13812056
 ] 

Robert Muir commented on LUCENE-4956:
-

{quote}
Rob, I got shat on at great length over this for merely test data over at the 
WS project. I had to make the build pull the data over the network to get 
certain directors off of my back. I'm trying to spare you the experience. 
That's all.
{quote}

Then perhaps you should push back hard when people don't know what they are 
talking about, like I do. As I said, the question about using the Unicode data 
tables has already been directly answered.

{quote}
As a low-intensity member of the UTC, I would also expect there to be only one 
license. However, I compare:
{quote}

I am also one; this means nothing.

{quote}
They look pretty different to me. Go figure?
{quote}

There is only one license from the terms of use page 
http://www.unicode.org/copyright.html

That is what I include. Whoever created your "other license" decided to omit 
some of the information, which I did not.







[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812053#comment-13812053
 ] 

Benson Margulies commented on LUCENE-4956:
--

Rob, I got shat on at great length over this for merely test data over at the 
WS project.  I had to make the build pull the data over the network to get 
certain directors off of my back. I'm trying to spare you the experience. 
That's all.

As a low-intensity member of the UTC, I would also expect there to be only one 
license. However, I compare:

{noformat}
#  Copyright (c) 1991-2011 Unicode, Inc. All Rights reserved.
#  
#  This file is provided as-is by Unicode, Inc. (The Unicode Consortium). No
#  claims are made as to fitness for any particular purpose. No warranties of
#  any kind are expressed or implied. The recipient agrees to determine
#  applicability of information provided. If this file has been provided on
#  magnetic media by Unicode, Inc., the sole remedy for any claim will be
#  exchange of defective media within 90 days of receipt.
#  
#  Unicode, Inc. hereby grants the right to freely use the information
#  supplied in this file in the creation of products supporting the
#  Unicode Standard, and to make copies of this file in any form for
#  internal or external distribution as long as this notice remains
#  attached.
{noformat}

with

{noformat}
! Copyright (c) 1991-2013 Unicode, Inc. 
! All rights reserved. 
! Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
!
! Permission is hereby granted, free of charge, to any person obtaining a copy 
! of the Unicode data files and any associated documentation (the "Data Files") 
! or Unicode software and any associated documentation (the "Software") to deal 
! in the Data Files or Software without restriction, including without limitation 
! the rights to use, copy, modify, merge, publish, distribute, and/or sell copies 
! of the Data Files or Software, and to permit persons to whom the Data Files or 
! Software are furnished to do so, provided that (a) the above copyright notice(s) 
! and this permission notice appear with all copies of the Data Files or Software, 
! (b) both the above copyright notice(s) and this permission notice appear in 
! associated documentation, and (c) there is clear notice in each modified Data 
! File or in the Software as well as in the documentation associated with the Data 
! File(s) or Software that the data or software has been modified.
!
! THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
! EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 
! FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO 
! EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR 
! ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES 
! WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF 
! CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION 
! WITH THE USE OR PERFORMANCE OF THE DATA FILES OR SOFTWARE.
! 
! Except as contained in this notice, the name of a copyright holder shall not be 
! used in advertising or otherwise to promote the sale, use or other dealings in 
! these Data Files or Software without prior written authorization of the 
! copyright holder.
{noformat}

They look pretty different to me. Go figure?









[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812052#comment-13812052
 ] 

Robert Muir commented on LUCENE-4956:
-

The exact question about using unicode data tables has been answered explicitly 
already:

http://mail-archives.apache.org/mod_mbox/www-legal-discuss/200903.mbox/%3c3d4032300903030415w4831f6e4u65c12881cbb86...@mail.gmail.com%3E

I don't think it needs any further discussion.







[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812051#comment-13812051
 ] 

Robert Muir commented on LUCENE-4956:
-

This is the unicode license that all of their data and code comes from. There 
is only one.

Please, don't waste my time here; if you want to waste the legal team's time, 
that's ok :)







[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812050#comment-13812050
 ] 

Benson Margulies edited comment on LUCENE-4956 at 11/2/13 4:11 PM:
---

That jira concerns a different license. The license on the file pointed-to 
there has no advertising clause that I can spot. Which isn't to say that legal 
would have a problem with this, just that I don't think that the JIRA in 
question tells us.



was (Author: bmargulies):
That jira concerns a different license. The license on the file pointed-to 
there has no advertising clause that I can spot.








[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812050#comment-13812050
 ] 

Benson Margulies commented on LUCENE-4956:
--

That jira concerns a different license. The license on the file pointed-to 
there has no advertising clause that I can spot.








[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812047#comment-13812047
 ] 

Robert Muir commented on LUCENE-4956:
-

{quote}
So that Unicode license is possibly an issue.
{quote}

No, its not. https://issues.apache.org/jira/browse/LEGAL-108







[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812044#comment-13812044
 ] 

Benson Margulies commented on LUCENE-4956:
--

My point is that it might have a bit too much legal notice. Generally, when 
someone grants a license, the headers all move up to some global NOTICE file, 
and the file is left with just an Apache license. 

I also noted the following:

! Except as contained in this notice, the name of a copyright holder shall not be 
! used in advertising or otherwise to promote the sale, use or other dealings in 
! these Data Files or Software without prior written authorization of the 
! copyright holder.

and then noticed that http://www.apache.org/legal/resolved.html says it approves of:

 * BSD (without advertising clause). 

So that Unicode license is possibly an issue.

Right now I'm using the git clone, but I just did a pull, and the pathname is 
lucene/analysis/arirang/src/data/mapHanja.dic










[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812039#comment-13812039
 ] 

Robert Muir commented on LUCENE-4956:
-

Please point to specific files in svn that you have concerns about.

I recreated this file myself from clearly attributed sources, from scratch. 

It has *MORE THAN ENOUGH* legal notice.

> the korean analyzer that has a korean morphological analyzer and dictionaries
> -
>
> Key: LUCENE-4956
> URL: https://issues.apache.org/jira/browse/LUCENE-4956
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 4.2
>Reporter: SooMyung Lee
>Assignee: Christian Moen
>  Labels: newbie
> Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, 
> lucene-4956.patch, lucene4956.patch
>
>






[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812034#comment-13812034
 ] 

Benson Margulies commented on LUCENE-4956:
--

Looks like mapHanja.dic needs some adjustment of the legal notice? Or was this 
going to be replaced?


> the korean analyzer that has a korean morphological analyzer and dictionaries
> -
>
> Key: LUCENE-4956
> URL: https://issues.apache.org/jira/browse/LUCENE-4956
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 4.2
>Reporter: SooMyung Lee
>Assignee: Christian Moen
>  Labels: newbie
> Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, 
> lucene-4956.patch, lucene4956.patch
>
>






[jira] [Commented] (LUCENE-5311) Make it possible to train / run classification over multiple fields

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812031#comment-13812031
 ] 

ASF subversion and git services commented on LUCENE-5311:
-

Commit 1538205 from [~teofili] in branch 'dev/trunk'
[ https://svn.apache.org/r1538205 ]

LUCENE-5311 - added support for training using multiple content fields for knn 
and naive bayes

> Make it possible to train / run classification over multiple fields
> ---
>
> Key: LUCENE-5311
> URL: https://issues.apache.org/jira/browse/LUCENE-5311
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/classification
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
>
> It'd be nice to be able to use multiple fields instead of just one for 
> training / running each classifier.






[jira] [Updated] (SOLR-5302) Analytics Component

2013-11-02 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5302:
-

Attachment: SOLR-5302.patch

Please apply my updated version of the patch, or make the same changes before 
creating a new one; otherwise I'll have to re-do some work.

NOTE: This is against trunk!

Working with pre-commit:

 Changes I had to make:

A couple of files were indented with tabs. Since they're new files, I just 
reformatted them.

The forbidden-API checks failed on several files, mostly requiring either 
Scanners to have "UTF-8" specified or String.toLowerCase to take Locale.ROOT, 
and such-like.
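For reference, the two forbidden-API fixes look roughly like this (an illustrative sketch, not the actual patched files):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Scanner;

public class ForbiddenApiFixes {
  public static void main(String[] args) {
    // Before: new Scanner(in) uses the platform default charset (forbidden).
    // After: pass the charset name explicitly.
    InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
    Scanner scanner = new Scanner(in, "UTF-8");
    System.out.println(scanner.next()); // hello
    scanner.close();

    // Before: s.toLowerCase() is locale-sensitive (e.g. Turkish dotless i).
    // After: pin the locale so behavior is identical on every JVM.
    String s = "TITLE";
    System.out.println(s.toLowerCase(Locale.ROOT)); // title
  }
}
```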

I did most of this on the plane ride home, and I must admit it's annoying to 
have precommit fail because I don't have internet connectivity; there _must_ 
be a build flag somewhere.

These files have missing javadocs
 [exec]   missing: org.apache.solr.analytics.accumulator
 [exec]   missing: org.apache.solr.analytics.accumulator.facet
 [exec]   missing: org.apache.solr.analytics.expression
 [exec]   missing: org.apache.solr.analytics.plugin
 [exec]   missing: org.apache.solr.analytics.request
 [exec]   missing: org.apache.solr.analytics.statistics
 [exec]   missing: org.apache.solr.analytics.util
 [exec]   missing: org.apache.solr.analytics.util.valuesource
 [exec] 
 [exec] Missing javadocs were found!


Tests failing, and a JVM crash to boot. 

FieldFacetExtrasTest fails with "unknown field int_id". There's nothing in 
schema-docValues.xml that would map to that field. Did it get changed? Is this 
a difference between trunk and 4x?

 - org.apache.solr.analytics.NoFacetTest (suite)
   [junit4]   - org.apache.solr.analytics.facet.FieldFacetExtrasTest (suite)
   [junit4]   - org.apache.solr.analytics.expression.ExpressionTest (suite)
   [junit4]   - 
org.apache.solr.analytics.AbstractAnalyticsStatsTest.initializationError
   [junit4]   - org.apache.solr.analytics.util.valuesource.FunctionTest (suite)
   [junit4]   - 
org.apache.solr.analytics.facet.AbstractAnalyticsFacetTest.initializationError
   [junit4]   - org.apache.solr.analytics.facet.FieldFacetTest (suite)
   [junit4]   - org.apache.solr.analytics.facet.QueryFacetTest.queryTest
   [junit4]   - org.apache.solr.analytics.facet.RangeFacetTest (suite)

> Analytics Component
> ---
>
> Key: SOLR-5302
> URL: https://issues.apache.org/jira/browse/SOLR-5302
> Project: Solr
>  Issue Type: New Feature
>Reporter: Steven Bower
>Assignee: Erick Erickson
> Attachments: SOLR-5302.patch, SOLR-5302.patch, Search Analytics 
> Component.pdf, Statistical Expressions.pdf, solr_analytics-2013.10.04-2.patch
>
>
> This ticket is to track a "replacement" for the StatsComponent. The 
> AnalyticsComponent supports the following features:
> * All functionality of StatsComponent (SOLR-4499)
> * Field Faceting (SOLR-3435)
> ** Support for limit
> ** Sorting (bucket name or any stat in the bucket)
> ** Support for offset
> * Range Faceting
> ** Supports all options of standard range faceting
> * Query Faceting (SOLR-2925)
> * Ability to use overall/field facet statistics as input to range/query 
> faceting (i.e. calc min/max date and then facet over that range)
> * Support for more complex aggregate/mapping operations (SOLR-1622)
> ** Aggregations: min, max, sum, sum-of-square, count, missing, stddev, mean, 
> median, percentiles
> ** Operations: negation, abs, add, multiply, divide, power, log, date math, 
> string reversal, string concat
> ** Easily pluggable framework to add additional operations
> * New / cleaner output format
> Outstanding Issues:
> * Multi-value field support for stats (supported for faceting)
> * Multi-shard support (may not be possible for some operations, e.g. median)
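A tiny worked example shows why median (unlike sum or count) can't simply be merged from per-shard results: the median of per-shard medians generally differs from the true global median. This is an illustrative sketch, not AnalyticsComponent code:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class MedianMergeDemo {
  // Median of an already-sorted array.
  static double median(int[] sorted) {
    int n = sorted.length;
    return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
  }

  public static void main(String[] args) {
    int[] shard1 = {1, 2, 9};   // median 2
    int[] shard2 = {3, 4, 5};   // median 4
    // "Median of medians" would give (2+4)/2 = 3.0 ...
    double merged = (median(shard1) + median(shard2)) / 2.0;
    // ... but the true global median of {1,2,3,4,5,9} is 3.5.
    int[] all = IntStream.concat(Arrays.stream(shard1), Arrays.stream(shard2))
        .sorted().toArray();
    System.out.println(merged + " vs " + median(all)); // 3.0 vs 3.5
  }
}
```

Sums and counts, by contrast, merge exactly (the global sum is the sum of per-shard sums), which is why those stats are straightforward to distribute.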






[jira] [Updated] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5321:
---

Attachment: LUCENE-5321.patch

I ended up removing everything under o.a.l.facet.codecs/, including 
Facet46Codec. It seemed redundant, as all it does is use the app's DVF with the 
facet fields returned by 
FacetIndexingParams.getAllCategoryListParams(). It's a waste of time and 
resources to maintain such a Codec.

I also removed some tests which tested Facet42DVF.

> Remove Facet42DocValuesFormat
> -
>
> Key: LUCENE-5321
> URL: https://issues.apache.org/jira/browse/LUCENE-5321
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
> Attachments: LUCENE-5321.patch
>
>
> The new DirectDocValuesFormat is nearly identical to Facet42DVF, except that 
> it stores the addresses in a direct int[] rather than PackedInts. On 
> LUCENE-5296 we measured the performance of DirectDVF vs Facet42DVF: it 
> improves performance for some queries and has a negligible effect on others, 
> and RAM consumption isn't much worse. We should remove Facet42DVF and use 
> DirectDVF instead.
> I also want to rename Facet46Codec to FacetCodec. There's no need to refactor 
> the class whenever the default codec changes (e.g. from 45 to 46), since it 
> doesn't care about the actual Codec version underneath; it only overrides the 
> DVF used for the facet fields. FacetCodec should take the DVF from the app 
> (so that e.g. the facet/ module doesn't depend on codecs/) and be exposed more 
> as a utility Codec than a real, versioned Codec.
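The trade-off the quoted issue describes (packed addresses vs. a direct int[]) can be sketched with a minimal bit-packing helper. This is not Lucene's PackedInts implementation, just an illustration of why packed reads cost extra shift/mask work while a direct int[] reads in one step:

```java
public class PackedVsDirect {
  // Store a "bits"-wide value at index i, handling values that span two longs.
  static void set(long[] w, int i, int bits, long v) {
    long p = (long) i * bits;
    int word = (int) (p >>> 6), shift = (int) (p & 63);
    w[word] |= v << shift;
    if (shift + bits > 64) w[word + 1] |= v >>> (64 - shift);
  }

  // Read it back: shift, possibly combine two words, then mask.
  static long get(long[] w, int i, int bits) {
    long p = (long) i * bits;
    int word = (int) (p >>> 6), shift = (int) (p & 63);
    long v = w[word] >>> shift;
    if (shift + bits > 64) v |= w[word + 1] << (64 - shift);
    return v & ((1L << bits) - 1);
  }

  public static void main(String[] args) {
    int bits = 7, count = 20;                       // 7 bits holds values < 128
    long[] packed = new long[(count * bits + 63) / 64 + 1];
    int[] direct = new int[count];
    for (int i = 0; i < count; i++) {
      int v = (i * 37) % 128;
      direct[i] = v;                                // direct: one array store
      set(packed, i, bits, v);                      // packed: offset math + OR
    }
    for (int i = 0; i < count; i++) {
      if (get(packed, i, bits) != direct[i]) throw new AssertionError("mismatch at " + i);
    }
    System.out.println("packed longs: " + packed.length + ", direct ints: " + direct.length);
  }
}
```

The packed form uses far less memory (here 7 bits per value instead of 32) but every read pays for the offset arithmetic, which is the perf difference measured on LUCENE-5296.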






[jira] [Assigned] (SOLR-5418) Background merge after field removed from solr.xml causes error

2013-11-02 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-5418:


Assignee: Erick Erickson

> Background merge after field removed from solr.xml causes error
> ---
>
> Key: SOLR-5418
> URL: https://issues.apache.org/jira/browse/SOLR-5418
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.5
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-5418.patch
>
>
> Problem from the user's list, cut/pasted below. Robert Muir hacked out a 
> quick patch he pasted on the dev list; I'll append it shortly.
> I am working at implementing solr to work as the search backend for our web
> system.  So far things have been going well, but today I made some schema
> changes and now things have broken.
> I updated the schema.xml file and reloaded the core (via the admin
> interface).  No errors were reported in the logs.
> I then pushed 100 records to be indexed.  A call to Commit afterwards
> seemed fine, however my next call for Optimize caused the following errors:
> java.io.IOException: background merge hit exception:
> _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
> [maxNumSegments=1]
> null:java.io.IOException: background merge hit exception:
> _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
> [maxNumSegments=1]
> Unfortunately, googling for "background merge hit exception" came up
> with two things: a corrupt index or not enough free space.  The host
> machine that's hosting Solr has 227 out of 229GB free (according to df
> -h), so that's not it.
> I then ran CheckIndex on the index, and got the following results:
> http://apaste.info/gmGU
> As someone who is new to Solr and Lucene, as far as I can tell this
> means my index is fine, so I am at a loss. I'm fairly sure
> I could delete my data directory and rebuild it, but I am
> more interested in finding out why it is having issues, what is the
> best way to fix it, and what is the best way to prevent it from
> happening when this goes into production.
> Does anyone have any advice that may help?
> I helped Matthew find the logs and he posted this stack trace:
> 1691103929 [http-bio-8080-exec-3] INFO  org.apache.solr.update.UpdateHandler  
> - start 
> commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 1691104153 [http-bio-8080-exec-3] INFO  
> org.apache.solr.update.processor.LogUpdateProcessor  - [dbqItems] 
> webapp=/solr path=/update 
> params={optimize=true&_=1382999386564&wt=json&waitFlush=true} {} 0 224
> 1691104154 [http-bio-8080-exec-3] ERROR org.apache.solr.core.SolrCore  - 
> java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 
> _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _39 [maxNumSegments=1]
> at 
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
> at 
> org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
> at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:530)
> at 
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
> at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
> at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240)
> at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219)
> at 
> org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
> at 
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> at 
> org.a

[jira] [Updated] (SOLR-5418) Background merge after field removed from solr.xml causes error

2013-11-02 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5418:
-

Attachment: SOLR-5418.patch

Patch constructed from a cut-and-paste of Robert's quick code change on the dev 
list.

Warning: it's cut/paste, not generated from svn.

> Background merge after field removed from solr.xml causes error
> ---
>
> Key: SOLR-5418
> URL: https://issues.apache.org/jira/browse/SOLR-5418
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.5
>Reporter: Erick Erickson
> Attachments: SOLR-5418.patch
>
>

[jira] [Created] (SOLR-5418) Background merge after field removed from solr.xml causes error

2013-11-02 Thread Erick Erickson (JIRA)
Erick Erickson created SOLR-5418:


 Summary: Background merge after field removed from solr.xml causes 
error
 Key: SOLR-5418
 URL: https://issues.apache.org/jira/browse/SOLR-5418
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.5
Reporter: Erick Erickson


Problem from the user's list, cut/pasted below. Robert Muir hacked out a quick 
patch he pasted on the dev list; I'll append it shortly.

I am working at implementing solr to work as the search backend for our web
system.  So far things have been going well, but today I made some schema
changes and now things have broken.

I updated the schema.xml file and reloaded the core (via the admin
interface).  No errors were reported in the logs.

I then pushed 100 records to be indexed.  A call to Commit afterwards
seemed fine, however my next call for Optimize caused the following errors:

java.io.IOException: background merge hit exception:
_2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
[maxNumSegments=1]

null:java.io.IOException: background merge hit exception:
_2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
[maxNumSegments=1]


Unfortunately, googling for "background merge hit exception" came up
with two things: a corrupt index or not enough free space.  The host
machine that's hosting Solr has 227 out of 229GB free (according to df
-h), so that's not it.


I then ran CheckIndex on the index, and got the following results:
http://apaste.info/gmGU


As someone who is new to Solr and Lucene, as far as I can tell this
means my index is fine, so I am at a loss. I'm fairly sure
I could delete my data directory and rebuild it, but I am
more interested in finding out why it is having issues, what is the
best way to fix it, and what is the best way to prevent it from
happening when this goes into production.


Does anyone have any advice that may help?

I helped Matthew find the logs and he posted this stack trace:

1691103929 [http-bio-8080-exec-3] INFO  org.apache.solr.update.UpdateHandler  - 
start 
commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
1691104153 [http-bio-8080-exec-3] INFO  
org.apache.solr.update.processor.LogUpdateProcessor  - [dbqItems] webapp=/solr 
path=/update params={optimize=true&_=1382999386564&wt=json&waitFlush=true} {} 0 
224
1691104154 [http-bio-8080-exec-3] ERROR org.apache.solr.core.SolrCore  - 
java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 
_30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _39 [maxNumSegments=1]
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:530)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
at 
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java

[jira] [Commented] (LUCENE-5320) Create SearcherTaxonomyManager over Directory

2013-11-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811952#comment-13811952
 ] 

Michael McCandless commented on LUCENE-5320:


+1

> Create SearcherTaxonomyManager over Directory
> -
>
> Key: LUCENE-5320
> URL: https://issues.apache.org/jira/browse/LUCENE-5320
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/facet
>Reporter: Shai Erera
>
> SearcherTaxonomyManager now only allows working in NRT mode. It could be 
> useful to have an STM which allows reopening a SearcherAndTaxonomy pair over 
> Directories, e.g. for replication. The problem is that if the thread that 
> calls maybeRefresh() is not the one that does the commit(), it could lead to 
> a pair that is not synchronized.
> Perhaps at first we could have a simple version that works under some 
> assumptions, i.e. that the app does the commit + reopen in the same thread in 
> that order, so that it can be used by such apps + when replicating the 
> indexes, and later we can figure out how to generalize it to work even if 
> commit + reopen are done by separate threads/JVMs.
> I'll see if SearcherTaxonomyManager can be extended to support it, or a new 
> STM is required.






[jira] [Comment Edited] (SOLR-5381) Split Clusterstate and scale

2013-11-02 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809096#comment-13809096
 ] 

Noble Paul edited comment on SOLR-5381 at 11/2/13 7:51 AM:
---

OK,
here is the plan to split clusterstate on a per-collection basis.

h2. How to use this feature?
Introduce a new option when creating a collection (external=true). This will 
keep the state of the collection in a separate ZK node. For 
example:

http://localhost:8983/solr/admin/collections?action=CREATE&name=xcoll&numShards=5&replicationFactor=2&external=true

This will result in this following entry in clusterstate.json
{code:JavaScript}
{
  "xcoll" : {"ex":true}
}
{code}
there will be another ZK entry which carries the actual collection information
*  /collections
** /xcoll
*** /state.json
{code:JavaScript}
{"xcoll":{
  "shards":{"shard1":{
    "range":"8000-b332",
    "state":"active",
    "replicas":{
      "core_node1":{
        "state":"active",
        "base_url":"http://192.168.1.5:8983/solr",
        "core":"xcoll_shard1_replica1",
        "node_name":"192.168.1.5:8983_solr",
        "leader":"true"}}}},
  "router":{"name":"compositeId"}}}
{code}

The main Overseer thread is responsible for creating collections and managing 
all the events for all the collections in clusterstate.json. 
clusterstate.json is modified only when a collection is created/deleted or when 
state updates happen to "non-external" collections.

Each external collection will have its own Overseer queue, as follows. There 
will be a separate thread for each external collection.

* /collections
** /xcoll
*** /overseer
**** /collection-queue-work
**** /queue
**** /queue-work


h2. SolrJ enhancements
SolrJ would only listen to clusterstate.json. When a request comes in for a 
collection 'xcoll':
* it would first check whether such a collection exists
* If yes, it first looks up the details in the local cache for that collection 
* If not found in the cache, it fetches the node /collections/xcoll/state.json 
and caches the information 
* Any query/update will be sent with an extra query param specifying the 
collection name, shard name, Role (Leader/Replica), and range (example: 
\_target_=xcoll:shard1:L:8000-b332). A node would throw an error 
(INVALID_NODE) if it does not serve that collection/shard/Role/range combo.
* If SolrJ gets an INVALID_NODE error, it would invalidate the cache and fetch 
fresh state information for that collection (and cache it again).
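The cache/invalidate/retry flow above can be sketched roughly as follows. This is a hypothetical illustration, not actual SolrJ code; the class name, fetchStateJson, and InvalidNodeException are made up for the sketch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the client-side flow: consult a local cache of
// per-collection state, and on an INVALID_NODE error drop the cached entry
// and re-fetch before retrying once.
public class CollectionStateCache {
  static class InvalidNodeException extends RuntimeException {}

  private final Map<String, String> cache = new ConcurrentHashMap<>();
  // Stand-in for reading /collections/<name>/state.json from ZK.
  private final Function<String, String> fetchStateJson;

  CollectionStateCache(Function<String, String> fetchStateJson) {
    this.fetchStateJson = fetchStateJson;
  }

  String stateFor(String collection) {
    // Fetch and cache only on a miss; later calls hit the local cache.
    return cache.computeIfAbsent(collection, fetchStateJson);
  }

  // Send a request; on INVALID_NODE, invalidate and retry with fresh state.
  <R> R request(String collection, Function<String, R> send) {
    try {
      return send.apply(stateFor(collection));
    } catch (InvalidNodeException e) {
      cache.remove(collection);
      return send.apply(stateFor(collection));
    }
  }

  public static void main(String[] args) {
    java.util.concurrent.atomic.AtomicInteger fetches = new java.util.concurrent.atomic.AtomicInteger();
    CollectionStateCache cache = new CollectionStateCache(n -> "state-v" + fetches.incrementAndGet());
    System.out.println(cache.stateFor("xcoll")); // state-v1 (fetched)
    System.out.println(cache.stateFor("xcoll")); // state-v1 (still cached)
  }
}
```

The point of the design is that clients never watch every collection's state; they pay a ZK read only on a cache miss or after a routing error.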

h2. Changes to each Solr Node
Each node would only listen to clusterstate.json and the states of the 
collections it is a member of. If a request comes in for a collection it 
does not serve, it first checks for the \_target_ param. All collections 
present in clusterstate.json will be deemed collections the node serves.
* If the param is present and the node does not serve that 
collection/shard/Role/Range combo, an INVALID_NODE error is thrown
** If the validation succeeds, the request is served 
* If the param is not present and the node is a member of the collection, the 
request is served locally
** If the node is not a member of the collection, it uses SolrJ to proxy the 
request to the appropriate location

Internally, the node really does not care about the state of external 
collections. If/when it is required, the information is fetched in real time 
from ZK, used, and thrown away.

h2. Changes to admin GUI
External collections are not shown graphically in the admin UI.



