[jira] [Updated] (LUCENE-4842) SynonymMap.Builder.join(String[], CharsRef) is untested and broken

2013-03-17 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-4842:
---

Attachment: LUCENE-4842.patch

 SynonymMap.Builder.join(String[], CharsRef) is untested and broken
 --

 Key: LUCENE-4842
 URL: https://issues.apache.org/jira/browse/LUCENE-4842
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Attachments: LUCENE-4842.patch


 {{SynonymMap.Builder.join()}} throws {{ArrayIndexOutOfBounds}} when you give 
 it more than one {{String}}: the amount it grows to add the word separator is 
 off by one; it fails to set the length of the {{CharsRef}} it populates; and 
 it needlessly calls {{CharsRef.grow()}} twice per word, first for the word 
 separator and again for the word to be appended.
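
As a rough illustration of the three points above (not the attached patch, which is not reproduced in this thread), a corrected join can size the buffer once per word for the separator plus the word, and record the final length at the end. The sketch below uses a hypothetical `CharBuf` stand-in rather than the real `CharsRef`, and assumes the word separator is the NUL character.

```java
import java.util.Arrays;

public class JoinSketch {
    // Assumption: words are separated by the NUL character.
    static final char WORD_SEPARATOR = 0;

    /** Hypothetical stand-in for a growable char buffer with an explicit length. */
    static final class CharBuf {
        char[] chars = new char[0];
        int length = 0;

        /** Ensures capacity for at least minSize chars, keeping existing content. */
        void grow(int minSize) {
            if (chars.length < minSize) {
                chars = Arrays.copyOf(chars, Math.max(minSize, chars.length * 2));
            }
        }
    }

    /** Joins words with WORD_SEPARATOR into reuse and sets reuse.length. */
    static CharBuf join(String[] words, CharBuf reuse) {
        int upto = 0;
        for (int i = 0; i < words.length; i++) {
            String word = words[i];
            int sepLen = (i == 0) ? 0 : 1;
            // One grow per word, covering both the separator and the word.
            reuse.grow(upto + sepLen + word.length());
            if (i > 0) {
                reuse.chars[upto++] = WORD_SEPARATOR;
            }
            word.getChars(0, word.length(), reuse.chars, upto);
            upto += word.length();
        }
        // Without this, callers would see a stale length.
        reuse.length = upto;
        return reuse;
    }

    public static void main(String[] args) {
        CharBuf buf = join(new String[] {"new", "york", "city"}, new CharBuf());
        System.out.println(buf.length);
    }
}
```

A test along these lines would call `join` with more than one word, which is the case the description says was never exercised.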

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4842) SynonymMap.Builder.join(String[], CharsRef) is untested and broken

2013-03-17 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604515#comment-13604515
 ] 

Steve Rowe commented on LUCENE-4842:


Patch fixes the issues, and switches {{TestSynonymMapFilter.add()}} to use it.

Committing shortly.




[jira] [Commented] (LUCENE-4842) SynonymMap.Builder.join(String[], CharsRef) is untested and broken

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604516#comment-13604516
 ] 

Commit Tag Bot commented on LUCENE-4842:


[trunk commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1457380

LUCENE-4842: Unbreak SynonymMap.Builder.join(String[],CharsRef)





[jira] [Resolved] (LUCENE-4842) SynonymMap.Builder.join(String[], CharsRef) is untested and broken

2013-03-17 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved LUCENE-4842.


Resolution: Fixed

Committed to trunk and branch_4x.




[jira] [Commented] (LUCENE-4842) SynonymMap.Builder.join(String[], CharsRef) is untested and broken

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604522#comment-13604522
 ] 

Commit Tag Bot commented on LUCENE-4842:


[branch_4x commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1457381

LUCENE-4842: Unbreak SynonymMap.Builder.join(String[],CharsRef) (merged trunk 
r1457380)





[jira] [Created] (LUCENE-4843) LimitTokenPositionFilter: don't emit tokens with positions that exceed the configured limit

2013-03-17 Thread Steve Rowe (JIRA)
Steve Rowe created LUCENE-4843:
--

 Summary: LimitTokenPositionFilter: don't emit tokens with 
positions that exceed the configured limit
 Key: LUCENE-4843
 URL: https://issues.apache.org/jira/browse/LUCENE-4843
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor


Like LimitTokenCountFilter, except it's the token position that's limited 
rather than the token count.
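
To make the distinction concrete, here is a small sketch (not the patch itself) that contrasts limiting by count with limiting by position. It assumes a hypothetical `Token` holding a term and a position increment, and assumes positions are the 1-based running sum of increments, so stacked tokens such as synonyms (increment 0) share a position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PositionLimitSketch {
    /** Hypothetical token: a term plus its position increment relative to the previous token. */
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /** Emits tokens until the running position would exceed maxPosition. */
    static List<String> limitByPosition(List<Token> tokens, int maxPosition) {
        List<String> out = new ArrayList<>();
        int position = 0;
        for (Token t : tokens) {
            position += t.posInc;
            if (position > maxPosition) {
                break; // increments are non-negative, so no later token can qualify
            }
            out.add(t.term);
        }
        return out;
    }

    /** Emits at most maxCount tokens, regardless of their positions. */
    static List<String> limitByCount(List<Token> tokens, int maxCount) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            if (out.size() >= maxCount) {
                break;
            }
            out.add(t.term);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
            new Token("quick", 1), new Token("fast", 0), // "fast" is stacked on "quick"
            new Token("brown", 1), new Token("fox", 1));
        System.out.println(limitByCount(tokens, 2));
        System.out.println(limitByPosition(tokens, 2));
    }
}
```

With stacked tokens the two limits diverge: a count limit of 2 stops after the synonym, while a position limit of 2 still admits the token at the second position.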




[jira] [Updated] (LUCENE-4842) SynonymMap.Builder.join(String[], CharsRef) is untested and broken

2013-03-17 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-4842:
---

Fix Version/s: 4.3, 5.0




[jira] [Updated] (LUCENE-4843) LimitTokenPositionFilter: don't emit tokens with positions that exceed the configured limit

2013-03-17 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-4843:
---

Attachment: LUCENE-4843.patch

Patch implementing the idea.

I think it's ready to go.




[JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3819 - Failure

2013-03-17 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-java7/3819/

1 tests failed.
REGRESSION:  
org.apache.lucene.benchmark.byTask.TestPerfTasksParse.testParseExamples

Error Message:
Could not parse sample file: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build/benchmark/classes/test/conf/english-porter-comparison.alg
 reason:class java.lang.RuntimeException:Line #4: 

Stack Trace:
java.lang.AssertionError: Could not parse sample file: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build/benchmark/classes/test/conf/english-porter-comparison.alg
 reason:class java.lang.RuntimeException:Line #4: 
at 
__randomizedtesting.SeedInfo.seed([82B8DE48C305E64D:A0DDA4DB2D9385A0]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.benchmark.byTask.TestPerfTasksParse.testParseExamples(TestPerfTasksParse.java:142)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:722)




Build Log:
[...truncated 6083 lines...]
[junit4:junit4] Suite: org.apache.lucene.benchmark.byTask.TestPerfTasksParse
[junit4:junit4]   1  queries:
[junit4:junit4]   1 
[junit4:junit4]   1 

Re: [JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3819 - Failure

2013-03-17 Thread Steve Rowe
I committed a fix for this.

On Mar 17, 2013, at 2:59 AM, Apache Jenkins Server jenk...@builds.apache.org 
wrote:

 Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-java7/3819/
 
 1 tests failed.
 REGRESSION:  
 org.apache.lucene.benchmark.byTask.TestPerfTasksParse.testParseExamples
 
 Error Message:
 Could not parse sample file: 
 /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build/benchmark/classes/test/conf/english-porter-comparison.alg
  reason:class java.lang.RuntimeException:Line #4: 
 

[jira] [Commented] (LUCENE-4752) Merge segments to sort them

2013-03-17 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604530#comment-13604530
 ] 

Shai Erera commented on LUCENE-4752:


This looks less invasive indeed, but I feel that MP.reorder() is kind of out of 
the blue. Maybe we should find a way to stuff it into OneMerge? I.e.:

* Make OneMerge readers private and add a method {{add(AtomicReader)}} which 
will be called by IW
* Add OneMerge.getReaders(), which returns a List<AtomicReader>
* IW.mergeMiddle() won't add the readers to SegmentMerger directly, but rather 
accumulate them in OneMerge and then just before it calls merger.merge() it 
will either do merger.add(oneMerge.getReaders()) (requires changing SM to take 
a list of readers, not one at a time), or if we don't want to touch SM, just 
add the returned readers one at a time.
* Then SortingMP will return its own OneMerge that wraps all the readers by 
SortingAR.

What do you think?
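
The shape of this proposal can be sketched roughly as follows. Everything here is a hypothetical stand-in: `Reader` replaces AtomicReader, `SortingReaderStub` replaces SortingAR, and `mergeMiddle` only mimics the step where the writer hands readers to the merger. It is meant to show the control flow, not the actual IndexWriter code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OneMergeSketch {
    /** Stand-in for AtomicReader; the real class has a much larger API. */
    interface Reader {
        String describe();
    }

    static final class SegmentReaderStub implements Reader {
        private final String name;

        SegmentReaderStub(String name) {
            this.name = name;
        }

        public String describe() {
            return name;
        }
    }

    /** Stand-in for SortingAR: wraps a reader that would expose documents in sorted order. */
    static final class SortingReaderStub implements Reader {
        private final Reader in;

        SortingReaderStub(Reader in) {
            this.in = in;
        }

        public String describe() {
            return "sorted(" + in.describe() + ")";
        }
    }

    /** Hypothetical OneMerge: readers are private, reached only via add() and getReaders(). */
    static class OneMerge {
        private final List<Reader> readers = new ArrayList<>();

        void add(Reader reader) { // would be called by the writer
            readers.add(reader);
        }

        List<Reader> getReaders() {
            return Collections.unmodifiableList(readers);
        }
    }

    /** What a sorting merge policy could return: the same readers, each wrapped. */
    static final class SortingOneMerge extends OneMerge {
        @Override
        List<Reader> getReaders() {
            List<Reader> wrapped = new ArrayList<>();
            for (Reader r : super.getReaders()) {
                wrapped.add(new SortingReaderStub(r));
            }
            return wrapped;
        }
    }

    /** Accumulates readers in the merge, then hands getReaders() to the (omitted) merger. */
    static List<String> mergeMiddle(OneMerge merge, String... segments) {
        for (String s : segments) {
            merge.add(new SegmentReaderStub(s));
        }
        List<String> handedToMerger = new ArrayList<>();
        for (Reader r : merge.getReaders()) {
            handedToMerger.add(r.describe());
        }
        return handedToMerger;
    }

    public static void main(String[] args) {
        System.out.println(mergeMiddle(new OneMerge(), "_0", "_1"));
        System.out.println(mergeMiddle(new SortingOneMerge(), "_0", "_1"));
    }
}
```

The point of the design is that the writer never needs to know whether sorting is involved; it only sees whatever `getReaders()` returns.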

 Merge segments to sort them
 ---

 Key: LUCENE-4752
 URL: https://issues.apache.org/jira/browse/LUCENE-4752
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: David Smiley
Assignee: Adrien Grand
 Attachments: LUCENE-4752.patch, LUCENE-4752.patch


 It would be awesome if Lucene could write the documents out in a segment 
 based on a configurable order.  This of course applies to merging segments 
 too. The benefit is increased locality on disk of documents that are likely to 
 be accessed together.  This often applies to documents near each other in 
 time, but also spatially.




[jira] [Updated] (LUCENE-4838) Add getOrd to BytesRefHash

2013-03-17 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4838:
---

Attachment: LUCENE-4838.patch

Renamed to find and replaced all mentions of 'ord' and 'ordinal' by 'id' and 
'bytesID' (depending on context).

Simon, I couldn't use find() in add() because the lookup code modifies the 
hashcode given to the method, which is later used to add the new entry. So 
find() needs to return two values (the id and modified hashcode), which 
unfortunately we can't do. Anyway, the code in question is not long.

I think it's ready.

 Add getOrd to BytesRefHash
 --

 Key: LUCENE-4838
 URL: https://issues.apache.org/jira/browse/LUCENE-4838
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/other
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Attachments: LUCENE-4838.patch, LUCENE-4838.patch


 There is no API today to query BytesRefHash for the existence of a certain 
 BytesRef. Rather, you should use .add(), which looks up the given bytes, and 
 hashes them if they are not found, or returns their ord if they are found.
 I would like to add a simple getOrd API which will return the ord of a given 
 BytesRef, or -1 if not found. I would like to use that API in the facet 
 module, and I need to be able to query the hash without necessarily adding 
 elements to it.
 I have a simple patch, will post shortly.




[jira] [Commented] (LUCENE-4752) Merge segments to sort them

2013-03-17 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604536#comment-13604536
 ] 

Adrien Grand commented on LUCENE-4752:
--

bq. This looks less invasive indeed, but I feel that MP.reorder() is kind of 
out of the blue. Maybe we should find a way to stuff it into OneMerge?

Indeed, I thought about OneMerge too and liked this option better but I think 
this is a problem for addIndexes(IndexReader...): this method doesn't need to 
find merges and as a consequence doesn't manipulate OneMerge instances. How 
would we make addIndexes(IndexReader...) sort doc IDs?




[jira] [Commented] (LUCENE-4752) Merge segments to sort them

2013-03-17 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604538#comment-13604538
 ] 

Shai Erera commented on LUCENE-4752:


I actually think that whoever wants to sort during addIndexes should pass a 
SortingAR directly? We have that code example in SortingAR javadocs. I didn't 
think we need to handle IW.addIndexes at all here.




[jira] [Commented] (LUCENE-4752) Merge segments to sort them

2013-03-17 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604540#comment-13604540
 ] 

Adrien Grand commented on LUCENE-4752:
--

Good point! I'll update the patch!




[jira] [Updated] (LUCENE-4838) Add BytesRefHash.find()

2013-03-17 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4838:
---

Summary: Add BytesRefHash.find()  (was: Add getOrd to BytesRefHash)




[jira] [Updated] (LUCENE-4838) Add BytesRefHash.find()

2013-03-17 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4838:


Attachment: LUCENE-4838.patch

what about this? I added a findHash method
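
The patch is not reproduced in this thread. As a rough sketch of how a shared lookup helper can serve both a read-only find() and add(), the example below uses a simplified open-addressing table over Strings, not the real BytesRefHash internals. The helper name `findSlot` and the convention that add() returns -(id + 1) for an existing key are assumptions of this sketch. Because the helper returns a slot rather than an id, add() gets the insertion point without needing a second return value.

```java
import java.util.Arrays;

public class FindSketch {
    private String[] keys = new String[16]; // power-of-two table, open addressing
    private int[] ids = new int[16];
    private int count = 0;

    public FindSketch() {
        Arrays.fill(ids, -1);
    }

    /** Returns the slot holding key, or the empty slot where key would be inserted. */
    private int findSlot(String key) {
        int mask = keys.length - 1;
        int slot = key.hashCode() & mask;
        while (keys[slot] != null && !keys[slot].equals(key)) {
            slot = (slot + 1) & mask; // linear probing
        }
        return slot;
    }

    /** Read-only lookup: the id of key, or -1 if it is absent. Never modifies the table. */
    public int find(String key) {
        return ids[findSlot(key)];
    }

    /** Adds key; returns its new id, or -(id + 1) if it was already present. */
    public int add(String key) {
        int slot = findSlot(key);
        if (keys[slot] != null) {
            return -(ids[slot] + 1);
        }
        if (count >= keys.length / 2) { // keep the table at most half full
            rehash();
            slot = findSlot(key);
        }
        keys[slot] = key;
        ids[slot] = count;
        return count++;
    }

    /** Doubles the table and reinserts every key, preserving ids. */
    private void rehash() {
        String[] oldKeys = keys;
        int[] oldIds = ids;
        keys = new String[oldKeys.length * 2];
        ids = new int[keys.length];
        Arrays.fill(ids, -1);
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != null) {
                int slot = findSlot(oldKeys[i]);
                keys[slot] = oldKeys[i];
                ids[slot] = oldIds[i];
            }
        }
    }

    public static void main(String[] args) {
        FindSketch hash = new FindSketch();
        hash.add("lucene");
        System.out.println(hash.find("lucene"));
        System.out.println(hash.find("solr"));
    }
}
```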




[jira] [Commented] (LUCENE-4838) Add BytesRefHash.find()

2013-03-17 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604547#comment-13604547
 ] 

Shai Erera commented on LUCENE-4838:


That looks good! Thanks for looking into this. I'll run some tests and then 
commit.

 Add BytesRefHash.find()
 ---

 Key: LUCENE-4838
 URL: https://issues.apache.org/jira/browse/LUCENE-4838
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/other
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Attachments: LUCENE-4838.patch, LUCENE-4838.patch, LUCENE-4838.patch





[jira] [Commented] (LUCENE-4838) Add BytesRefHash.find()

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604549#comment-13604549
 ] 

Commit Tag Bot commented on LUCENE-4838:


[trunk commit] Shai Erera
http://svn.apache.org/viewvc?view=revision&revision=1457400

LUCENE-4838: Add BytesRefHash.find()


 Add BytesRefHash.find()
 ---

 Key: LUCENE-4838
 URL: https://issues.apache.org/jira/browse/LUCENE-4838
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/other
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Attachments: LUCENE-4838.patch, LUCENE-4838.patch, LUCENE-4838.patch





[jira] [Resolved] (LUCENE-4838) Add BytesRefHash.find()

2013-03-17 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-4838.


   Resolution: Fixed
Fix Version/s: 4.3, 5.0

Committed to trunk and 4x. Thanks Simon!

 Add BytesRefHash.find()
 ---

 Key: LUCENE-4838
 URL: https://issues.apache.org/jira/browse/LUCENE-4838
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/other
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4838.patch, LUCENE-4838.patch, LUCENE-4838.patch





StoredField

2013-03-17 Thread eksdev
Is there any way to get a BytesRef from a field originally stored as a String?

I am playing with Sorter to implement a StoredDocSorter, analogous to 
NumericDocValuesSorter, but realised I do not need the BytesRef -> String 
conversion just to compare fields (byte order would be as good for sorting).

StoredDocument d1 = reader.document(docID1, fieldNamesSet);
String value1 = d1.get(fieldName);
BytesRef bytes1 = d1.getStringAsBytesValue(fieldName); // hypothetical method, would love to have it

I need the String type in other places, so indexing as byte[] would be too much 
hassle.

A String is internally stored as byte[], so is there any reason not to expose it for 
StoredField (or any other type)? 




Re: StoredField

2013-03-17 Thread Shai Erera
You can do new BytesRef(d1.get(fieldName)).

Shai







Re: StoredField

2013-03-17 Thread eksdev
Shai, was that irony, or am I missing something big time?

I would like to spare the BytesRef -> String conversion, not to introduce another 
one back to BytesRef.

Simply put, for sorting you do not need this byte[] -> String conversion; the byte 
representation of the String is perfectly sortable… 
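
To illustrate the byte-order point, here is a small self-contained sketch using only the JDK (no Lucene). Utf8OrderDemo and its helpers are hypothetical names. It compares strings by their UTF-8 bytes treated as unsigned values, which yields Unicode code point order. One caveat: that order can differ from String.compareTo, which compares UTF-16 code units, once supplementary characters are involved.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: sorting by UTF-8 bytes versus String.compareTo. */
final class Utf8OrderDemo {
    /** Compares two byte arrays as unsigned bytes; a proper prefix sorts first. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";        // one UTF-16 code unit
        String supp = "\uD83D\uDE00"; // a surrogate pair in UTF-16
        System.out.println("UTF-16 order: " + Integer.signum(bmp.compareTo(supp)));
        System.out.println("UTF-8 order:  "
                + Integer.signum(compareUnsigned(utf8(bmp), utf8(supp))));
    }
}
```

For plain ASCII or BMP-only data the two orders agree, so byte comparison is a drop-in choice for sorting that does not need locale rules.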
 

 
 
 



Re: StoredField

2013-03-17 Thread Shai Erera
No no, not irony at all. I misunderstood the first time. You wrote "is
there any way to get BytesRef from a field originally stored as String?", so
I answered with the first thing that came to mind :).

But I understand the question now -- you say that since the String field is
written as byte[] in the file, you want to read the byte[] as it is,
without translating it to String, right?

I don't know if it's possible. I'd try field.binaryValue(), though looking
at the impl doesn't suggest it will do what you want.

Shai








Re: StoredField

2013-03-17 Thread eksdev
Sure, there is a way to make anything -> byte[] ;)

It looks like this byte[] -> type conversion is done deep down, and this visitor 
user API already gets the correct types… 

Maybe an idea would be to delay the byte[] -> type conversion to field access time; 
I do not know what mines would be on the road to doing it. 

Use cases that require identity checks, or non-locale-specific sorting and co., 
would benefit from having raw, serialised representations without type 
conversion… Anyhow, I could switch over to byte[] fields completely to do it…

Thanks for responding!  




 
 
 
 



[jira] [Commented] (LUCENE-4658) Per-segment tracking of external/side-car data

2013-03-17 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604618#comment-13604618
 ] 

David Smiley commented on LUCENE-4658:
--

As I understand it, Lucene's faceting module uses a side-car index.  If so, 
then if the feature proposed here is a good API then the faceting module will 
use it.  No?

Ultimately, it would be cool to be able to expose an externally managed field 
as if it were DocValues, and thus any code that uses DocValues could use it 
without changing its code.  That would be awesome.  I don't know if that would 
be a part of this issue or follow-on that would use the API in this issue to 
make that happen.

 Per-segment tracking of external/side-car data
 --

 Key: LUCENE-4658
 URL: https://issues.apache.org/jira/browse/LUCENE-4658
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4658.patch, LUCENE-4658.patch


 Spinoff from David's idea on LUCENE-4258
 (https://issues.apache.org/jira/browse/LUCENE-4258?focusedCommentId=13534352&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13534352
  )
 I made a prototype patch that allows custom per-segment side-car
 data.  It adds an abstract ExternalSegmentData class.  The idea is
 the app implements this, and IndexWriter will pass each Document
 through to it, and call on it to do flushing/merging.  I added a
 setter to IndexWriterConfig to enable it, but I think this would
 really belong in Codec ...
 I haven't tackled the read-side yet, though this is already usable
 without that (ie, the app can just open its own files, read them,
 etc.).
 The random test case passes.
 I think for example this might make it easier for Solr/ElasticSearch
 to implement things like ExternalFileField.




Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Steve Rowe
I'm pleased to announce that Tommaso Teofili has accepted the PMC's invitation 
to join.

Welcome Tommaso!

- Steve



Re: Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Michael McCandless
Welcome Tommaso!

Mike McCandless

http://blog.mikemccandless.com




[jira] [Commented] (LUCENE-4658) Per-segment tracking of external/side-car data

2013-03-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604624#comment-13604624
 ] 

Michael McCandless commented on LUCENE-4658:


bq. if we go the CustomField approach we allow the app to be as flexible as it 
needs

OK, I see: this CustomField (ExternalField maybe?) would be totally opaque to 
existing Lucene indexing code, and would hold an arbitrary Object which the IW 
chain plugin could then grab.  I agree this makes it more generic!  I think 
this makes sense?

 Per-segment tracking of external/side-car data
 --

 Key: LUCENE-4658
 URL: https://issues.apache.org/jira/browse/LUCENE-4658
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4658.patch, LUCENE-4658.patch





[jira] [Commented] (LUCENE-4658) Per-segment tracking of external/side-car data

2013-03-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604626#comment-13604626
 ] 

Michael McCandless commented on LUCENE-4658:


bq. As I understand it, Lucene's faceting module uses a side-car index. If so, 
then if the feature proposed here is a good API then the faceting module will 
use it. No?

It does use a side-car (taxonomy) index, so that facet labels use global ords, 
which makes counting/NRT reopen fast.

But, that index is global, vs this patch which adds a per-segment side-car, so 
it wouldn't quite fit, until/unless we change taxonomy writer/reader to work 
per-segment.

 Per-segment tracking of external/side-car data
 --

 Key: LUCENE-4658
 URL: https://issues.apache.org/jira/browse/LUCENE-4658
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4658.patch, LUCENE-4658.patch





[jira] [Updated] (LUCENE-4841) Add SortedSetDocValuesFacetField example to SimpleFacetsExample

2013-03-17 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4841:
---

Attachment: LUCENE-4841.patch

Patch, with new class just like SimpleFacetsExample ...

 Add SortedSetDocValuesFacetField example to SimpleFacetsExample
 ---

 Key: LUCENE-4841
 URL: https://issues.apache.org/jira/browse/LUCENE-4841
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4841.patch







[jira] [Commented] (LUCENE-4658) Per-segment tracking of external/side-car data

2013-03-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604634#comment-13604634
 ] 

Robert Muir commented on LUCENE-4658:
-

I don't understand why lucene's facet module would need:

* BinaryDocValues (it's the only thing using this; this data type is really 
useless otherwise)
* something additional here on this issue.

As far as I'm concerned, one is enough. So if this one is added, please remove 
BinaryDocValues.



 Per-segment tracking of external/side-car data
 --

 Key: LUCENE-4658
 URL: https://issues.apache.org/jira/browse/LUCENE-4658
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4658.patch, LUCENE-4658.patch





[jira] [Updated] (LUCENE-4752) Merge segments to sort them

2013-03-17 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4752:
-

Attachment: LUCENE-4752.patch

Patch with tests that makes OneMerge responsible for reordering doc IDs. 
Thoughts?

 Merge segments to sort them
 ---

 Key: LUCENE-4752
 URL: https://issues.apache.org/jira/browse/LUCENE-4752
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: David Smiley
Assignee: Adrien Grand
 Attachments: LUCENE-4752.patch, LUCENE-4752.patch, LUCENE-4752.patch


 It would be awesome if Lucene could write the documents out in a segment 
 based on a configurable order.  This of course applies to merging segments 
 too. The benefit is increased locality on disk of documents that are likely to 
 be accessed together.  This often applies to documents near each other in 
 time, but also spatially.
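
A minimal sketch of what "writing documents in a configurable order" amounts to at the doc ID level, assuming each document carries one sortable long value (a timestamp, say). DocIdReorderSketch and oldToNew are hypothetical names; this is not the code in the attached patch, only the permutation idea behind it.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: derive an old-to-new doc ID mapping from a sort key. */
final class DocIdReorderSketch {
    /**
     * Returns a map where result[oldDocId] is the document's position
     * after a stable sort by valuePerDoc.
     */
    static int[] oldToNew(long[] valuePerDoc) {
        Integer[] newToOld = new Integer[valuePerDoc.length];
        for (int i = 0; i < newToOld.length; i++) {
            newToOld[i] = i;
        }
        // Arrays.sort on object arrays is stable, so ties keep their original doc order.
        Arrays.sort(newToOld, Comparator.comparingLong(d -> valuePerDoc[d]));

        int[] oldToNew = new int[valuePerDoc.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        long[] timestamps = {30L, 10L, 20L, 10L};
        System.out.println(Arrays.toString(oldToNew(timestamps)));
    }
}
```

A merge that applies such a mapping places documents with nearby keys next to each other, which is the locality benefit the issue describes.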




Re: Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Erick Erickson
Welcome!






Re: Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Adrien Grand
Congratulations Tommaso!

-- 
Adrien




[jira] [Commented] (SOLR-4555) When forceNew is used with CachingDirectoryFactory#get, the old CachValue should have it's path set to null.

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604639#comment-13604639
 ] 

Commit Tag Bot commented on SOLR-4555:
--

[trunk commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1457475

SOLR-4555: improve solution


 When forceNew is used with CachingDirectoryFactory#get, the old CachValue 
 should have it's path set to null.
 

 Key: SOLR-4555
 URL: https://issues.apache.org/jira/browse/SOLR-4555
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.3, 5.0


 We don't want the release of the old directory to evict the entry for the 
 forceNew directory.




Re: Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Stefan Matheis
Welcome :)






Re: Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Yonik Seeley
Congrats Tommaso!

-Yonik
http://lucidworks.com




Re: Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Mark Miller
Welcome Tommaso!

- Mark




Re: Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Shalin Shekhar Mangar
Congratulations Tommaso!






-- 
Regards,
Shalin Shekhar Mangar.


Re: StoredField

2013-03-17 Thread Adrien Grand
Hi,

On Sun, Mar 17, 2013 at 2:58 PM, eksdev eks...@googlemail.com wrote:
 sure, there is a way to make anything -> byte[] ;)

 it looks like this byte[] -> type conversion is done deep-down and this
 visitor user-api gets already correct types  …

 Maybe an idea would be to delay byte[] -> type conversion to field access
 time, i do not know what mines would be on the road to do it.

 use cases that require identity checks, or not locale specific sorting and
 co would benefit from having raw, serialised representations without type
 conversion…. anyhow, I could switch over to byte[] fields completely to do
 it…

I understand that it is frustrating to perform a String -> byte[]
conversion if Lucene just did the opposite. But because it needs to
perform one random seek per document (on a file which is often large),
the stored fields API is much slower than a String -> UTF-8 bytes
conversion, so I think we should keep the API robust rather than
allowing for these kinds of optimizations?

-- 
Adrien




[jira] [Commented] (SOLR-4555) When forceNew is used with CachingDirectoryFactory#get, the old CachValue should have it's path set to null.

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604645#comment-13604645
 ] 

Commit Tag Bot commented on SOLR-4555:
--

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1457476

SOLR-4555: improve solution


 When forceNew is used with CachingDirectoryFactory#get, the old CachValue 
 should have it's path set to null.
 

 Key: SOLR-4555
 URL: https://issues.apache.org/jira/browse/SOLR-4555
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.3, 5.0


 We don't want the release of the old directory to evict the entry for the 
 forceNew directory.




[jira] [Created] (SOLR-4597) CachingDirectoryFactory#remove should not attempt to empty/remove the index right away but flag for removal after close.

2013-03-17 Thread Mark Miller (JIRA)
Mark Miller created SOLR-4597:
-

 Summary: CachingDirectoryFactory#remove should not attempt to 
empty/remove the index right away but flag for removal after close.
 Key: SOLR-4597
 URL: https://issues.apache.org/jira/browse/SOLR-4597
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.3, 5.0
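
The issue carries no description, so the following is only a guess at the general pattern the summary names ("flag for removal after close"), written as a toy with hypothetical names. It is not Solr's CachingDirectoryFactory: remove() only sets a flag, and the stand-in for deleting files runs when the last reference is released.

```java
/** Hypothetical sketch of deferred removal behind a reference count. */
final class RefCountedDir {
    private int refCount;
    private boolean flaggedForRemoval;
    private boolean deleted;

    /** Registers one more user of the directory. */
    void acquire() {
        refCount++;
    }

    /** Does not delete anything; only records that removal was requested. */
    void remove() {
        flaggedForRemoval = true;
    }

    /** Drops one reference and performs the deferred removal on the last one. */
    void release() {
        if (refCount <= 0) {
            throw new IllegalStateException("release without matching acquire");
        }
        refCount--;
        if (refCount == 0 && flaggedForRemoval) {
            deleted = true; // stand-in for emptying and removing the index files
        }
    }

    boolean isDeleted() {
        return deleted;
    }
}
```

Deferring the work this way means a directory still in use by another holder is never emptied underneath it.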







[jira] [Created] (SOLR-4598) The Core Admin unload command's option 'deleteDataDir', should use the DirectoryFactory API to remove the data dir.

2013-03-17 Thread Mark Miller (JIRA)
Mark Miller created SOLR-4598:
-

 Summary: The Core Admin unload command's option 'deleteDataDir', 
should use the DirectoryFactory API to remove the data dir.
 Key: SOLR-4598
 URL: https://issues.apache.org/jira/browse/SOLR-4598
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.3, 5.0







Re: StoredField

2013-03-17 Thread eksdev
Hi Adrien, 
I cannot tell whether such a thing would make it less or more robust; just 
thinking aloud :)

I am thinking of it as a way to postpone the byte[] -> type conversion until the 
moment it is really needed. Simply put, keep the byte[] around as long as 
possible. 
*Theoretically*, this should improve GC behavior and memory footprint for some 
types of downstream processing. It all depends on how easy something like that 
would be.

There is already a way to achieve this by using the binary field type … hmmm, 
maybe some lucene.expert hack to make Lucene think every field is binary would 
be simple and robust enough? 
e.g. Visitor.transportOnlySerializedValuesWithoutTypeConversion()
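
To make the "keep byte[] around as long as possible" idea concrete, here is a 
minimal sketch. LazyStringValue is a hypothetical holder, not a Lucene class, 
and it assumes the stored value is UTF-8 encoded text:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical holder, not part of Lucene: keeps the serialized bytes and
// decodes them to a String only on first access.
final class LazyStringValue {
    private final byte[] utf8;   // serialized form, assumed to be UTF-8
    private String decoded;      // filled in lazily, then cached

    LazyStringValue(byte[] utf8) {
        this.utf8 = utf8;
    }

    // Raw bytes, usable for identity checks or binary sorting without decoding.
    byte[] bytes() {
        return utf8;
    }

    // Decode on demand and cache the result for later calls.
    String stringValue() {
        if (decoded == null) {
            decoded = new String(utf8, StandardCharsets.UTF_8);
        }
        return decoded;
    }

    // True once stringValue() has been called at least once.
    boolean isDecoded() {
        return decoded != null;
    }
}
```

Code that only compares or forwards bytes() would never pay for the String 
conversion; whether that saving matters next to the per-document seek is a 
separate question.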

-

By the way, the trick with tim-sort in Sorter worked great. For 1.1 million 
short documents, the time to sort an unsorted index on a handful of stored 
fields went from 490 seconds to 380. 
Congrats and thanks for it! It also improved compression by 12% (very small, 4k 
chunk size)

On Mar 17, 2013, at 5:26 PM, Adrien Grand jpou...@gmail.com wrote:

 Hi,
 
 On Sun, Mar 17, 2013 at 2:58 PM, eksdev eks...@googlemail.com wrote:
 sure, there is a way to make anything -> byte[] ;)
 
 it looks like this byte[] -> type conversion is done deep down, and this
 visitor user API already gets the correct types …
 
 Maybe an idea would be to delay the byte[] -> type conversion to field access
 time; I do not know what mines would be on the road to doing it.
 
 Use cases that require identity checks, or non-locale-specific sorting and
 co., would benefit from having raw, serialised representations without type
 conversion… anyhow, I could switch over to byte[] fields completely to do
 it…
 
 I understand that it is frustrating to perform a String -> byte[]
 conversion if Lucene just did the opposite. But because it needs to
 perform one random seek per document (on a file which is often large),
 the stored fields API is much slower than a String -> UTF-8 bytes
 conversion, so I think we should keep the API robust rather than
 allowing for these kinds of optimizations?
 
 -- 
 Adrien
 



Re: Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Tommaso Teofili
thank you all!

Tommaso


2013/3/17 Shalin Shekhar Mangar shalinman...@gmail.com

 Congratulations Tommaso!


 On Sun, Mar 17, 2013 at 8:34 PM, Steve Rowe sar...@gmail.com wrote:

 I'm pleased to announce that Tommaso Teofili has accepted the PMC's
 invitation to join.

 Welcome Tommaso!

 - Steve
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




 --
 Regards,
 Shalin Shekhar Mangar.



[jira] [Commented] (SOLR-4597) CachingDirectoryFactory#remove should not attempt to empty/remove the index right away but flag for removal after close.

2013-03-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604666#comment-13604666
 ] 

Mark Miller commented on SOLR-4597:
---

Previously, we tried to dance around this and do a remove from client code 
after the directory was released. That is error-prone and fragile, and it makes 
for fairly ugly API semantics.
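
A minimal sketch of the "flag for removal after close" semantics, under the 
assumption of a simple ref-counted cache. DeferredRemovalCache and its method 
names are hypothetical and are not the CachingDirectoryFactory API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a ref-counting directory cache; the names are
// illustrative only and do not mirror Solr's CachingDirectoryFactory.
final class DeferredRemovalCache {
    private static final class Entry {
        int refCount = 1;
        boolean removeOnClose;
    }

    private final Map<String, Entry> entries = new HashMap<String, Entry>();
    // Paths whose contents were actually deleted (stands in for real I/O).
    final List<String> deleted = new ArrayList<String>();

    // Opens the path, or adds a reference if it is already cached.
    synchronized void open(String path) {
        Entry e = entries.get(path);
        if (e == null) {
            entries.put(path, new Entry());
        } else {
            e.refCount++;
        }
    }

    // Only flags the entry; nothing is deleted while references remain.
    synchronized void remove(String path) {
        Entry e = entries.get(path);
        if (e == null) {
            throw new IllegalArgumentException("Unknown directory: " + path);
        }
        e.removeOnClose = true;
    }

    // Drops one reference; the last release closes and, if flagged, deletes.
    synchronized void release(String path) {
        Entry e = entries.get(path);
        if (e == null) {
            throw new IllegalArgumentException("Unknown directory: " + path);
        }
        if (--e.refCount == 0) {
            entries.remove(path);
            if (e.removeOnClose) {
                deleted.add(path);
            }
        }
    }
}
```

With this shape the caller can request removal at any time, and the deletion 
happens exactly once, when the last holder releases the directory.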

 CachingDirectoryFactory#remove should not attempt to empty/remove the index 
 right away but flag for removal after close.
 

 Key: SOLR-4597
 URL: https://issues.apache.org/jira/browse/SOLR-4597
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.3, 5.0







[jira] [Commented] (LUCENE-4658) Per-segment tracking of external/side-car data

2013-03-17 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604671#comment-13604671
 ] 

Shai Erera commented on LUCENE-4658:


bq. ExternalField maybe?

I think of ExternalField as something that resides outside the index, while 
CustomField is part of the index. Therefore I prefer custom vs external, but 
that's just naming.

First, this issue may not be used by facets at all. And I agree with Robert 
that there's no point in making two implementations of a custom data format. 
Today we have payloads and BDV as enablers to encode arbitrary data into a 
byte[] (BDV is faster). I think that should be enough, as long as what you want 
is per-document custom data.

But if you want to encode per-segment global data (e.g. a taxonomy, a graph), 
then BDV (or payloads) is not the right API, as both are per-document. Rather, I 
think it would be good to have this CustomDataFormat, which is completely 
opaque to Lucene yet gives the app a lot of flexibility: CustomField, passed on 
Documents, takes an Object (in my scenarios at least, these per-document data 
make up the larger per-segment data structure); CustomDataFormat encodes them 
however it needs and is also responsible for merging across segments; and IR 
gives you a CustomData back. That's it. Your app can then cast and work with 
that data however it wants. We can have getCustomData take a field, in case 
you want to encode two such structures, but we don't need to at first.

If for some reason the app needs custom data per-document and cannot work with 
either payloads or BDV, then it needs to have a CustomData type that exposes a 
per-document API. In either case, Lucene should not care what's in that data 
except in the indexing chain (to call the right format's API) and during merge, 
to invoke CustomDataFormat.merge().
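
A rough sketch of what such an opaque format could look like. Every type and 
method signature below is hypothetical (none of them exist in Lucene), and the 
toy LabelSet merely stands in for a real per-segment structure such as a 
taxonomy:

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical marker type: the per-segment structure handed back to the app.
interface CustomData {}

// Hypothetical format: the only hooks the indexer would need are
// "add a value" during indexing and "merge" when segments are merged.
interface CustomDataFormat<D extends CustomData> {
    D newSegmentData();
    void add(D segmentData, Object value);
    D merge(List<D> segments);
}

// Toy per-segment structure: a sorted set of labels.
final class LabelSet implements CustomData {
    final SortedSet<String> labels = new TreeSet<String>();
}

final class LabelSetFormat implements CustomDataFormat<LabelSet> {
    @Override
    public LabelSet newSegmentData() {
        return new LabelSet();
    }

    @Override
    public void add(LabelSet segmentData, Object value) {
        // The format alone decides how to interpret the opaque Object.
        segmentData.labels.add(String.valueOf(value));
    }

    @Override
    public LabelSet merge(List<LabelSet> segments) {
        // Merging is a plain union here; a real format could do anything.
        LabelSet merged = new LabelSet();
        for (LabelSet segment : segments) {
            merged.labels.addAll(segment.labels);
        }
        return merged;
    }
}
```

The app would cast the returned CustomData to its own type (LabelSet here), 
while the indexing side only ever sees Object values and the two hooks.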

I hope that's enough?

 Per-segment tracking of external/side-car data
 --

 Key: LUCENE-4658
 URL: https://issues.apache.org/jira/browse/LUCENE-4658
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4658.patch, LUCENE-4658.patch


 Spinoff from David's idea on LUCENE-4258
 (https://issues.apache.org/jira/browse/LUCENE-4258?focusedCommentId=13534352&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13534352
  )
 I made a prototype patch that allows custom per-segment side-car
 data.  It adds an abstract ExternalSegmentData class.  The idea is
 the app implements this, and IndexWriter will pass each Document
 through to it, and call on it to do flushing/merging.  I added a
 setter to IndexWriterConfig to enable it, but I think this would
 really belong in Codec ...
 I haven't tackled the read-side yet, though this is already usable
 without that (ie, the app can just open its own files, read them,
 etc.).
 The random test case passes.
 I think for example this might make it easier for Solr/ElasticSearch
 to implement things like ExternalFileField.




[jira] [Commented] (LUCENE-4658) Per-segment tracking of external/side-car data

2013-03-17 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604683#comment-13604683
 ] 

David Smiley commented on LUCENE-4658:
--

You raise a good point there Rob; BinaryDocValues is pretty close and might be 
sufficient as-is.  But do we need segment based tracking hooks?  Perhaps it's 
useful for parallel / overlay indexes that maintain docid consistency 
(LUCENE-4258 ?), but I don't think that needs to be centered around any 
particular special field.  Shai's issue description points to a comment I made 
but it was in turn a quote of Rob.  Rob & I didn't call out a need for segment 
level tracking; it was commit level tracking.  A couple use-cases I had in mind 
when I made the comment are:

* Storing per-document data that changes often like the number of 
clicks/accesses to the search result -- ultimately used to influence scoring.  
The application's backing store would probably be an in-memory cache with 
occasional syncs to disk.
* Storing a large per-document body text in an external data source (e.g. a DB 
or file system).  Lucene needlessly merges stored fields which I think is quite 
wasteful, not to mention putting it in Lucene is redundant if you already 
manage it somewhere else.  It's ultimately needed via Lucene's API for 
highlighting.

Is per-segment tracking needed for this?  Or is this really about hooks to 
enable a parallel segment level index?  I dunno.





[jira] [Commented] (SOLR-4597) CachingDirectoryFactory#remove should not attempt to empty/remove the index right away but flag for removal after close.

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604684#comment-13604684
 ] 

Commit Tag Bot commented on SOLR-4597:
--

[trunk commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1457494

SOLR-4597: CachingDirectoryFactory#remove should not attempt to empty/remove 
the index right away but flag for removal after close.


 CachingDirectoryFactory#remove should not attempt to empty/remove the index 
 right away but flag for removal after close.
 

 Key: SOLR-4597
 URL: https://issues.apache.org/jira/browse/SOLR-4597
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.3, 5.0







Lucene/Solr 4.2.1

2013-03-17 Thread Mark Miller
Just a heads up, but I'm likely to bite the bullet on a 4.2.1 soon. I'm not 
100% sure, but I think so. If so, I'd try and roll it this week. 

My current plan would be to put in every bug fix we have so far in Lucene and 
Solr. Optimizations are fair game I think, though I'll be looking at bugs 
myself.

Let me know your thoughts if they differ.

If you're on board, any help merging back stuff is appreciated.

- Mark



[JENKINS] Lucene-Solr-trunk-Windows ([[ Exception while replacing ENV. Please report this as a bug. ]]

2013-03-17 Thread Policeman Jenkins Server
{{
 hudson.remoting.ChannelClosedException: channel is already closed }}) -
 Build # 2664 - Failure!

Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/2664/
Java: [[ Exception while replacing ENV. Please report this as a bug. ]]
{{ hudson.remoting.ChannelClosedException: channel is already closed }}

No tests ran.

Build Log:
[...truncated 23342 lines...]
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected reader termination
common.init:

compile-lucene-core:

init:

-clover.disable:

-clover.load:

-clover.classpath:

hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected reader termination
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:672)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
at $Proxy76.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915)
at hudson.Launcher$ProcStarter.join(Launcher.java:360)
at hudson.tasks.Ant.perform(Ant.java:217)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:814)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:160)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:593)
at hudson.model.Run.execute(Run.java:1567)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected reader termination
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:732)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:76)
Caused by: java.io.IOException: Unexpected reader termination
... 1 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded





[JENKINS] Lucene-Solr-4.x-Windows ([[ Exception while replacing ENV. Please report this as a bug. ]]

2013-03-17 Thread Policeman Jenkins Server
{{
 hudson.remoting.ChannelClosedException: channel is already closed }}) -
 Build # 2646 - Failure!

Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Windows/2646/
Java: [[ Exception while replacing ENV. Please report this as a bug. ]]
{{ hudson.remoting.ChannelClosedException: channel is already closed }}

No tests ran.

Build Log:
[...truncated 15 lines...]
ERROR: Publisher hudson.tasks.junit.JUnitResultArchiver aborted due to exception
hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:494)
at hudson.remoting.Request.call(Request.java:129)
at hudson.remoting.Channel.call(Channel.java:672)
at hudson.EnvVars.getRemote(EnvVars.java:212)
at hudson.model.Computer.getEnvironment(Computer.java:882)
at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:28)
at hudson.model.Run.getEnvironment(Run.java:2020)
at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:943)
at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:131)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:814)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:786)
at hudson.model.Build$BuildExecution.post2(Build.java:183)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:733)
at hudson.model.Run.execute(Run.java:1592)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
Caused by: java.io.IOException: Unexpected reader termination
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:76)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
ERROR: error while parsing logs for description-setter
hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:494)
at hudson.remoting.Request.call(Request.java:129)
at hudson.remoting.Channel.call(Channel.java:672)
at hudson.EnvVars.getRemote(EnvVars.java:212)
at hudson.model.Computer.getEnvironment(Computer.java:882)
at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:28)
at hudson.model.Run.getEnvironment(Run.java:2020)
at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:943)
at hudson.plugins.descriptionsetter.DescriptionSetterPublisher.perform(DescriptionSetterPublisher.java:82)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:814)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:786)
at hudson.model.Build$BuildExecution.post2(Build.java:183)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:733)
at hudson.model.Run.execute(Run.java:1592)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
Caused by: java.io.IOException: Unexpected reader termination
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:76)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
Email was triggered for: Failure
Sending email for trigger: Failure





[jira] [Commented] (SOLR-4597) CachingDirectoryFactory#remove should not attempt to empty/remove the index right away but flag for removal after close.

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604699#comment-13604699
 ] 

Commit Tag Bot commented on SOLR-4597:
--

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1457502

SOLR-4597: CachingDirectoryFactory#remove should not attempt to empty/remove 
the index right away but flag for removal after close.


 CachingDirectoryFactory#remove should not attempt to empty/remove the index 
 right away but flag for removal after close.
 

 Key: SOLR-4597
 URL: https://issues.apache.org/jira/browse/SOLR-4597
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.3, 5.0







[jira] [Commented] (LUCENE-4841) Add SortedSetDocValuesFacetField example to SimpleFacetsExample

2013-03-17 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604711#comment-13604711
 ] 

Shai Erera commented on LUCENE-4841:


Few comments:

* Can you add a matching test case to demo/tests? See below, these tests are 
important to ensure the example code works!
* Did you manage to run it without getting an exception? I.e. you add Publish 
Date/2010/10/15, which SSDVFF should reject because it's hierarchical?
* Also, main() points to SimpleFacetsExample().runSearch() and runDrillDown(). 
Hmm, maybe that's why you didn't trip on the hierarchy thing?
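
To illustrate the hierarchy concern, here is a minimal sketch of a "flat facets 
only" check. FlatFacetCheck is a hypothetical helper, and the rule it assumes 
(exactly two path components, dimension plus label) is a guess at the kind of 
validation meant here, not a copy of what SSDVFF does:

```java
// Hypothetical helper illustrating a "flat facets only" rule: a path must
// consist of exactly a dimension and a label, so deeper paths are rejected.
final class FlatFacetCheck {
    private FlatFacetCheck() {}

    static void requireFlat(String... path) {
        if (path.length != 2) {
            throw new IllegalArgumentException(
                "only flat facets (dimension + label) are supported, got "
                    + path.length + " components");
        }
    }
}
```

Under that assumption, requireFlat("Author", "Bob") passes, while 
requireFlat("Publish Date", "2010", "10", "15") throws 
IllegalArgumentException, which is the failure a test over the example would 
surface.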

TestSimpleFacetsExample is your friend :).

 Add SortedSetDocValuesFacetField example to SimpleFacetsExample
 ---

 Key: LUCENE-4841
 URL: https://issues.apache.org/jira/browse/LUCENE-4841
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4841.patch







[jira] [Commented] (LUCENE-4752) Merge segments to sort them

2013-03-17 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604716#comment-13604716
 ] 

Shai Erera commented on LUCENE-4752:


Patch looks awesome! Few comments:

* LuceneTestCase: looks like the changes are that you ported the assertXYZ 
methods from TestDuelingCodecs, so I didn't review them. But, since LTC is 
quite big, perhaps we can move these methods to a util, e.g. CompareIndexes? I 
remember that when I wrote the sorting tests, I was looking for such methods!

* Can we make OneMerge.readers private and add OneMerge.add(AtomicReader) for 
IW to use? It looks odd that IW manipulates OneMerge.readers directly, but then 
calls OneMerge.getMergeReaders(). If you do that, then put a comment on readers 
explaining why it's private, so that we don't forget :).

* Can we remove the SegmentMerger.add() methods in favor of a single 
merge(List<AtomicReader>)? Don't go overboard with it, only if it's trivial (as 
it's not directly related to this issue).
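
The encapsulation suggested in the second bullet could look roughly like this. 
OneMergeSketch is a simplified, hypothetical stand-in (generic over the reader 
type so that it compiles on its own), not the real OneMerge:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for the suggestion; R plays the role of
// AtomicReader so the sketch does not depend on Lucene.
final class OneMergeSketch<R> {
    // Private on purpose: the writer must go through add() and
    // getMergeReaders() so that a subclass could wrap or reorder readers.
    private final List<R> readers = new ArrayList<R>();

    void add(R reader) {
        if (reader == null) {
            throw new IllegalArgumentException("reader must not be null");
        }
        readers.add(reader);
    }

    // Read-only view; callers cannot modify the list behind our back.
    List<R> getMergeReaders() {
        return Collections.unmodifiableList(readers);
    }
}
```

The point of the read-only view is that there is a single place where readers 
are added and a single place where they are handed out.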

 Merge segments to sort them
 ---

 Key: LUCENE-4752
 URL: https://issues.apache.org/jira/browse/LUCENE-4752
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: David Smiley
Assignee: Adrien Grand
 Attachments: LUCENE-4752.patch, LUCENE-4752.patch, LUCENE-4752.patch


 It would be awesome if Lucene could write the documents out in a segment 
 based on a configurable order.  This of course applies to merging segments 
 too. The benefit is increased locality on disk of documents that are likely to 
 be accessed together.  This often applies to documents near each other in 
 time, but also spatially.




[jira] [Commented] (LUCENE-4841) Add SortedSetDocValuesFacetField example to SimpleFacetsExample

2013-03-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604717#comment-13604717
 ] 

Michael McCandless commented on LUCENE-4841:


Woops, I will add a test :)

 Add SortedSetDocValuesFacetField example to SimpleFacetsExample
 ---

 Key: LUCENE-4841
 URL: https://issues.apache.org/jira/browse/LUCENE-4841
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4841.patch







RE: Lucene/Solr 4.2.1

2013-03-17 Thread Uwe Schindler
Hi Mark,

I am fine with that. Would you start to move your fixes into the already 
existing 4.2 release branch, so we can start 4.2.1 from there? I think the 
branch should already be set up for 4.2.1; maybe you only need to raise the 
version numbers.

Once you have hooked onto the 4.2 branch, I would backport the SPI classloader 
fix - that’s the only fix I have (as far as I remember).

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Sunday, March 17, 2013 6:50 PM
 To: dev@lucene.apache.org
 Subject: Lucene/Solr 4.2.1
 
 Just a heads up, but I'm likely to bite the bullet on a 4.2.1 soon. I'm not 
 100%
 sure, but I think so. If so, I'd try and roll it this week.
 
 My current plan would be to put in every bug fix we have so far in Lucene
 and Solr. Optimizations are fair game I think, though I'll be looking at bugs
 myself.
 
 Let me know your thoughts if they differ.
 
 If you're on board, any help merging back stuff is appreciated.
 
 - Mark
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org





Re: Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Dawid Weiss
Congratulations and welcome, Tommaso!

Dawid

On Sun, Mar 17, 2013 at 6:10 PM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:
 thank you all!

 Tommaso


 2013/3/17 Shalin Shekhar Mangar shalinman...@gmail.com

 Congratulations Tommaso!


 On Sun, Mar 17, 2013 at 8:34 PM, Steve Rowe sar...@gmail.com wrote:

 I'm pleased to announce that Tommaso Teofili has accepted the PMC's
 invitation to join.

 Welcome Tommaso!

 - Steve
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




 --
 Regards,
 Shalin Shekhar Mangar.






Re: Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Uwe Schindler
Welcome Tommaso. I am glad to have you in the PMC! 

Uwe



Dawid Weiss dawid.we...@cs.put.poznan.pl schrieb:

Congratulations and welcome, Tommaso!

Dawid

On Sun, Mar 17, 2013 at 6:10 PM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:
 thank you all!

 Tommaso


 2013/3/17 Shalin Shekhar Mangar shalinman...@gmail.com

 Congratulations Tommaso!


 On Sun, Mar 17, 2013 at 8:34 PM, Steve Rowe sar...@gmail.com
wrote:

 I'm pleased to announce that Tommaso Teofili has accepted the PMC's
 invitation to join.

 Welcome Tommaso!

 - Steve

-
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




 --
 Regards,
 Shalin Shekhar Mangar.




--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de

Re: Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Shai Erera
Welcome!

Shai


On Sun, Mar 17, 2013 at 9:28 PM, Uwe Schindler u...@thetaphi.de wrote:

 Welcome Tommaso. I am glad to have you in the PMC!

 Uwe



 Dawid Weiss dawid.we...@cs.put.poznan.pl schrieb:

 Congratulations and welcome, Tommaso!

 Dawid

 On Sun, Mar 17, 2013 at 6:10 PM, Tommaso Teofili

 tommaso.teof...@gmail.com wrote:

 thank you all!

 Tommaso


 2013/3/17 Shalin Shekhar Mangar shalinman...@gmail.com


 Congratulations Tommaso!


 On Sun, Mar 17, 2013 at 8:34 PM, Steve Rowe sar...@gmail.com wrote:

 I'm pleased to announce that Tommaso Teofili has accepted the PMC's
 invitation to join.


 Welcome Tommaso!

 - Steve
 --

 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





 --
 Regards,
 Shalin Shekhar Mangar.




 --

 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org

 For additional commands, e-mail: dev-h...@lucene.apache.org


 --
 Uwe Schindler
 H.-H.-Meier-Allee 63, 28213 Bremen
 http://www.thetaphi.de



[jira] [Updated] (LUCENE-4841) Add SortedSetDocValuesFacetField example to SimpleFacetsExample

2013-03-17 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4841:
---

Attachment: LUCENE-4841.patch

New patch with test & CHANGES ... I think it's ready.

 Add SortedSetDocValuesFacetField example to SimpleFacetsExample
 ---

 Key: LUCENE-4841
 URL: https://issues.apache.org/jira/browse/LUCENE-4841
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4841.patch, LUCENE-4841.patch







[jira] [Commented] (LUCENE-4841) Add SortedSetDocValuesFacetField example to SimpleFacetsExample

2013-03-17 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604725#comment-13604725
 ] 

Shai Erera commented on LUCENE-4841:


Looks good! +1

 Add SortedSetDocValuesFacetField example to SimpleFacetsExample
 ---

 Key: LUCENE-4841
 URL: https://issues.apache.org/jira/browse/LUCENE-4841
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4841.patch, LUCENE-4841.patch







[jira] [Resolved] (SOLR-4597) CachingDirectoryFactory#remove should not attempt to empty/remove the index right away but flag for removal after close.

2013-03-17 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-4597.
---

Resolution: Fixed

 CachingDirectoryFactory#remove should not attempt to empty/remove the index 
 right away but flag for removal after close.
 

 Key: SOLR-4597
 URL: https://issues.apache.org/jira/browse/SOLR-4597
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.3, 5.0







[jira] [Created] (SOLR-4599) CachingDirectoryFactory calls close(Directory) on forceNew if the Directory has a refCnt of 0, but it should call closeDirectory(CacheValueValue)

2013-03-17 Thread Mark Miller (JIRA)
Mark Miller created SOLR-4599:
-

 Summary: CachingDirectoryFactory calls close(Directory) on 
forceNew if the Directory has a refCnt of 0, but it should call 
closeDirectory(CacheValueValue) 
 Key: SOLR-4599
 URL: https://issues.apache.org/jira/browse/SOLR-4599
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.3, 5.0







[jira] [Resolved] (LUCENE-4841) Add SortedSetDocValuesFacetField example to SimpleFacetsExample

2013-03-17 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-4841.


Resolution: Fixed

 Add SortedSetDocValuesFacetField example to SimpleFacetsExample
 ---

 Key: LUCENE-4841
 URL: https://issues.apache.org/jira/browse/LUCENE-4841
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4841.patch, LUCENE-4841.patch







[jira] [Commented] (LUCENE-4841) Add SortedSetDocValuesFacetField example to SimpleFacetsExample

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604735#comment-13604735
 ] 

Commit Tag Bot commented on LUCENE-4841:


[branch_4x commit] Michael McCandless
http://svn.apache.org/viewvc?view=revision&revision=1457546

LUCENE-4841: add example for SortedSetDV facets


 Add SortedSetDocValuesFacetField example to SimpleFacetsExample
 ---

 Key: LUCENE-4841
 URL: https://issues.apache.org/jira/browse/LUCENE-4841
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4841.patch, LUCENE-4841.patch







[jira] [Commented] (LUCENE-4841) Add SortedSetDocValuesFacetField example to SimpleFacetsExample

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604736#comment-13604736
 ] 

Commit Tag Bot commented on LUCENE-4841:


[trunk commit] Michael McCandless
http://svn.apache.org/viewvc?view=revision&revision=1457545

LUCENE-4841: add example for SortedSetDV facets


 Add SortedSetDocValuesFacetField example to SimpleFacetsExample
 ---

 Key: LUCENE-4841
 URL: https://issues.apache.org/jira/browse/LUCENE-4841
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4841.patch, LUCENE-4841.patch







[jira] [Commented] (SOLR-4597) CachingDirectoryFactory#remove should not attempt to empty/remove the index right away but flag for removal after close.

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604744#comment-13604744
 ] 

Commit Tag Bot commented on SOLR-4597:
--

[trunk commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1457556

SOLR-4597: Additional work.

SOLR-4598: The Core Admin unload command's option 'deleteDataDir', should use 
the DirectoryFactory API to remove the data dir. 

SOLR-4599: CachingDirectoryFactory calls close(Directory) on forceNew if the 
Directory has a refCnt of 0, but it should call closeDirectory(CacheValueValue).


 CachingDirectoryFactory#remove should not attempt to empty/remove the index 
 right away but flag for removal after close.
 

 Key: SOLR-4597
 URL: https://issues.apache.org/jira/browse/SOLR-4597
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.3, 5.0







[jira] [Commented] (SOLR-4599) CachingDirectoryFactory calls close(Directory) on forceNew if the Directory has a refCnt of 0, but it should call closeDirectory(CacheValueValue)

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604746#comment-13604746
 ] 

Commit Tag Bot commented on SOLR-4599:
--

[trunk commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1457556

SOLR-4597: Additional work.

SOLR-4598: The Core Admin unload command's option 'deleteDataDir', should use 
the DirectoryFactory API to remove the data dir. 

SOLR-4599: CachingDirectoryFactory calls close(Directory) on forceNew if the 
Directory has a refCnt of 0, but it should call closeDirectory(CacheValueValue).


 CachingDirectoryFactory calls close(Directory) on forceNew if the Directory 
 has a refCnt of 0, but it should call closeDirectory(CacheValueValue) 
 --

 Key: SOLR-4599
 URL: https://issues.apache.org/jira/browse/SOLR-4599
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.3, 5.0







[jira] [Created] (LUCENE-4844) Remove TaxonomyReader.getParent(ord)

2013-03-17 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-4844:
--

 Summary: Remove TaxonomyReader.getParent(ord)
 Key: LUCENE-4844
 URL: https://issues.apache.org/jira/browse/LUCENE-4844
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera


This should have been removed when we introduced .getParallelArrays(). The method 
simply calls getParallelArrays().parents()[ord], and that's what callers should 
do. Except for one test, no other code in facets calls this method.
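
The equivalence described above can be sketched as follows. This is only an illustration: TaxonomySketch and its nested ParallelArrays are hypothetical stand-ins, not the Lucene facet classes, and the convention that a negative parent marks the root is an assumption of the sketch.

```java
/** Hypothetical stand-in for a taxonomy reader; NOT the Lucene class. */
final class TaxonomySketch {

    /** Stand-in for the parallel arrays holder. */
    static final class ParallelArrays {
        private final int[] parents;

        ParallelArrays(int[] parents) {
            this.parents = parents;
        }

        /** parents()[ord] is the ordinal of ord's parent category. */
        int[] parents() {
            return parents;
        }
    }

    private final ParallelArrays arrays;

    TaxonomySketch(int[] parents) {
        this.arrays = new ParallelArrays(parents);
    }

    ParallelArrays getParallelArrays() {
        return arrays;
    }

    /** The convenience method the issue proposes to remove. */
    int getParent(int ord) {
        return getParallelArrays().parents()[ord];
    }

    /** What a caller does instead: fetch the array once, then index into it. */
    static int depth(TaxonomySketch taxo, int ord) {
        int[] parents = taxo.getParallelArrays().parents();
        int depth = 0;
        while (parents[ord] >= 0) { // assumption: a negative parent marks the root
            ord = parents[ord];
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        TaxonomySketch taxo = new TaxonomySketch(new int[] {-1, 0, 1, 1});
        System.out.println(taxo.getParent(2) == taxo.getParallelArrays().parents()[2]);
        System.out.println(depth(taxo, 3));
    }
}
```

Fetching the array once outside the loop is the pattern the direct call encourages; going through a per-ordinal accessor would repeat the lookup on every step.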




[jira] [Updated] (LUCENE-4844) Remove TaxonomyReader.getParent(ord)

2013-03-17 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4844:
---

Attachment: LUCENE-4844.patch

Patch removes getParent and fixes test. I will commit tomorrow.

 Remove TaxonomyReader.getParent(ord)
 

 Key: LUCENE-4844
 URL: https://issues.apache.org/jira/browse/LUCENE-4844
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-4844.patch


 This should have been removed when we introduced .getParallelArrays(). The 
 method simply calls getParallelArrays().parents()[ord], and that's what 
 callers should do. Except for one test, no other code in facets calls this method.




[jira] [Commented] (SOLR-4597) CachingDirectoryFactory#remove should not attempt to empty/remove the index right away but flag for removal after close.

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604749#comment-13604749
 ] 

Commit Tag Bot commented on SOLR-4597:
--

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1457561

SOLR-4597: Additional work.

SOLR-4598: The Core Admin unload command's option 'deleteDataDir', should use 
the DirectoryFactory API to remove the data dir. 

SOLR-4599: CachingDirectoryFactory calls close(Directory) on forceNew if the 
Directory has a refCnt of 0, but it should call closeDirectory(CacheValueValue).


 CachingDirectoryFactory#remove should not attempt to empty/remove the index 
 right away but flag for removal after close.
 

 Key: SOLR-4597
 URL: https://issues.apache.org/jira/browse/SOLR-4597
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.3, 5.0







[jira] [Commented] (SOLR-4598) The Core Admin unload command's option 'deleteDataDir', should use the DirectoryFactory API to remove the data dir.

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604750#comment-13604750
 ] 

Commit Tag Bot commented on SOLR-4598:
--

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1457561

SOLR-4597: Additional work.

SOLR-4598: The Core Admin unload command's option 'deleteDataDir', should use 
the DirectoryFactory API to remove the data dir. 

SOLR-4599: CachingDirectoryFactory calls close(Directory) on forceNew if the 
Directory has a refCnt of 0, but it should call closeDirectory(CacheValueValue).


 The Core Admin unload command's option 'deleteDataDir', should use the 
 DirectoryFactory API to remove the data dir.
 ---

 Key: SOLR-4598
 URL: https://issues.apache.org/jira/browse/SOLR-4598
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.3, 5.0







[jira] [Commented] (SOLR-4599) CachingDirectoryFactory calls close(Directory) on forceNew if the Directory has a refCnt of 0, but it should call closeDirectory(CacheValueValue)

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604751#comment-13604751
 ] 

Commit Tag Bot commented on SOLR-4599:
--

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1457561

SOLR-4597: Additional work.

SOLR-4598: The Core Admin unload command's option 'deleteDataDir', should use 
the DirectoryFactory API to remove the data dir. 

SOLR-4599: CachingDirectoryFactory calls close(Directory) on forceNew if the 
Directory has a refCnt of 0, but it should call closeDirectory(CacheValueValue).


 CachingDirectoryFactory calls close(Directory) on forceNew if the Directory 
 has a refCnt of 0, but it should call closeDirectory(CacheValueValue) 
 --

 Key: SOLR-4599
 URL: https://issues.apache.org/jira/browse/SOLR-4599
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.3, 5.0







[jira] [Created] (SOLR-4600) 400 Bad Request status should be returned if a query parameter has the wrong datatype

2013-03-17 Thread Jack Krupansky (JIRA)
Jack Krupansky created SOLR-4600:


 Summary: 400 Bad Request status should be returned if a query 
parameter has the wrong datatype
 Key: SOLR-4600
 URL: https://issues.apache.org/jira/browse/SOLR-4600
 Project: Solr
  Issue Type: Bug
Reporter: Jack Krupansky


Solr returns a 500 Server Error for the following query even though the error 
is really a user error - wrong datatype for the rows parameter:

{code}
curl "http://localhost:8983/solr/select/?q=*:*&rows=all" -v
{code}

The rows parameter of course expects an integer.

Somebody should probably trap the raw number format exception and turn it into 
a 400 Bad Request SolrException.
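
A minimal sketch of that idea, under stated assumptions: HttpStatusException is a hypothetical stand-in for an exception that carries an HTTP status code (it is not Solr's SolrException), and parseIntParam is an illustrative helper name, not an existing Solr method.

```java
/** Illustrative only: maps a bad integer parameter to a 400 instead of a 500. */
final class RowsParamSketch {

    /** Hypothetical stand-in for an exception that carries an HTTP status code. */
    static final class HttpStatusException extends RuntimeException {
        final int code;

        HttpStatusException(int code, String message, Throwable cause) {
            super(message, cause);
            this.code = code;
        }
    }

    /** Parses an integer request parameter, trapping the raw NumberFormatException. */
    static int parseIntParam(String name, String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            // User error, so report 400 Bad Request rather than letting it surface as a 500.
            throw new HttpStatusException(400,
                "Invalid value '" + raw + "' for parameter '" + name
                    + "': expected an integer", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIntParam("rows", "10"));
        try {
            parseIntParam("rows", "all");
        } catch (HttpStatusException e) {
            System.out.println(e.code + " " + e.getMessage());
        }
    }
}
```

Keeping the original exception as the cause preserves the stack trace for logging while the client only sees the 400 and a readable message.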

The actual response:

{code}
Jack Krupansky@JackKrupansky ~ $ curl "http://localhost:8983/solr/select/?q=*:*&rows=all" -v
* About to connect() to localhost port 8983 (#0)
*   Trying 127.0.0.1...
* connected
* Connected to localhost (127.0.0.1) port 8983 (#0)
> GET /solr/select/?q=*:*&rows=all HTTP/1.1
> User-Agent: curl/7.27.0
> Host: localhost:8983
> Accept: */*
>
* additional stuff not fine /usr/src/ports/curl/curl-7.27.0-1/src/curl-7.27.0/lib/transfer.c:1037: 0 0
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 500 Server Error
< Cache-Control: no-cache, no-store
< Pragma: no-cache
< Expires: Sat, 01 Jan 2000 01:00:00 GMT
< Last-Modified: Sun, 17 Mar 2013 21:23:39 GMT
< ETag: "13d7a3c83fb"
< Content-Type: application/xml; charset=UTF-8
< Transfer-Encoding: chunked
<
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">500</int><int name="QTime">1</int><lst name="params"><str name="q">*:*</str><str name="rows">all</str></lst></lst><lst name="error"><str name="msg">For input string: "all"</str><str name="trace">java.lang.NumberFormatException: For input string: "all"
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at org.apache.solr.search.QParser.getSort(QParser.java:277)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:123)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 

[jira] [Updated] (SOLR-4600) 400 Bad Request status should be returned if a query parameter has the wrong datatype

2013-03-17 Thread Jack Krupansky (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Krupansky updated SOLR-4600:
-

Affects Version/s: 4.2

 400 Bad Request status should be returned if a query parameter has the wrong 
 datatype
 -

 Key: SOLR-4600
 URL: https://issues.apache.org/jira/browse/SOLR-4600
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Jack Krupansky

 Solr returns a 500 Server Error for the following query even though the error 
 is really a user error - wrong datatype for the rows parameter:
 {code}
 curl "http://localhost:8983/solr/select/?q=*:*&rows=all" -v
 {code}
 The rows parameter of course expects an integer.
 Somebody should probably trap the raw number format exception and turn it 
 into a 400 Bad Request SolrException.
 The actual response:
 {code}
 Jack Krupansky@JackKrupansky ~ $ curl "http://localhost:8983/solr/select/?q=*:*&rows=all" -v
 * About to connect() to localhost port 8983 (#0)
 *   Trying 127.0.0.1...
 * connected
 * Connected to localhost (127.0.0.1) port 8983 (#0)
 > GET /solr/select/?q=*:*&rows=all HTTP/1.1
 > User-Agent: curl/7.27.0
 > Host: localhost:8983
 > Accept: */*
 >
 * additional stuff not fine /usr/src/ports/curl/curl-7.27.0-1/src/curl-7.27.0/lib/transfer.c:1037: 0 0
 * HTTP 1.1 or later with persistent connection, pipelining supported
 < HTTP/1.1 500 Server Error
 < Cache-Control: no-cache, no-store
 < Pragma: no-cache
 < Expires: Sat, 01 Jan 2000 01:00:00 GMT
 < Last-Modified: Sun, 17 Mar 2013 21:23:39 GMT
 < ETag: "13d7a3c83fb"
 < Content-Type: application/xml; charset=UTF-8
 < Transfer-Encoding: chunked
 <
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">500</int><int name="QTime">1</int><lst name="params"><str name="q">*:*</str><str name="rows">all</str></lst></lst><lst name="error"><str name="msg">For input string: "all"</str><str name="trace">java.lang.NumberFormatException: For input string: "all"
 at java.lang.NumberFormatException.forInputString(Unknown Source)
 at java.lang.Integer.parseInt(Unknown Source)
 at java.lang.Integer.parseInt(Unknown Source)
 at org.apache.solr.search.QParser.getSort(QParser.java:277)
 at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:123)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:365)
 at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
 at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
 at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at 
 

[jira] [Commented] (LUCENE-4843) LimitTokenPositionFilter: don't emit tokens with positions that exceed the configured limit

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604769#comment-13604769
 ] 

Commit Tag Bot commented on LUCENE-4843:


[trunk commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1457572

LUCENE-4843: Add LimitTokenPositionFilter: don't emit tokens with positions 
that exceed the configured limit


 LimitTokenPositionFilter: don't emit tokens with positions that exceed the 
 configured limit
 ---

 Key: LUCENE-4843
 URL: https://issues.apache.org/jira/browse/LUCENE-4843
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Attachments: LUCENE-4843.patch


 Like LimitTokenCountFilter, except it's the token position that's limited 
 rather than the token count.
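
The difference between the two limits can be illustrated with a small stand-alone sketch. Nothing here is the Lucene API: Token, limitByCount and limitByPosition are hypothetical names, and the sketch assumes positions are the running sum of position increments, with the first token at position 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: contrasts a token-count limit with a token-position limit. */
final class PositionLimitSketch {

    /** Stand-in token: text plus a position increment (0 = stacked on the previous token). */
    static final class Token {
        final String text;
        final int posInc;

        Token(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }
    }

    /** Keeps at most maxCount tokens, regardless of their positions. */
    static List<String> limitByCount(List<Token> tokens, int maxCount) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            if (out.size() >= maxCount) {
                break;
            }
            out.add(t.text);
        }
        return out;
    }

    /** Keeps tokens until one would land past maxPosition. */
    static List<String> limitByPosition(List<Token> tokens, int maxPosition) {
        List<String> out = new ArrayList<>();
        int position = 0;
        for (Token t : tokens) {
            position += t.posInc; // assumed: position is the running sum of increments
            if (position > maxPosition) {
                break;
            }
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        // "fast" is a synonym stacked at the same position as "quick".
        List<Token> tokens = Arrays.asList(
            new Token("quick", 1), new Token("fast", 0),
            new Token("brown", 1), new Token("fox", 1));
        System.out.println(limitByCount(tokens, 2));
        System.out.println(limitByPosition(tokens, 2));
    }
}
```

With stacked tokens such as synonyms, the two limits diverge: a count limit of 2 is used up by "quick" and "fast", while a position limit of 2 still admits "brown".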




[jira] [Created] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-17 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-4845:
--

 Summary: Add AnalyzingInfixSuggester
 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3


Our current suggester impls do prefix matching of the incoming text
against all compiled suggestions, but in some cases it's useful to
allow infix matching.  E.g., Netflix does infix suggestions in their
search box.

I did a straightforward impl, just using a normal Lucene index, and
using PostingsHighlighter to highlight matching tokens in the
suggestions.

I think this likely only works well when your suggestions have a
strong prior ranking (weight input to build), e.g. Netflix knows
the popularity of movies.
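
To make the prefix/infix distinction concrete, here is a minimal in-memory sketch. It is not the patch: it uses no Lucene index, analyzer or PostingsHighlighter, and Suggestion, prefixMatch, infixMatch and suggest are hypothetical names. It assumes whitespace tokenization and a caller-supplied weight as the prior ranking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Illustrative only: prefix vs. infix matching over weighted suggestions. */
final class InfixSuggestSketch {

    /** Stand-in suggestion: display text plus a prior weight (e.g. popularity). */
    static final class Suggestion {
        final String text;
        final long weight;

        Suggestion(String text, long weight) {
            this.text = text;
            this.weight = weight;
        }
    }

    /** Prefix matching: the whole suggestion must start with the typed text. */
    static boolean prefixMatch(String suggestion, String typed) {
        return suggestion.toLowerCase(Locale.ROOT).startsWith(typed.toLowerCase(Locale.ROOT));
    }

    /** Infix matching: any whitespace-separated token may start with the typed text. */
    static boolean infixMatch(String suggestion, String typed) {
        String needle = typed.toLowerCase(Locale.ROOT);
        for (String token : suggestion.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.startsWith(needle)) {
                return true;
            }
        }
        return false;
    }

    /** Returns up to max infix matches, ordered by descending weight. */
    static List<String> suggest(List<Suggestion> all, String typed, int max) {
        List<Suggestion> hits = new ArrayList<>();
        for (Suggestion s : all) {
            if (infixMatch(s.text, typed)) {
                hits.add(s);
            }
        }
        hits.sort(Comparator.comparingLong((Suggestion s) -> s.weight).reversed());
        List<String> out = new ArrayList<>();
        for (Suggestion s : hits.subList(0, Math.min(max, hits.size()))) {
            out.add(s.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Suggestion> all = Arrays.asList(
            new Suggestion("Heart of Darkness", 5),
            new Suggestion("The Tell-Tale Heart", 9),
            new Suggestion("Hearing Aid", 1),
            new Suggestion("Brave", 7));
        System.out.println(prefixMatch("The Tell-Tale Heart", "hear"));
        System.out.println(suggest(all, "hear", 2));
    }
}
```

Because infix matching admits many more candidates than prefix matching, the weight is what keeps the top results useful, which is the point about needing a strong prior ranking.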





[jira] [Updated] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-17 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4845:
---

Attachment: LUCENE-4845.patch

Initial patch, lots of nocommits, only a basic test so far...

 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4845.patch


 Our current suggester impls do prefix matching of the incoming text
 against all compiled suggestions, but in some cases it's useful to
 allow infix matching.  E.g., Netflix does infix suggestions in their
 search box.
 I did a straightforward impl, just using a normal Lucene index, and
 using PostingsHighlighter to highlight matching tokens in the
 suggestions.
 I think this likely only works well when your suggestions have a
 strong prior ranking (weight input to build), e.g. Netflix knows
 the popularity of movies.




[jira] [Updated] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-17 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4845:
---

Attachment: infixSuggest.png

Screen shot showing suggestions for "hear".

 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch


 Our current suggester impls do prefix matching of the incoming text
 against all compiled suggestions, but in some cases it's useful to
 allow infix matching.  E.g., Netflix does infix suggestions in their
 search box.
 I did a straightforward impl, just using a normal Lucene index, and
 using PostingsHighlighter to highlight matching tokens in the
 suggestions.
 I think this likely only works well when your suggestions have a
 strong prior ranking (weight input to build), e.g. Netflix knows
 the popularity of movies.




[jira] [Resolved] (LUCENE-4843) LimitTokenPositionFilter: don't emit tokens with positions that exceed the configured limit

2013-03-17 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved LUCENE-4843.


   Resolution: Fixed
Fix Version/s: 4.3
   5.0

Committed to trunk and branch_4x.

 LimitTokenPositionFilter: don't emit tokens with positions that exceed the 
 configured limit
 ---

 Key: LUCENE-4843
 URL: https://issues.apache.org/jira/browse/LUCENE-4843
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4843.patch


 Like LimitTokenCountFilter, except it's the token position that's limited 
 rather than the token count.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604774#comment-13604774
 ] 

Michael McCandless commented on LUCENE-4845:


This is an example of the infix suggestions: !infixSuggest.png!


 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch


 Our current suggester impls do prefix matching of the incoming text
 against all compiled suggestions, but in some cases it's useful to
 allow infix matching.  E.g., Netflix does infix suggestions in their
 search box.
 I did a straightforward impl, just using a normal Lucene index, and
 using PostingsHighlighter to highlight matching tokens in the
 suggestions.
 I think this likely only works well when your suggestions have a
 strong prior ranking (weight input to build), e.g. Netflix knows
 the popularity of movies.




[jira] [Created] (SOLR-4601) A Collection that is only partially created and then deleted will leave pre allocated shard information in ZooKeeper.

2013-03-17 Thread Mark Miller (JIRA)
Mark Miller created SOLR-4601:
-

 Summary: A Collection that is only partially created and then 
deleted will leave pre allocated shard information in ZooKeeper.
 Key: SOLR-4601
 URL: https://issues.apache.org/jira/browse/SOLR-4601
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.3, 5.0


This means you can't try and create the collection again as it will appear to 
already exist.




[jira] [Commented] (LUCENE-4843) LimitTokenPositionFilter: don't emit tokens with positions that exceed the configured limit

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604776#comment-13604776
 ] 

Commit Tag Bot commented on LUCENE-4843:


[branch_4x commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1457574

LUCENE-4843: Add LimitTokenPositionFilter: don't emit tokens with positions 
that exceed the configured limit (merged trunk r1457572)


 LimitTokenPositionFilter: don't emit tokens with positions that exceed the 
 configured limit
 ---

 Key: LUCENE-4843
 URL: https://issues.apache.org/jira/browse/LUCENE-4843
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4843.patch


 Like LimitTokenCountFilter, except it's the token position that's limited 
 rather than the token count.
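
 A minimal sketch of how limiting by position differs from limiting by count.
 It does not use the Lucene TokenFilter API; the Tok class and both methods are
 hypothetical stand-ins, and it assumes positions are the running sum of
 position increments, with a stacked token (such as a synonym) having an
 increment of 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not the LimitTokenPositionFilter source.
class PositionLimitSketch {
    // A token with its position increment (0 means "same position as the previous token").
    static final class Tok {
        final String text;
        final int posInc;

        Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }
    }

    // Keeps at most maxCount tokens, regardless of their positions.
    static List<String> limitByCount(List<Tok> in, int maxCount) {
        List<String> out = new ArrayList<>();
        for (Tok t : in) {
            if (out.size() >= maxCount) {
                break;
            }
            out.add(t.text);
        }
        return out;
    }

    // Keeps every token whose position does not exceed maxPosition.
    static List<String> limitByPosition(List<Tok> in, int maxPosition) {
        List<String> out = new ArrayList<>();
        int position = 0;
        for (Tok t : in) {
            position += t.posInc;
            if (position > maxPosition) {
                break;
            }
            out.add(t.text);
        }
        return out;
    }

    // "fast" is stacked on "quick", so the positions are 1, 1, 2, 3.
    static List<Tok> sample() {
        return Arrays.asList(new Tok("quick", 1), new Tok("fast", 0),
                             new Tok("brown", 1), new Tok("fox", 1));
    }

    public static void main(String[] args) {
        System.out.println(limitByCount(sample(), 2));
        System.out.println(limitByPosition(sample(), 2));
    }
}
```

 With a limit of 2, the count-based variant stops after "quick" and "fast",
 while the position-based variant also keeps "brown" because it sits at
 position 2.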




[jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester

2013-03-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604777#comment-13604777
 ] 

Robert Muir commented on LUCENE-4845:
-

Wouldn't the straightforward impl be to put the suffixes of the suggestions into 
the FST?

So for "this is a test" 
you also add "is a test", "a test", ...

I feel like this could be done with just a tokenfilter used only at build-time 
+ analyzingsuggester, and would be more performant.
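
A rough sketch of the suffix expansion described above, assuming suggestions
are split on whitespace. The class and method names are hypothetical, and this
is plain string handling rather than an actual build-time TokenFilter:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not a Lucene TokenFilter.
class SuffixExpansionSketch {
    // Returns every word-level suffix of the suggestion, longest first.
    static List<String> wordSuffixes(String suggestion) {
        String[] words = suggestion.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, start, words.length)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Each suffix would be added to the prefix-matching suggester at build time.
        System.out.println(wordSuffixes("this is a test"));
    }
}
```

Feeding each suffix to a prefix-matching suggester makes a prefix lookup on
"a te" reach the original suggestion, at the cost of more entries to build.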

 Add AnalyzingInfixSuggester
 ---

 Key: LUCENE-4845
 URL: https://issues.apache.org/jira/browse/LUCENE-4845
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: infixSuggest.png, LUCENE-4845.patch


 Our current suggester impls do prefix matching of the incoming text
 against all compiled suggestions, but in some cases it's useful to
 allow infix matching.  E.g., Netflix does infix suggestions in their
 search box.
 I did a straightforward impl, just using a normal Lucene index, and
 using PostingsHighlighter to highlight matching tokens in the
 suggestions.
 I think this likely only works well when your suggestions have a
 strong prior ranking (weight input to build), eg Netflix knows
 the popularity of movies.




[jira] [Updated] (LUCENE-4752) Merge segments to sort them

2013-03-17 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4752:
-

Attachment: LUCENE-4752.patch

bq. But, since LTC is quite big, perhaps we can move these methods to a util, 
e.g. CompareIndexes?

Why is the size of the class a concern? I think it's more convenient to have 
all assert*Equals methods in the same class? (LuceneTestCase already has many 
assert*Equals methods inherited from Assert.) And it makes these methods easier 
to find when writing a test?

bq. Can we make OneMerge.readers private and add OneMerge.add(AtomicReader) for 
IW to use? It looks odd that IW manipulates OneMerge.readers directly, but then 
calls OneMerge.getMergeReaders()

I think it would be odd if getMergeReaders was just a getter but it is more 
than that since it filters out empty readers and can even return an arbitrary 
view over the readers to merge. But here it is just a method that computes data 
based on the class members, like segString?
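
To make that concrete, here is a small sketch with hypothetical stand-in
classes (not the real IndexWriter, OneMerge or AtomicReader types): the caller
fills the readers list directly, while getMergeReaders() computes a filtered
view that a subclass is free to replace. The reordering by name is purely for
illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; these are not the Lucene classes.
class MergeReadersSketch {
    static final class Reader {
        final String name;
        final int numDocs;

        Reader(String name, int numDocs) {
            this.name = name;
            this.numDocs = numDocs;
        }
    }

    static class OneMerge {
        // Filled directly by the caller, as in the current patch.
        final List<Reader> readers = new ArrayList<>();

        // More than a getter: computes a view that skips empty readers.
        List<Reader> getMergeReaders() {
            List<Reader> live = new ArrayList<>();
            for (Reader r : readers) {
                if (r.numDocs > 0) {
                    live.add(r);
                }
            }
            return live;
        }
    }

    // A subclass can return a different view over the same readers.
    static class ReorderingMerge extends OneMerge {
        @Override
        List<Reader> getMergeReaders() {
            List<Reader> view = super.getMergeReaders();
            view.sort(Comparator.comparing((Reader r) -> r.name));
            return view;
        }
    }

    // Populates the merge with made-up readers and returns the names it would merge.
    static List<String> sample(OneMerge merge) {
        merge.readers.add(new Reader("seg_b", 3));
        merge.readers.add(new Reader("seg_empty", 0));
        merge.readers.add(new Reader("seg_a", 5));
        List<String> names = new ArrayList<>();
        for (Reader r : merge.getMergeReaders()) {
            names.add(r.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sample(new OneMerge()));
        System.out.println(sample(new ReorderingMerge()));
    }
}
```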

bq. Can we remove SegmentMerger.add()

Good point, I updated the patch.

 Merge segments to sort them
 ---

 Key: LUCENE-4752
 URL: https://issues.apache.org/jira/browse/LUCENE-4752
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: David Smiley
Assignee: Adrien Grand
 Attachments: LUCENE-4752.patch, LUCENE-4752.patch, LUCENE-4752.patch, 
 LUCENE-4752.patch


 It would be awesome if Lucene could write the documents out in a segment 
 based on a configurable order.  This of course applies to merging segments 
 too. The benefit is increased locality on disk of documents that are likely to 
 be accessed together.  This often applies to documents near each other in 
 time, but also spatially.




[jira] [Commented] (SOLR-4601) A Collection that is only partially created and then deleted will leave pre allocated shard information in ZooKeeper.

2013-03-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604779#comment-13604779
 ] 

Mark Miller commented on SOLR-4601:
---

The overseer should remove all collection information after this operation, to 
cover any shards that were not up (and may never come back) as well as any pre 
allocated shards that failed to create.

 A Collection that is only partially created and then deleted will leave pre 
 allocated shard information in ZooKeeper.
 -

 Key: SOLR-4601
 URL: https://issues.apache.org/jira/browse/SOLR-4601
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.3, 5.0


 This means you can't try and create the collection again as it will appear to 
 already exist.




[jira] [Created] (LUCENE-4846) PostingsHighlighter should allow [expert] customization on how the field values are loaded

2013-03-17 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-4846:
--

 Summary: PostingsHighlighter should allow [expert] customization 
on how the field values are loaded
 Key: LUCENE-4846
 URL: https://issues.apache.org/jira/browse/LUCENE-4846
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/highlighter
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3
 Attachments: LUCENE-4846.patch

Today it always loads from stored fields (searcher.doc), but it's sometimes
useful to customize this, eg if your app separately already loads stored
fields then it can avoid double-loading them.  Or if your app has some other
place to pull the values from ...




[jira] [Updated] (LUCENE-4846) PostingsHighlighter should allow [expert] customization on how the field values are loaded

2013-03-17 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4846:
---

Attachment: LUCENE-4846.patch

Simple patch: I added a protected method (loadFieldValues) to PH and a testcase.
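
A sketch of the extension pattern, using hypothetical stand-in classes rather
than the real PostingsHighlighter: the hook name loadFieldValue, its signature
and the maps below are assumptions made for illustration, not the API added by
the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the PostingsHighlighter API.
class FieldLoadingSketch {
    static class Highlighter {
        private final Map<Integer, String> storedFields;
        int storedFieldLoads = 0;

        Highlighter(Map<Integer, String> storedFields) {
            this.storedFields = storedFields;
        }

        // Expert hook: by default the value comes from stored fields.
        protected String loadFieldValue(int docId) {
            storedFieldLoads++;
            return storedFields.get(docId);
        }

        // Wraps each occurrence of term in <b> tags.
        String highlight(int docId, String term) {
            String value = loadFieldValue(docId);
            return value.replace(term, "<b>" + term + "</b>");
        }
    }

    // An app that already holds the values can skip the stored-fields lookup.
    static class CachedHighlighter extends Highlighter {
        private final Map<Integer, String> appCache;

        CachedHighlighter(Map<Integer, String> storedFields, Map<Integer, String> appCache) {
            super(storedFields);
            this.appCache = appCache;
        }

        @Override
        protected String loadFieldValue(int docId) {
            String cached = appCache.get(docId);
            return cached != null ? cached : super.loadFieldValue(docId);
        }
    }

    // Builds a highlighter whose app cache already holds doc 1.
    static CachedHighlighter sample() {
        Map<Integer, String> stored = new HashMap<>();
        stored.put(1, "lucene in action");
        stored.put(2, "solr in action");
        Map<Integer, String> cache = new HashMap<>();
        cache.put(1, "lucene in action");
        return new CachedHighlighter(stored, cache);
    }

    public static void main(String[] args) {
        CachedHighlighter h = sample();
        System.out.println(h.highlight(1, "lucene"));
        System.out.println(h.storedFieldLoads);
    }
}
```

Documents found in the app cache never touch stored fields; anything else falls
back to the default loading path.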

 PostingsHighlighter should allow [expert] customization on how the field 
 values are loaded
 --

 Key: LUCENE-4846
 URL: https://issues.apache.org/jira/browse/LUCENE-4846
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/highlighter
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4846.patch


 Today it always loads from stored fields (searcher.doc), but it's sometimes
 useful to customize this, eg if your app separately already loads stored
 fields then it can avoid double-loading them.  Or if your app has some other
 place to pull the values from ...




[jira] [Created] (SOLR-4602) ZkController#unregister should cancel it's election participation before asking the Overseer to delete the SolrCore information.

2013-03-17 Thread Mark Miller (JIRA)
Mark Miller created SOLR-4602:
-

 Summary: ZkController#unregister should cancel it's election 
participation before asking the Overseer to delete the SolrCore information.
 Key: SOLR-4602
 URL: https://issues.apache.org/jira/browse/SOLR-4602
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.3, 5.0


Otherwise, the leader election is likely to do publishes that race with the 
removal of the SolrCore from the clusterstate.




[jira] [Commented] (LUCENE-4752) Merge segments to sort them

2013-03-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604782#comment-13604782
 ] 

Michael McCandless commented on LUCENE-4752:


This approach looks nice: it's minimally invasive; MP just overrides 
OneMerge.getMergeReaders.

And it's really nice to have those asserts more accessible: various tests have 
their own [different!] impls for these equality tests.

 Merge segments to sort them
 ---

 Key: LUCENE-4752
 URL: https://issues.apache.org/jira/browse/LUCENE-4752
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: David Smiley
Assignee: Adrien Grand
 Attachments: LUCENE-4752.patch, LUCENE-4752.patch, LUCENE-4752.patch, 
 LUCENE-4752.patch


 It would be awesome if Lucene could write the documents out in a segment 
 based on a configurable order.  This of course applies to merging segments 
 too. The benefit is increased locality on disk of documents that are likely to 
 be accessed together.  This often applies to documents near each other in 
 time, but also spatially.




[JENKINS] Lucene-Solr-Tests-4.x-java7 - Build # 1078 - Failure

2013-03-17 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-java7/1078/

1 tests failed.
REGRESSION:  org.apache.lucene.analysis.core.TestRandomChains.testRandomChains

Error Message:
finalOffset  expected:<20> but was:<2>

Stack Trace:
java.lang.AssertionError: finalOffset  expected:<20> but was:<2>
at __randomizedtesting.SeedInfo.seed([DDEA6102E07059C9:E00B4863A7624409]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:245)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:780)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:546)
at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:447)
at org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:956)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
  

[jira] [Created] (LUCENE-4847) Sorter API: Fully reuse docs enums

2013-03-17 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-4847:


 Summary: Sorter API: Fully reuse docs enums
 Key: LUCENE-4847
 URL: https://issues.apache.org/jira/browse/LUCENE-4847
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.3


SortingAtomicReader reuses the filtered docs enums but not the wrapper. In the 
case of SortingAtomicReader this can be a problem because the wrappers are 
heavyweight (they load the whole postings list into memory), so an index with 
many terms with high freqs will make the JVM allocate a lot of memory when 
browsing the postings lists.
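
A sketch of the kind of reuse meant here, with hypothetical names (this is not
SortingAtomicReader code): the wrapper keeps its buffer across calls and only
allocates when a longer postings list arrives, instead of allocating a fresh
wrapper and buffer for every term.

```java
// Hypothetical illustration only; not the Sorter API.
class WrapperReuseSketch {
    // Stand-in for a heavyweight wrapper that copies a whole postings list into memory.
    static final class BufferedDocs {
        private int[] docs = new int[0];
        private int size = 0;
        int bufferAllocations = 0;

        // Loads a postings list, growing the buffer only when it is too small.
        BufferedDocs reset(int[] postings) {
            if (docs.length < postings.length) {
                docs = new int[postings.length];
                bufferAllocations++;
            }
            System.arraycopy(postings, 0, docs, 0, postings.length);
            size = postings.length;
            return this;
        }

        int size() {
            return size;
        }
    }

    // Reuses the caller's wrapper when one is supplied, otherwise creates a new one.
    static BufferedDocs wrap(int[] postings, BufferedDocs reuse) {
        BufferedDocs wrapper = (reuse != null) ? reuse : new BufferedDocs();
        return wrapper.reset(postings);
    }

    public static void main(String[] args) {
        BufferedDocs w = wrap(new int[] {1, 5, 9}, null);
        w = wrap(new int[] {2, 3}, w);
        w = wrap(new int[] {4, 6, 8}, w);
        // Number of buffers allocated while visiting three terms.
        System.out.println(w.bufferAllocations);
    }
}
```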




[jira] [Commented] (SOLR-4601) A Collection that is only partially created and then deleted will leave pre allocated shard information in ZooKeeper.

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604788#comment-13604788
 ] 

Commit Tag Bot commented on SOLR-4601:
--

[trunk commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1457585

SOLR-4601: A Collection that is only partially created and then deleted will 
leave pre allocated shard information in ZooKeeper.


 A Collection that is only partially created and then deleted will leave pre 
 allocated shard information in ZooKeeper.
 -

 Key: SOLR-4601
 URL: https://issues.apache.org/jira/browse/SOLR-4601
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.3, 5.0


 This means you can't try and create the collection again as it will appear to 
 already exist.




[jira] [Commented] (SOLR-4602) ZkController#unregister should cancel it's election participation before asking the Overseer to delete the SolrCore information.

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604789#comment-13604789
 ] 

Commit Tag Bot commented on SOLR-4602:
--

[trunk commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1457584

SOLR-4602: ZkController#unregister should cancel it's election participation 
before asking the Overseer to delete the SolrCore information.


 ZkController#unregister should cancel it's election participation before 
 asking the Overseer to delete the SolrCore information.
 

 Key: SOLR-4602
 URL: https://issues.apache.org/jira/browse/SOLR-4602
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.3, 5.0


 Otherwise, the leader election is likely to do publishes that race with the 
 removal of the SolrCore from the clusterstate.




[jira] [Commented] (LUCENE-4843) LimitTokenPositionFilter: don't emit tokens with positions that exceed the configured limit

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604791#comment-13604791
 ] 

Commit Tag Bot commented on LUCENE-4843:


[trunk commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1457586

LUCENE-4843: In TestRandomChains, only use LimitTokenPositionFilter configured 
to consume all wrapped stream's tokens, just like LimitTokenCountFilter


 LimitTokenPositionFilter: don't emit tokens with positions that exceed the 
 configured limit
 ---

 Key: LUCENE-4843
 URL: https://issues.apache.org/jira/browse/LUCENE-4843
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4843.patch


 Like LimitTokenCountFilter, except it's the token position that's limited 
 rather than the token count.




Re: [JENKINS] Lucene-Solr-Tests-4.x-java7 - Build # 1078 - Failure

2013-03-17 Thread Steve Rowe
This was caused by the new LimitTokenPositionFilter, which triggers failures 
when it doesn't consume all of its wrapped stream's tokens, just like 
LimitTokenCountFilter.
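
One plausible way such a mismatch arises, sketched with hypothetical stand-in
classes rather than the real TokenStream API: if the final offset only advances
as tokens are read, a limiting filter that stops early reports a final offset
short of the end of the text, unless it drains the remaining tokens first.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not the Lucene TokenStream API.
class ConsumeAllSketch {
    // Token source whose final offset only advances as tokens are read.
    static final class Source {
        private final Iterator<String> words;
        private int offset = 0;

        Source(List<String> words) {
            this.words = words.iterator();
        }

        // Returns the next word, or null when the source is exhausted.
        String next() {
            if (!words.hasNext()) {
                return null;
            }
            String w = words.next();
            if (offset > 0) {
                offset++; // single space between words
            }
            offset += w.length();
            return w;
        }

        int finalOffset() {
            return offset;
        }
    }

    // Emits at most maxTokens tokens, optionally drains the rest,
    // and returns the final offset the source reports afterwards.
    static int finalOffsetAfterLimit(List<String> words, int maxTokens, boolean consumeAll) {
        Source src = new Source(words);
        int emitted = 0;
        while (emitted < maxTokens && src.next() != null) {
            emitted++;
        }
        if (consumeAll) {
            while (src.next() != null) {
                // discard tokens beyond the limit so the source reaches its end
            }
        }
        return src.finalOffset();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ab", "cd", "ef"); // the text "ab cd ef", 8 chars
        System.out.println(finalOffsetAfterLimit(words, 1, false));
        System.out.println(finalOffsetAfterLimit(words, 1, true));
    }
}
```

With the limit set to one token, the early-stopping variant reports the end of
"ab" (2) while the draining variant reports the end of the text (8).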

I committed a fix to trunk and branch_4x.

Steve

On Mar 17, 2013, at 7:06 PM, Apache Jenkins Server jenk...@builds.apache.org 
wrote:

 Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-java7/1078/
 
 1 tests failed.
 REGRESSION:  org.apache.lucene.analysis.core.TestRandomChains.testRandomChains
 
 Error Message:
 finalOffset  expected:<20> but was:<2>
 

[jira] [Commented] (LUCENE-4843) LimitTokenPositionFilter: don't emit tokens with positions that exceed the configured limit

2013-03-17 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604798#comment-13604798
 ] 

Commit Tag Bot commented on LUCENE-4843:


[branch_4x commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1457587

LUCENE-4843: In TestRandomChains, only use LimitTokenPositionFilter configured 
to consume all wrapped stream's tokens, just like LimitTokenCountFilter (merged 
trunk r1457586)


 LimitTokenPositionFilter: don't emit tokens with positions that exceed the 
 configured limit
 ---

 Key: LUCENE-4843
 URL: https://issues.apache.org/jira/browse/LUCENE-4843
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Fix For: 5.0, 4.3

 Attachments: LUCENE-4843.patch


 Like LimitTokenCountFilter, except it's the token position that's limited 
 rather than the token count.




Re: Lucene/Solr 4.2.1

2013-03-17 Thread Steve Rowe

On Mar 17, 2013, at 3:17 PM, Uwe Schindler u...@thetaphi.de wrote:
 I think the branch should already be setup for 4.2.1, maybe you only need to 
 raise version numbers.

I'll do this.

Steve


