[jira] [Updated] (LUCENE-4165) HunspellDictionary - AffixFile Reader closed, Dictionary Readers left unclosed

2012-06-27 Thread Torsten Krah (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Torsten Krah updated LUCENE-4165:
-

Attachment: lucene_36.patch
lucene_trunk.patch

Updated patches:

1. Removed the reader.close() call in the readAffixFile() method.
2. Added comments on the constructors and their arguments to make clear that the
caller has to close the streams and that the constructors do not close them.
3. Modified the test to check that the streams are actually not closed.
4. Added two close calls on the streams in the trunk patch for the test.

 HunspellDictionary - AffixFile Reader closed, Dictionary Readers left unclosed
 --

 Key: LUCENE-4165
 URL: https://issues.apache.org/jira/browse/LUCENE-4165
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 3.6
 Environment: Linux, Java 1.6
Reporter: Torsten Krah
Priority: Minor
 Attachments: lucene_36.patch, lucene_36.patch, lucene_trunk.patch, 
 lucene_trunk.patch


 The HunspellDictionary takes an InputStream for the affix file and a List of 
 streams for the dictionaries.
 The Javadoc is not clear about whether I have to close those streams myself or 
 whether the Dictionary constructor does this already.
 Looking at the code, at least reader.close() is called when the affix file is 
 read via the readAffixFile() method (although the close is not done in a 
 finally block, so the constructor may fail to do so).
 The readDictionaryFile() method, in contrast to readAffixFile(), misses the 
 call to close its reader altogether.
 So the question is: do I have to close the streams myself after instantiating 
 the dictionary? Or is the close call only missing for the dictionary streams?
 Either way, please add the close calls in a safe manner, or clarify the 
 Javadoc so that I know I have to do this myself.
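The "safe manner" the report asks for is the standard finally-block idiom. A minimal, self-contained sketch (illustrative only, not Lucene's actual code; countLines stands in for readAffixFile/readDictionaryFile):

```java
import java.io.*;

public class SafeClose {
    // If a method takes ownership of a stream, it should close it in a
    // finally block so the close happens even when parsing fails part-way.
    static int countLines(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        try {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } finally {
            reader.close(); // runs on both success and failure
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "foo\nbar\n".getBytes("UTF-8");
        System.out.println(countLines(new ByteArrayInputStream(data)));
    }
}
```

Alternatively, if the constructor is documented as not closing the streams, the caller wraps construction in the same try/finally and closes them itself.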

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4165) HunspellDictionary - AffixFile Reader closed, Dictionary Readers left unclosed

2012-06-27 Thread Torsten Krah (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Torsten Krah updated LUCENE-4165:
-

Attachment: (was: lucene_trunk.patch)




[jira] [Updated] (LUCENE-4165) HunspellDictionary - AffixFile Reader closed, Dictionary Readers left unclosed

2012-06-27 Thread Torsten Krah (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Torsten Krah updated LUCENE-4165:
-

Attachment: (was: lucene_36.patch)




[jira] [Updated] (LUCENE-4138) Update morfologik (polish stemming) to 1.5.3

2012-06-27 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-4138:


Attachment: LUCENE-4138.patch

Updated patch. Not backwards compatible (intentionally): 
MorphosyntacticTagAttribute has been renamed to MorphosyntacticTagsAttribute 
(note the plural) and now carries a list of tags for the current stem.

 Update morfologik (polish stemming) to 1.5.3
 

 Key: LUCENE-4138
 URL: https://issues.apache.org/jira/browse/LUCENE-4138
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4138.patch, LUCENE-4138.patch


 Just released. Updates to the dictionary but most of all -- it comes with a 
 clean BSD license (including dictionary data).




[jira] [Commented] (LUCENE-4157) Improve Spatial Testing

2012-06-27 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402063#comment-13402063
 ] 

Chris Male commented on LUCENE-4157:


{quote}RE QuadPrefixTree, I'll see if I can reproduce your test errors. I'm not 
surprised if QuadPrefixTree.MAX_LEVELS_POSSIBLE is perhaps too big (notice 
the comment at its declaration; I'm not really sure how big this should be). 
Assuming the default 12 levels pass, I think we can find a safer max number to 
use for the time being that is less than 50, and maybe one day when we have 
time we can confidently determine exactly what it can support. I venture to 
guess it might be similar to the mantissa of a double, which is 53, but perhaps 
not, or maybe it's half that or something. FYI, about 26 is needed for ~1 meter 
accuracy. If a non-geo scenario is needed, then who knows what your 
requirements might be.{quote}

Thanks for that explanation.  I tried with the default of 12 and the tests 
still failed, but with no error this time.  That could just be related to the 
fact that quad trees are less precise than geohashes, or maybe to some problems 
with the tests.  I think we should just try to come up with some tests for the 
trees themselves to verify that they work as expected.  I see 
SpatialPrefixTreeTest does some testing of GeohashPrefixTree currently, but we 
should really spin that off into its own test class and take QuadTree separately.
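As an aside, the "about 26 levels for ~1 meter accuracy" figure quoted above is easy to sanity-check: each quad level halves the cell edge in each dimension, so at level n a cell edge is roughly the earth's circumference divided by 2^n (a back-of-the-envelope calculation, not the module's actual grid math):

```java
public class QuadLevelAccuracy {
    public static void main(String[] args) {
        // Approximate equatorial circumference of the earth, in meters.
        double circumference = 40075000.0;
        // Each quad level halves the cell edge, so at level n the edge is
        // roughly circumference / 2^n.
        double cellEdgeAt26 = circumference / Math.pow(2, 26);
        System.out.println(cellEdgeAt26); // ~0.6 m, i.e. sub-meter accuracy
    }
}
```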

{quote}RE Testing of TermQueryPrefixGridStrategy, I agree that its tests in 
Lucene spatial are too minimal. FWIW, I'm about to update a patch to SOLR-3304 
that tests a variety of strategies against the same test code (based on test 
code from the Solr 3 spatial filter tests). TermQueryPrefixGridStrategy passes 
fine.{quote}

Good to know.  I have confidence in TermQueryPrefixGridStrategy since it is 
extremely simple, but I think we need to come up with tests to ensure that any 
changes we make to the indexing process are compatible with the querying.

{quote}I definitely welcome any input on making the tests better overall. It's 
a bit of a challenge because there are a variety of strategies, and some like 
TwoDoublesStrategy are known to not yet support certain geo cases like the 
poles (if I recall). I'm not sure if the idea of a test file of query cases was 
your idea or Ryan's (e.g. cities-IsWithin-BBox), but instead or in addition, I 
like the idea of automatically generating random data and queries, and then 
double checking search results against a simple brute force algorithm.{quote}

I don't really like the test file idea at all.  Having them for benchmarking is 
good but we aren't at that stage yet.  Instead I think we should construct 
simple unit tests, indexing a few Shapes and querying for them.  We should do 
that for each Strategy, obviously only indexing Points for TwoDoublesStrategy.  
Having random data and query generation can come later, once we have enough 
crafted tests to be sure that this works.

We should then randomize the use of QuadTree vs GeohashTree or actually repeat 
the tests for both.

We have a big question mark around testing with polygons.  My concern is that 
users will rightly start using JTS Geometrys and our Strategies will fail.  We 
really need to think about how to handle this.

{quote}If you don't feel any better about these two classes, then I like your 
suggestion of not releasing them in 4.0 and leaving them in trunk.{quote}

QuadTree is my main concern since I don't know whether it is working correctly 
and is just less precise than geohashes or has a bug.  If we can't quickly come 
up with a couple of tests and fix any broken behavior then we should remove it 
from 4.0.  

We should also take this opportunity to remove any unused code / code that 
doesn't actually test anything.  For this I see TruncateFilter, the current 
TestTermQueryPrefixGridStrategy and TestSpatialPrefixField.

I'll try to help out here, especially with cleaning out the dead code, but any 
help with testing QuadTree would be great.




 Improve Spatial Testing
 ---

 Key: LUCENE-4157
 URL: https://issues.apache.org/jira/browse/LUCENE-4157
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
Priority: Critical
 Fix For: 4.0

 Attachments: LUCENE-4157_Improve_Lucene_Spatial_testing_p1.patch


 Looking back at the tests for the Lucene Spatial Module, they seem 
 half-baked.  (At least Spatial4j is well tested).  I've started working on 
 some improvements:
 * Some tests are in an abstract base class which have a subclass that 
 provides a SpatialContext. The idea was that the same tests could test other 
 contexts (such as geo vs not or different distance calculators (haversine vs 
 vincenty) but this 

[jira] [Commented] (LUCENE-4138) Update morfologik (polish stemming) to 1.5.3

2012-06-27 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402069#comment-13402069
 ] 

Dawid Weiss commented on LUCENE-4138:
-

If there are no objections I'll commit this shortly.




[jira] [Updated] (LUCENE-4138) Update morfologik (polish stemming) to 1.5.3

2012-06-27 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-4138:


Attachment: LUCENE-4138.patch

Updated patch with minor fixes (corrected module fileset, optimized buffer 
reuse for tags).




[jira] [Updated] (LUCENE-4138) Update morfologik (polish stemming) to 1.5.3

2012-06-27 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-4138:


Priority: Minor  (was: Trivial)




[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen

2012-06-27 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402075#comment-13402075
 ] 

Adrien Grand commented on LUCENE-4062:
--

Thanks for your patch, Toke. All tests seem to pass, I'll try to generate 
graphs for your impl as soon as possible!

 More fine-grained control over the packed integer implementation that is 
 chosen
 ---

 Key: LUCENE-4062
 URL: https://issues.apache.org/jira/browse/LUCENE-4062
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/other
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
  Labels: performance
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4062-2.patch, LUCENE-4062.patch, 
 LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, 
 LUCENE-4062.patch, LUCENE-4062.patch, Packed64calc.java, 
 PackedIntsBenchmark.java, PackedIntsBenchmark.java


 In order to save space, Lucene has two main PackedInts.Mutable implementations: 
 one that is very fast and is based on a byte/short/int/long array 
 (Direct*), and another that packs bits in a memory-efficient manner 
 (Packed*).
 The packed implementation tends to be much slower than the direct one, which 
 discourages some Lucene components from using it. On the other hand, if you 
 store 21-bit integers in a Direct32, this is a space loss of (32-21)/32 = ~34%.
 If you are willing to trade some space for speed, you could store 3 of these 
 21-bit integers in a long, resulting in an overhead of 1/3 bit per value. One 
 advantage of this approach is that you never need to read more than one block 
 to read or write a value, so it can be significantly faster than Packed32 
 and Packed64, which always need to read/write two blocks in order to avoid 
 costly branches.
 I ran some tests, and for 1000 21-bit values, this implementation takes 
 less than 2% more space and has 44% faster writes and 30% faster reads. The 
 12-bit version (5 values per block) has the same performance improvement and 
 a 6% memory overhead compared to the packed implementation.
 In order to select the best implementation for a given integer size, I wrote 
 the {{PackedInts.getMutable(valueCount, bitsPerValue, 
 acceptableOverheadPerValue)}} method. This method selects the fastest 
 implementation that has less than {{acceptableOverheadPerValue}} wasted bits 
 per value. For example, if you accept an overhead of 20% 
 ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty 
 reasonable, here is what implementations would be selected:
  * 1: Packed64SingleBlock1
  * 2: Packed64SingleBlock2
  * 3: Packed64SingleBlock3
  * 4: Packed64SingleBlock4
  * 5: Packed64SingleBlock5
  * 6: Packed64SingleBlock6
  * 7: Direct8
  * 8: Direct8
  * 9: Packed64SingleBlock9
  * 10: Packed64SingleBlock10
  * 11: Packed64SingleBlock12
  * 12: Packed64SingleBlock12
  * 13: Packed64
  * 14: Direct16
  * 15: Direct16
  * 16: Direct16
  * 17: Packed64
  * 18: Packed64SingleBlock21
  * 19: Packed64SingleBlock21
  * 20: Packed64SingleBlock21
  * 21: Packed64SingleBlock21
  * 22: Packed64
  * 23: Packed64
  * 24: Packed64
  * 25: Packed64
  * 26: Packed64
  * 27: Direct32
  * 28: Direct32
  * 29: Direct32
  * 30: Direct32
  * 31: Direct32
  * 32: Direct32
  * 33: Packed64
  * 34: Packed64
  * 35: Packed64
  * 36: Packed64
  * 37: Packed64
  * 38: Packed64
  * 39: Packed64
  * 40: Packed64
  * 41: Packed64
  * 42: Packed64
  * 43: Packed64
  * 44: Packed64
  * 45: Packed64
  * 46: Packed64
  * 47: Packed64
  * 48: Packed64
  * 49: Packed64
  * 50: Packed64
  * 51: Packed64
  * 52: Packed64
  * 53: Packed64
  * 54: Direct64
  * 55: Direct64
  * 56: Direct64
  * 57: Direct64
  * 58: Direct64
  * 59: Direct64
  * 60: Direct64
  * 61: Direct64
  * 62: Direct64
 Under 32 bits per value, only 13, 17 and 22-26 bits per value would still 
 choose the slower Packed64 implementation. Allowing a 50% overhead would 
 prevent the packed implementation from being selected for bits per value under 
 32. Allowing an overhead of 32 bits per value would make sure that a Direct* 
 implementation is always selected.
 Next steps would be to:
  * make Lucene components use this {{getMutable}} method and let users decide 
 what trade-off suits them better,
  * write a Packed32SingleBlock implementation if necessary (I didn't do it 
 because I have no 32-bit computer to test the performance improvements on).
 I think this would allow more fine-grained control over the speed/space 
 trade-off. What do you think?
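The single-block layout described above (3 x 21-bit values per 64-bit long, one wasted bit per block) can be sketched as a toy class. This is illustrative only; the name and details are assumptions, not the actual Packed64SingleBlock code:

```java
public class SingleBlock21 {
    // 3 values of 21 bits each use 63 of the 64 bits in a long,
    // i.e. an overhead of 1/3 bit per value.
    private static final int VALUES_PER_BLOCK = 3;
    private static final int BITS = 21;
    private static final long MASK = (1L << BITS) - 1;
    private final long[] blocks;

    public SingleBlock21(int valueCount) {
        blocks = new long[(valueCount + VALUES_PER_BLOCK - 1) / VALUES_PER_BLOCK];
    }

    public long get(int index) {
        int shift = (index % VALUES_PER_BLOCK) * BITS;
        return (blocks[index / VALUES_PER_BLOCK] >>> shift) & MASK;
    }

    public void set(int index, long value) {
        int b = index / VALUES_PER_BLOCK;
        int shift = (index % VALUES_PER_BLOCK) * BITS;
        // Clear the 21-bit slot, then write the new value into it.
        blocks[b] = (blocks[b] & ~(MASK << shift)) | ((value & MASK) << shift);
    }

    public static void main(String[] args) {
        SingleBlock21 packed = new SingleBlock21(1000);
        packed.set(42, 1234567L);
        System.out.println(packed.get(42));
    }
}
```

Because a slot never straddles two longs, each get/set touches exactly one block, which is the speed advantage claimed over Packed32/Packed64.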


[jira] [Updated] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen

2012-06-27 Thread Toke Eskildsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toke Eskildsen updated LUCENE-4062:
---

Attachment: measurements_te_xeon.txt
measurements_te_p4.txt
measurements_te_i7.txt
measurements_te_graphs.pdf

I ran the test on three different machines. Results are attached as 
measurements*.txt, along with a PDF with graphs generated from iteration #6 
(which should probably be the mean or max of runs 2-5). The setter graph for 
the P4 looks extremely strange for Direct, but I tried generating a graph for 
iteration #5 instead and it looked the same. In the same vein, the Direct 
performance for the Xeon is suspiciously low, so I wonder if there's some 
freaky JITting happening to the test code.

Unfortunately I did not find an AMD machine to test on. For the three tested 
Intels, it seems that Packed64calc performs very well.


Re: VOTE: 4.0 alpha (take two)

2012-06-27 Thread Antoine LE FLOC'H
I was actually using a solrconfig.xml that is too old for this version.

catalina.out gave me some errors on indexDefaults and mainIndex, so I
took the solrconfig.xml from your alpha package and it worked fine.

I haven't been able to totally check everything yet because I am using a
SolrJ 3.6 indexing client and I had some content-type issues in
catalina.out. I am working on it.


On Wed, Jun 27, 2012 at 7:29 AM, Stefan Matheis 
matheis.ste...@googlemail.com wrote:

 On Tuesday, June 26, 2012 at 8:30 PM, Simon Willnauer wrote:
  that seems worth an issue. is there one already?

 not yet, there was one comment on SOLR-3238 but no further comment.


 On Tuesday, June 26, 2012 at 8:17 PM, Antoine LE FLOC'H wrote:
  Are you aware of this error ? Thanks again.



 Antoine, would you mind opening one and providing some info? This error
 will show up if there's a problem accessing /admin/system; I guess that
 should be our starting point




 Stefan

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Assigned] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters

2012-06-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned SOLR-3467:
-

Assignee: Jan Høydahl

 ExtendedDismax escaping is missing several reserved characters
 --

 Key: SOLR-3467
 URL: https://issues.apache.org/jira/browse/SOLR-3467
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Michael Dodsworth
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: SOLR-3467.patch, SOLR-3467.patch


 When edismax is unable to parse the original user query, it retries using an 
 escaped version of that query (where all reserved chars have been escaped).
 Currently, the escaping done in {{splitIntoClauses}} appears to be missing 
 several chars from {{QueryParserBase#escape(String)}}, namely {{'\\', '|', 
 '', '/'}}
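The escape-and-retry step amounts to prefixing every reserved character with a backslash. A minimal sketch of that kind of escaping (the character set below is an illustrative example, not the exact list used by edismax or {{QueryParserBase#escape}}):

```java
public class QueryEscape {
    // Prefix each query-syntax character with a backslash so the
    // escaped string parses as literal text.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Example set of reserved characters, including the ones the
            // issue says splitIntoClauses currently misses.
            if ("\\+-!():^[]{}~*?\"&|/".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeQueryChars("foo/bar && baz"));
    }
}
```

The bug described above is simply that the issue's listed characters are absent from the set checked in {{splitIntoClauses}}, so they pass through unescaped.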




[jira] [Updated] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters

2012-06-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3467:
--

Affects Version/s: (was: 4.0)
   3.6
Fix Version/s: 5.0




version checkout

2012-06-27 Thread Despot Jakimovski
Hi,

I created an issue in JIRA
(https://issues.apache.org/jira/browse/SOLR-3574) and now I want to
develop/contribute. I would first like to create a patch
for Solr version 3.6.0, then also include a patch for versions 4 and 5.
Is this possible, or can I only create a patch for the last version?

From what I read in How To Contribute
(http://wiki.apache.org/solr/HowToContribute) and from what I can see
here (https://svn.apache.org/repos/asf/lucene/dev/), there are trunk, tags,
branches and nightly available. What version is the trunk? I think I
shouldn't touch tags (those are final versions), nor branches (those are
big functionality branches whose code might differ from the trunk). Can
someone please help me get up to speed with this?

Cheers,
despot


Re: VOTE: 4.0 alpha (take two)

2012-06-27 Thread Antoine LE FLOC'H
Using SolrJ from the alpha, everything works. Go for it !



Re: post alpha 1 entries in CHANGES.txt

2012-06-27 Thread Jan Høydahl
A lightweight method is to create a TAG in SVN for each RC, and also for ALPHA 
and BETA, and then cut a branch when releasing 4.0.0.
Regarding CHANGES, it should have full-blown sections for 4.0.0-ALPHA and 
4.0.0-BETA, but for e.g. 4.0.0-BETA-RC2 a marker line in CHANGES.txt indicating 
which issues are included before and after the RC should be enough. That way we 
can build and tag RCs often with little effort, and when the final release is 
done, these marker lines can be removed if we wish.

For ALPHA/BETA I think we should also add a new section to CHANGES.txt to 
highlight known major issues which are still blockers for the final release.

This is how it could look:


==  4.0.0-ALPHA ==
:
:
IMPORTANT: This is not a final release and we encourage you to use the latest 
stable release for production use.

Known critical issues in this ALPHA
---

* SOLR-: Index gets corrupted every monday :-)

Detailed Change List
--

New Features
--

* SOLR-: Foo bar (myself)

 4.0.0-ALPHA-RC1 includes changes above this line 

* SOLR-: Foo bar (myself)

* SOLR-: Foo bar (myself)

 4.0.0-ALPHA-RC2 includes changes above this line 

* SOLR-: Foo bar (myself)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 27. juni 2012, at 04:10, Mark Miller wrote:

 I put a new changes entry after the alpha under the 4-Alpha release in 
 CHANGES.txt for Solr.
 
 I missed the discussion if there was one, but if we plan to have a CHANGES 
 section for alphas and betas, let me know, and I'll move that entry when we 
 start a new section.
 
 We should add the next section soon if we are going to so it's clear what 
 direction we are taking. 
 
 - Mark Miller
 lucidimagination.com
 
 
 
 
 
 
 
 
 
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters

2012-06-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3467:
--

Attachment: SOLR-3467.patch

Updated trunk patch with extended test and CHANGES entry. Looks good to me. Any 
other comments before commit?

 ExtendedDismax escaping is missing several reserved characters
 --

 Key: SOLR-3467
 URL: https://issues.apache.org/jira/browse/SOLR-3467
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Michael Dodsworth
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: SOLR-3467.patch, SOLR-3467.patch, SOLR-3467.patch


 When edismax is unable to parse the original user query, it retries using an 
 escaped version of that query (where all reserved chars have been escaped).
 Currently, the escaping done in {{splitIntoClauses}} appears to be missing 
 several chars from {{QueryParserBase#escape(String)}}, namely {{'\\', '|', 
 '&', '/'}}
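For illustration, the kind of blanket escaping {{QueryParserBase#escape(String)}} performs can be sketched as follows. This is a minimal standalone sketch, not the actual Lucene source; the character set is modeled on the 3.6-era classic query parser, and the class name is made up:

```java
public class EscapeDemo {
    // Characters treated as query syntax by the classic Lucene query parser
    // (modeled on QueryParserBase#escape; '\\', '|', '&' and '/' are the
    // ones this issue reports as missing from splitIntoClauses).
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?|&/";

    // Prefix every special character with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a|b/c"));  // a\|b\/c
    }
}
```

Escaping this way turns every reserved character into a literal, which is exactly what the edismax retry path wants when the raw query failed to parse.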

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: version checkout

2012-06-27 Thread Tomás Fernández Löbbe
Hi, currently the trunk is going to be the version 5.0. The version 4.0
hasn't been released yet, but there is a branch (branch_4x) created for it.
There is also a branch for 3.6.1 (lucene_solr_3_6 I think) that's only for
bug fixes.

The general way of working is providing patches for the different versions
where the patch should be applied. You almost always want to apply the
patch to the trunk. Some of them should also be applied to 4.0 (everything
but big changes I would say) and bug fixes to 3.6.

Tomás

On Wed, Jun 27, 2012 at 7:12 AM, Despot Jakimovski 
despot.jakimov...@gmail.com wrote:

 Hi,

 I created an issue in 
 JIRA https://issues.apache.org/jira/browse/SOLR-3574 and now I want to 
 develop/contribute. I would first like to create a patch
 for the Solr version 3.6.0, than also include a patch for versions 4 and 5.
 Is this possible or can I only create patch for last version?

 As much as I read from How To 
 Contribute http://wiki.apache.org/solr/HowToContribute and from what I can 
 see from
 here https://svn.apache.org/repos/asf/lucene/dev/, there are trunk,
 tags, branches and nightly available. What version is the trunk? I think I
 shouldn't touch tags (those are final versions), nor branches (these is
 big functionality branched code which might differ from the trunk). Can
 someone please help me get up to speed with this?

 Cheers,
 despot



[jira] [Updated] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters

2012-06-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3467:
--

Fix Version/s: 3.6.1

 ExtendedDismax escaping is missing several reserved characters
 --

 Key: SOLR-3467
 URL: https://issues.apache.org/jira/browse/SOLR-3467
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Michael Dodsworth
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 4.0, 3.6.1, 5.0

 Attachments: SOLR-3467-lucene_solr_3_6.patch, SOLR-3467.patch, 
 SOLR-3467.patch, SOLR-3467.patch


 When edismax is unable to parse the original user query, it retries using an 
 escaped version of that query (where all reserved chars have been escaped).
 Currently, the escaping done in {{splitIntoClauses}} appears to be missing 
 several chars from {{QueryParserBase#escape(String)}}, namely {{'\\', '|', 
 '&', '/'}}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters

2012-06-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3467:
--

Attachment: SOLR-3467-lucene_solr_3_6.patch

This was an easy backport for 3.6.1

 ExtendedDismax escaping is missing several reserved characters
 --

 Key: SOLR-3467
 URL: https://issues.apache.org/jira/browse/SOLR-3467
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Michael Dodsworth
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 4.0, 3.6.1, 5.0

 Attachments: SOLR-3467-lucene_solr_3_6.patch, SOLR-3467.patch, 
 SOLR-3467.patch, SOLR-3467.patch


 When edismax is unable to parse the original user query, it retries using an 
 escaped version of that query (where all reserved chars have been escaped).
 Currently, the escaping done in {{splitIntoClauses}} appears to be missing 
 several chars from {{QueryParserBase#escape(String)}}, namely {{'\\', '|', 
 '&', '/'}}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4166) TwoDoublesStrategy is broken for Circles

2012-06-27 Thread Chris Male (JIRA)
Chris Male created LUCENE-4166:
--

 Summary: TwoDoublesStrategy is broken for Circles
 Key: LUCENE-4166
 URL: https://issues.apache.org/jira/browse/LUCENE-4166
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
Priority: Critical


TwoDoublesStrategy supports finding Documents that are within a Circle, yet it 
is impossible to provide one due to the following code found at the start of 
TwoDoublesStrategy.makeQuery():

{code}
Shape shape = args.getShape();
if (!(shape instanceof Rectangle)) {
  throw new InvalidShapeException("A rectangle is the only supported shape "
      + "(so far), not " + shape.getClass()); //TODO
}
Rectangle bbox = (Rectangle) shape;
{code}

I think the code which handles Circles should instead ask for the bounding box 
of the Shape and use that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4167) Remove the use of SpatialOperation

2012-06-27 Thread Chris Male (JIRA)
Chris Male created LUCENE-4167:
--

 Summary: Remove the use of SpatialOperation
 Key: LUCENE-4167
 URL: https://issues.apache.org/jira/browse/LUCENE-4167
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male


Looking at the code in TwoDoublesStrategy I noticed 
SpatialOperations.BBoxWithin vs isWithin which confused me.  Looking over the 
other Strategys I see that really only isWithin and Intersects is supported.  
Only TwoDoublesStrategy supports IsDisjointTo.  The remainder of 
SpatialOperations are not supported.

I don't think we should use SpatialOperation at this stage since it is not 
clear what Operations are supported by what Strategys, many Operations are not 
supported, and the code for handling the Operations is usually the same.  We 
can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a 
different Strategy.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3581) Search Component configuration via solrconfig.xml is not working

2012-06-27 Thread Karl Wright (JIRA)
Karl Wright created SOLR-3581:
-

 Summary: Search Component configuration via solrconfig.xml is not 
working
 Key: SOLR-3581
 URL: https://issues.apache.org/jira/browse/SOLR-3581
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
 Environment: Checkout and build of branches/branch_4x
Reporter: Karl Wright


See CONNECTORS-485.  ManifoldCF search component tests that pass on 3.6 and 
used to pass on 4.0 fail on branches_4x Solr because the configuration 
information from solrconfig.xml is not being properly passed to the search 
component via the init() method.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1763) Integrate Solr Cell/Tika as an UpdateRequestProcessor

2012-06-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402149#comment-13402149
 ] 

Jan Høydahl commented on SOLR-1763:
---

I won't have time to look at this before october-ish, so anyone feel free to 
give it a shot :)

 Integrate Solr Cell/Tika as an UpdateRequestProcessor
 -

 Key: SOLR-1763
 URL: https://issues.apache.org/jira/browse/SOLR-1763
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Jan Høydahl
  Labels: extracting_request_handler, solr_cell, tika, 
 update_request_handler

 From Chris Hostetter's original post in solr-dev:
 As someone with very little knowledge of Solr Cell and/or Tika, I find myself 
 wondering if ExtractingRequestHandler would make more sense as an 
 extractingUpdateProcessor -- where it could be configured to take either 
 binary fields (or string fields containing URLs) out of the Documents, parse 
 them with tika, and add the various XPath matching hunks of text back into 
 the document as new fields.
 Then ExtractingRequestHandler just becomes a handler that slurps up its 
 ContentStreams and adds them as binary data fields and adds the other literal 
 params as fields.
 Wouldn't that make things like SOLR-1358, and using Tika with URLs/filepaths 
 in XML and CSV based updates fairly trivial?
 -Hoss
 I couldn't agree more, so I decided to add it as an issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3581) Search Component configuration via solrconfig.xml is not working

2012-06-27 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated SOLR-3581:
--

Description: 
See CONNECTORS-485.  ManifoldCF search component tests that pass on 3.6 and 
used to pass on trunk fail on branches_4x Solr because the configuration 
information from solrconfig.xml is not being properly passed to the search 
component via the init() method.


  was:
See CONNECTORS-485.  ManifoldCF search component tests that pass on 3.6 and 
used to pass on 4.0 fail on branches_4x Solr because the configuration 
information from solrconfig.xml is not being properly passed to the search 
component via the init() method.



 Search Component configuration via solrconfig.xml is not working
 

 Key: SOLR-3581
 URL: https://issues.apache.org/jira/browse/SOLR-3581
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
 Environment: Checkout and build of branches/branch_4x
Reporter: Karl Wright

 See CONNECTORS-485.  ManifoldCF search component tests that pass on 3.6 and 
 used to pass on trunk fail on branches_4x Solr because the configuration 
 information from solrconfig.xml is not being properly passed to the search 
 component via the init() method.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3581) Search Component configuration via solrconfig.xml is not working

2012-06-27 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402173#comment-13402173
 ] 

Karl Wright commented on SOLR-3581:
---

Found the problem; closing this issue.


 Search Component configuration via solrconfig.xml is not working
 

 Key: SOLR-3581
 URL: https://issues.apache.org/jira/browse/SOLR-3581
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
 Environment: Checkout and build of branches/branch_4x
Reporter: Karl Wright

 See CONNECTORS-485.  ManifoldCF search component tests that pass on 3.6 and 
 used to pass on trunk fail on branches_4x Solr because the configuration 
 information from solrconfig.xml is not being properly passed to the search 
 component via the init() method.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-3581) Search Component configuration via solrconfig.xml is not working

2012-06-27 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright closed SOLR-3581.
-

   Resolution: Not A Problem
Fix Version/s: 4.0

Operator error; inadvertent change in the failing test

 Search Component configuration via solrconfig.xml is not working
 

 Key: SOLR-3581
 URL: https://issues.apache.org/jira/browse/SOLR-3581
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
 Environment: Checkout and build of branches/branch_4x
Reporter: Karl Wright
 Fix For: 4.0


 See CONNECTORS-485.  ManifoldCF search component tests that pass on 3.6 and 
 used to pass on trunk fail on branches_4x Solr because the configuration 
 information from solrconfig.xml is not being properly passed to the search 
 component via the init() method.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4168) Allow storing test execution statistics in an external file

2012-06-27 Thread Dawid Weiss (JIRA)
Dawid Weiss created LUCENE-4168:
---

 Summary: Allow storing test execution statistics in an external 
file
 Key: LUCENE-4168
 URL: https://issues.apache.org/jira/browse/LUCENE-4168
 Project: Lucene - Java
  Issue Type: Test
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 4.0, 5.0


Override on the build server to calculate stats during runs, then update the 
cache in the repo from time to time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-1856) In Solr Cell, literals should override Tika-parsed values

2012-06-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-1856.
---

Resolution: Fixed

Committed to trunk r1354455 and branch_4x r1354460

 In Solr Cell, literals should override Tika-parsed values
 -

 Key: SOLR-1856
 URL: https://issues.apache.org/jira/browse/SOLR-1856
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Reporter: Chris Harris
Assignee: Jan Høydahl
 Fix For: 4.0, 5.0

 Attachments: SOLR-1856.patch, SOLR-1856.patch


 I propose that ExtractingRequestHandler / SolrCell literals should take 
 precedence over Tika-parsed metadata in all situations, including where 
 multiValued=true. (Compare SOLR-1633?)
 My personal motivation is that I have several fields (e.g. title, date) 
 where my own metadata is much superior to what Tika offers, and I want to 
 throw those Tika values away. (I actually wouldn't mind throwing away _all_ 
 Tika-parsed values, but let's set that aside.) SOLR-1634 is one potential 
 approach to this, but the fix here might be simpler.
 I'll attach a patch shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-1634) change order of field operations in SolrCell

2012-06-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl closed SOLR-1634.
-

Resolution: Duplicate

Marking as duplicate of SOLR-1856 which is fixed.

Also, note that as a workaround this works: 
fmap.title=tika_title&literal.title=HelloWorld - where the Tika-parsed title 
will first be moved to a new field and then accept the literal one.

 change order of field operations in SolrCell
 

 Key: SOLR-1634
 URL: https://issues.apache.org/jira/browse/SOLR-1634
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Reporter: Hoss Man

 As noted on the mailing list, SolrCell evaluates fmap.* params AFTER 
 literal.* params.  This makes it impossible for users to map tika produced 
 fields to other names (possibly for the purpose of ignoring them completely) 
 and then using literal to provide explicit values for those fields.  At first 
 glance this seems like a bug, except that it is explicitly documented...
 http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations
 ...so i'm opening this as an Improvement.   We should either consider 
 changing the order of operations, or find some other way to support what 
 seems like a very common use case...
 http://old.nabble.com/Re%3A-WELCOME-to-solr-user%40lucene.apache.org-to26650071.html#a26650071

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-1633) Solr Cell should be smarter about literal and multiValued=false

2012-06-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl closed SOLR-1633.
-

Resolution: Duplicate

Solved in SOLR-1856

 Solr Cell should be smarter about literal and multiValued=false
 -

 Key: SOLR-1633
 URL: https://issues.apache.org/jira/browse/SOLR-1633
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Reporter: Hoss Man

 As noted on solr-user, SolrCell has less than ideal behavior when foo is a 
 single value field, and literal.foo=bar is specified in the request, but Tika 
 also produces a value for the foo field from the document.  It seems like a 
 possible improvement here would be for SolrCell to ignore the value from Tika 
 if it already has one that was explicitly provided (as opposed to the current 
 behavior of letting the add fail because of multiple values in a single 
 valued field).
 It seems pretty clear that in cases like this, the user's intention is to have 
 their one literal field used as the value.
 http://old.nabble.com/Re%3A-WELCOME-to-solr-user%40lucene.apache.org-to26650071.html#a26650071

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4168) Allow storing test execution statistics in an external file

2012-06-27 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-4168.
-

Resolution: Fixed

 Allow storing test execution statistics in an external file
 ---

 Key: LUCENE-4168
 URL: https://issues.apache.org/jira/browse/LUCENE-4168
 Project: Lucene - Java
  Issue Type: Test
  Components: general/test
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 4.0, 5.0


 Override on the build server to calculate stats during runs, then update the 
 cache in the repo from time to time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: version checkout

2012-06-27 Thread Despot Jakimovski
Thanks a bunch for the input!

So I guess there is no way for me to do a patch for 3.6.0, since the issue in
JIRA https://issues.apache.org/jira/browse/SOLR-3574 I reported is not a
bug (but a new feature). OK. Then I'll do a patch for 4.0
(branch_4x https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/)
and 5.0 (trunk). This will probably require me to change the Affected
version and Fix version on the issue
https://issues.apache.org/jira/browse/SOLR-3574.

Thanks again,
despot

On Wed, Jun 27, 2012 at 12:59 PM, Tomás Fernández Löbbe 
tomasflo...@gmail.com wrote:

 Hi, currently the trunk is going to be the version 5.0. The version 4.0
 hasn't been released yet, but there is a branch (branch_4x) created for it.
 There is also a branch for 3.6.1 (lucene_solr_3_6 I think) that's only for
 bug fixes.

 The general way of working is providing patches for the different versions
 where the patch should be applied. You almost always want to apply the
 patch to the trunk. Some of them should also be applied to 4.0 (everything
 but big changes I would say) and bug fixes to 3.6.

 Tomás


 On Wed, Jun 27, 2012 at 7:12 AM, Despot Jakimovski 
 despot.jakimov...@gmail.com wrote:

 Hi,

 I created an issue in 
 JIRA https://issues.apache.org/jira/browse/SOLR-3574 and now I want to 
 develop/contribute. I would first like to create a patch
 for the Solr version 3.6.0, than also include a patch for versions 4 and 5.
 Is this possible or can I only create patch for last version?

 As much as I read from How To 
 Contribute http://wiki.apache.org/solr/HowToContribute and from what I can 
 see from
 here https://svn.apache.org/repos/asf/lucene/dev/, there are trunk,
 tags, branches and nightly available. What version is the trunk? I think I
 shouldn't touch tags (those are final versions), nor branches (these is
 big functionality branched code which might differ from the trunk). Can
 someone please help me get up to speed with this?

 Cheers,
 despot





[jira] [Updated] (LUCENE-4166) TwoDoublesStrategy is broken for Circles

2012-06-27 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-4166:
---

Attachment: LUCENE-4166.patch

Simple patch extending runtime type checking to Circle and using the Shape 
bounding box.

 TwoDoublesStrategy is broken for Circles
 

 Key: LUCENE-4166
 URL: https://issues.apache.org/jira/browse/LUCENE-4166
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
Priority: Critical
 Attachments: LUCENE-4166.patch


 TwoDoublesStrategy supports finding Documents that are within a Circle, yet 
 it is impossible to provide one due to the following code found at the start 
 of TwoDoublesStrategy.makeQuery():
 {code}
 Shape shape = args.getShape();
 if (!(shape instanceof Rectangle)) {
   throw new InvalidShapeException("A rectangle is the only supported "
       + "shape (so far), not " + shape.getClass()); //TODO
 }
 Rectangle bbox = (Rectangle) shape;
 {code}
 I think the code which handles Circles should instead ask for the bounding 
 box of the Shape and use that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: VOTE: 4.0 alpha (take two)

2012-06-27 Thread Erick Erickson
But what about feature <all my favorites here>? Which means I haven't
gotten off my butt and moved them forward. Waiting for the elves isn't
working <G>

+1, go for it. We can't continue waiting for the elves; they unionized
and are demanding some time off for sleep.

Erick

On Wed, Jun 27, 2012 at 6:40 AM, Antoine LE FLOC'H lefl...@gmail.com wrote:

 Using SolrJ from the alpha, everything works. Go for it !



 On Wed, Jun 27, 2012 at 12:03 PM, Antoine LE FLOC'H lefl...@gmail.com
 wrote:

 I was actually using a solrconfig.xml that is too old for this version.

  catalina.out gave me some errors on indexDefaults and mainIndex, so I
 took the solrconfig.xml from your alpha package and it worked fine.

 I haven't been able to totally check everything yet because I am using a
 Solrj 3.6 indexing client and I had some content type issues in
 catalina.out. I am working on it.



 On Wed, Jun 27, 2012 at 7:29 AM, Stefan Matheis
 matheis.ste...@googlemail.com wrote:

 On Tuesday, June 26, 2012 at 8:30 PM, Simon Willnauer wrote:
  that seems worth an issue. is there one already?

 not yet, there was one comment on SOLR-3238 but no further comment.


 On Tuesday, June 26, 2012 at 8:17 PM, Antoine LE FLOC'H wrote:
  Are you aware of this error ? Thanks again.



  Antoine, would you mind opening one and providing some info? This error
 will show up if there's a problem accessing /admin/system, i guess that
 should be our starting point




 Stefan

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen

2012-06-27 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402201#comment-13402201
 ] 

Adrien Grand commented on LUCENE-4062:
--

Thanks for sharing your results. Here are mine: 
http://people.apache.org/~jpountz/packed_ints_calc.html (E5500 @ 2.80GHz, Java 
1.7.0_02, HotSpot build 22.0-b10). Funny to see those little bumps when the 
number of bits per value is 8, 16, 32 or 64 (24 as well, although it is 
smaller)!

It is not clear whether this impl is faster or slower than the single-block 
impl (or even the 3-blocks impl; I am happily surprised by the read throughput 
on the intel 4 machine) depending on the hardware. However, this new impl seems 
to be consistently better than the current Packed64 class, so I think we should 
replace it with your new impl. What do you think? Can you write a patch?

 More fine-grained control over the packed integer implementation that is 
 chosen
 ---

 Key: LUCENE-4062
 URL: https://issues.apache.org/jira/browse/LUCENE-4062
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/other
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
  Labels: performance
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4062-2.patch, LUCENE-4062.patch, 
 LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, 
 LUCENE-4062.patch, LUCENE-4062.patch, Packed64calc.java, 
 PackedIntsBenchmark.java, PackedIntsBenchmark.java, 
 measurements_te_graphs.pdf, measurements_te_i7.txt, measurements_te_p4.txt, 
 measurements_te_xeon.txt


 In order to save space, Lucene has two main PackedInts.Mutable implementations: 
 one that is very fast and is based on a byte/short/int/long array 
 (Direct*), and another one which packs bits in a memory-efficient manner 
 (Packed*).
 The packed implementation tends to be much slower than the direct one, which 
 discourages some Lucene components from using it. On the other hand, if you store 
 21-bit integers in a Direct32, this is a space loss of (32-21)/32=35%.
 If you accept trading some space for speed, you could store 3 of these 21-bit 
 integers in a long, resulting in an overhead of 1/3 bit per value. One 
 advantage of this approach is that you never need to read more than one block 
 to read or write a value, so this can be significantly faster than Packed32 
 and Packed64, which always need to read/write two blocks in order to avoid 
 costly branches.
 I ran some tests, and for 1000 21-bit values, this implementation takes 
 less than 2% more space and has 44% faster writes and 30% faster reads. The 
 12-bit version (5 values per block) has the same performance improvement and 
 a 6% memory overhead compared to the packed implementation.
 In order to select the best implementation for a given integer size, I wrote 
 the {{PackedInts.getMutable(valueCount, bitsPerValue, 
 acceptableOverheadPerValue)}} method. This method selects the fastest 
 implementation that has less than {{acceptableOverheadPerValue}} wasted bits 
 per value. For example, if you accept an overhead of 20% 
 ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty 
 reasonable, here is what implementations would be selected:
  * 1: Packed64SingleBlock1
  * 2: Packed64SingleBlock2
  * 3: Packed64SingleBlock3
  * 4: Packed64SingleBlock4
  * 5: Packed64SingleBlock5
  * 6: Packed64SingleBlock6
  * 7: Direct8
  * 8: Direct8
  * 9: Packed64SingleBlock9
  * 10: Packed64SingleBlock10
  * 11: Packed64SingleBlock12
  * 12: Packed64SingleBlock12
  * 13: Packed64
  * 14: Direct16
  * 15: Direct16
  * 16: Direct16
  * 17: Packed64
  * 18: Packed64SingleBlock21
  * 19: Packed64SingleBlock21
  * 20: Packed64SingleBlock21
  * 21: Packed64SingleBlock21
  * 22: Packed64
  * 23: Packed64
  * 24: Packed64
  * 25: Packed64
  * 26: Packed64
  * 27: Direct32
  * 28: Direct32
  * 29: Direct32
  * 30: Direct32
  * 31: Direct32
  * 32: Direct32
  * 33: Packed64
  * 34: Packed64
  * 35: Packed64
  * 36: Packed64
  * 37: Packed64
  * 38: Packed64
  * 39: Packed64
  * 40: Packed64
  * 41: Packed64
  * 42: Packed64
  * 43: Packed64
  * 44: Packed64
  * 45: Packed64
  * 46: Packed64
  * 47: Packed64
  * 48: Packed64
  * 49: Packed64
  * 50: Packed64
  * 51: Packed64
  * 52: Packed64
  * 53: Packed64
  * 54: Direct64
  * 55: Direct64
  * 56: Direct64
  * 57: Direct64
  * 58: Direct64
  * 59: Direct64
  * 60: Direct64
  * 61: Direct64
  * 62: Direct64
 Under 32 bits per value, only 13, 17 and 22-26 bits per value would still 
 choose the slower Packed64 implementation. Allowing a 50% overhead would 
 prevent the packed implementation from being selected for bits per value under 32. 
 Allowing an overhead of 32 bits per value would make sure that a Direct* 
 implementation is always selected.
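The single-block idea described in the issue can be sketched outside of Lucene. The following is an illustrative model, not Lucene's actual Packed64SingleBlock code (the class and method names are invented for the example): values of a fixed bit width are stored floor(64/bits) per long, so a value never spans two 64-bit blocks.

```java
// Sketch of single-block packing: 21-bit values, three per long.
// One bit per long is wasted, i.e. 1/3 bit per value, matching the
// overhead figure quoted in the issue description.
public class SingleBlockPacking {
    private static final int BITS = 21;              // bits per value
    private static final int PER_BLOCK = 64 / BITS;  // 3 values per long
    private static final long MASK = (1L << BITS) - 1;
    private final long[] blocks;

    SingleBlockPacking(int valueCount) {
        blocks = new long[(valueCount + PER_BLOCK - 1) / PER_BLOCK];
    }

    void set(int index, long value) {
        int shift = (index % PER_BLOCK) * BITS;
        blocks[index / PER_BLOCK] =
            (blocks[index / PER_BLOCK] & ~(MASK << shift)) | ((value & MASK) << shift);
    }

    long get(int index) {
        int shift = (index % PER_BLOCK) * BITS;
        // Single array read; no cross-block logic needed, unlike Packed64.
        return (blocks[index / PER_BLOCK] >>> shift) & MASK;
    }

    public static void main(String[] args) {
        SingleBlockPacking p = new SingleBlockPacking(1000);
        p.set(0, 123456);
        p.set(1, (1L << BITS) - 1); // max 21-bit value: 2097151
        p.set(2, 42);
        System.out.println(p.get(0) + " " + p.get(1) + " " + p.get(2));
    }
}
```

Each get/set touches exactly one long, which is why this layout can beat the fully packed layout that may straddle two blocks.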

[jira] [Comment Edited] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen

2012-06-27 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402201#comment-13402201
 ] 

Adrien Grand edited comment on LUCENE-4062 at 6/27/12 12:58 PM:


Thanks for sharing your results. Here are mine: 
http://people.apache.org/~jpountz/packed_ints_calc.html (E5500 @ 2.80GHz, java 
1.7.0_02, hotspot build 22.0-b10). Funny to see those little bumps when the 
number of bits per value is 8, 16, 32 or 64 (24 as well, although it is 
smaller)!

It is not clear whether this impl is faster or slower than the single-block 
impl (or even the 3-block impl; I am pleasantly surprised by the read throughput 
on the Pentium 4 machine), depending on the hardware. However, this new impl seems 
to be consistently better than the current Packed64 class, so I think we should 
replace it with your new impl. What do you think? Can you write a patch?

  was (Author: jpountz):
Thanks for sharing your results. Here are mines: 
http://people.apache.org/~jpountz/packed_ints_calc.html (E5500 @ 2.80GHz, java 
1.7.0_02, hotspot build 22.0-b10). Funny to see those little bumps when the 
number of bits per value is 8, 16, 32 or 64 (24 as well, although it is 
smaller)!

It is not clear whether this impl is faster or slower than the single-block 
impl (or even the 3 blocks impl, I am happily surprised by the read throughput 
on the intel 4 machine) depending on the hardware. However, this new impl seems 
to be consistently better than the actual Packed64 class so I think we should 
replace it with your new impl. What do you think? Can you write a patch?
  

Re: VOTE: 4.0 alpha (take two)

2012-06-27 Thread Erik Hatcher
Yeah, I've got a bunch of outstanding ideas/JIRAs myself, none of which 
affect the index format or anything low-level.

I'm assuming that we'll allow a whole bunch of non-index-breaking things to 
occur after alpha is released.  

So here's my +1: I've been using trunk for quite a while now in a number of 
scenarios.

Erik



On Jun 27, 2012, at 08:53 , Erick Erickson wrote:

 But what about [insert all my favorite features here]? Which means I haven't
 gotten off my butt and moved them forward. Waiting for the elves isn't
 working <g>
 
 +1, go for it. We can't continue waiting for the elves, they unionized
 and are demanding some time off for sleep
 
 Erick
 
 On Wed, Jun 27, 2012 at 6:40 AM, Antoine LE FLOC'H lefl...@gmail.com wrote:
 
 Using SolrJ from the alpha, everything works. Go for it !
 
 
 
 On Wed, Jun 27, 2012 at 12:03 PM, Antoine LE FLOC'H lefl...@gmail.com
 wrote:
 
 I was actually using a solrconfig.xml that is too old for this version.
 
 catalina.out gave me some errors on indexDefaults and mainIndex, so I
 took the solrconfig.xml from your alpha package and it worked fine.
 
 I haven't been able to totally check everything yet because I am using a
 Solrj 3.6 indexing client and I had some content type issues in
 catalina.out. I am working on it.
 
 
 
 On Wed, Jun 27, 2012 at 7:29 AM, Stefan Matheis
 matheis.ste...@googlemail.com wrote:
 
 On Tuesday, June 26, 2012 at 8:30 PM, Simon Willnauer wrote:
 that seems worth an issue. is there one already?
 
 not yet, there was one comment on SOLR-3238 but no further comment.
 
 
 On Tuesday, June 26, 2012 at 8:17 PM, Antoine LE FLOC'H wrote:
 Are you aware of this error ? Thanks again.
 
 
 
 Antoine, would you mind opening one and providing some info? This error
 will show up if there's a problem accessing /admin/system; I guess that
 should be our starting point
 
 
 
 
 Stefan
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: VOTE: 4.0 alpha (take two)

2012-06-27 Thread Mark Miller

On Jun 27, 2012, at 9:12 AM, Erik Hatcher wrote:

 I'm assuming that we'll allow a whole bunch of non-index-breaking things to 
 occur after alpha is released. 

Yup - I certainly have a few things to do for a Solr 4 still.

- Mark Miller
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: VOTE: 4.0 alpha (take two)

2012-06-27 Thread Li Li
What new features are added in the 4.0 alpha? What is not finished for the 4.0
final release?
On 2012-06-26 at 5:28 AM, Robert Muir rcm...@gmail.com wrote:

 artifacts are here:


 http://people.apache.org/~rmuir/staging_area/lucene-solr-4.0aRC1-rev1353699/

 Here is my +1

 --
 lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Created] (LUCENE-4169) Mark Spatial module classes as experimental

2012-06-27 Thread Chris Male (JIRA)
Chris Male created LUCENE-4169:
--

 Summary: Mark Spatial module classes as experimental
 Key: LUCENE-4169
 URL: https://issues.apache.org/jira/browse/LUCENE-4169
 Project: Lucene - Java
  Issue Type: Task
  Components: modules/spatial
Reporter: Chris Male


The more I dive into this code the more I worry about it, so I think we should 
give ourselves some leeway to make API changes as part of improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1770) move default example core config/data into a collection1 folder

2012-06-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402217#comment-13402217
 ] 

Mark Miller commented on SOLR-1770:
---

I just did my best to backport this to the 4 branch.

 move default example core config/data into a collection1 folder
 ---

 Key: SOLR-1770
 URL: https://issues.apache.org/jira/browse/SOLR-1770
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.0, 5.0

 Attachments: SOLR-1770.patch


 This is a better starting point for adding more cores - perhaps we can also 
 get rid of the multi-core example




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: VOTE: 4.0 alpha (take two)

2012-06-27 Thread Robert Muir
On Wed, Jun 27, 2012 at 9:14 AM, Mark Miller markrmil...@gmail.com wrote:

 On Jun 27, 2012, at 9:12 AM, Erik Hatcher wrote:

 I'm assuming that we'll allow a whole bunch of non-index-breaking things to 
 occur after alpha is released.

 Yup - I certainly have a few things to do for a Solr 4 still.


This is absolutely the intent here: supporting the lucene index format
like a real release might be enough for many folks that would
otherwise be scared of trunk to try this out.

So we should keep adding features and breaking apis without fear.

We should even continue making improvements to the index format (but
in a backwards-compatible way).

-- 
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-tests-only-4.x - Build # 157 - Failure

2012-06-27 Thread Robert Muir
Hmm, I suspect this is a bug in the position length implementation of
CommonGramsFilter.

This filter inserts additional tokens (bigrams) around stopwords, so
if you have "this is a test" it will create "this this_is is is_a a
a_test" and so on, so it can be viewed as a conditional
shingle filter.

But it hardcodes the length as posLenAttribute.setPositionLength(2); // bigram

If the input is already a graph (posLen != 1), then this will be
incorrect. How does ShingleFilter handle this situation? It would be nice
if we could fix this without capturing state or slowing it down
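The conditional-shingle behavior described above can be modeled outside Lucene's TokenStream machinery. This is a simplified sketch, not the real CommonGramsFilter (the stopword set and the plain-list API are assumptions for illustration); it only shows which tokens get emitted, not the position/offset attributes where the actual bug lives.

```java
import java.util.*;

// Simplified model of CommonGramsFilter output: emit each token, plus a
// bigram whenever the current or next token is a stopword. In the real
// filter, each bigram token gets a hardcoded positionLength of 2, which
// is wrong when the input tokens themselves already span positions.
public class CommonGramsSketch {
    static List<String> commonGrams(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));
            if (i + 1 < tokens.size()
                    && (stopwords.contains(tokens.get(i))
                        || stopwords.contains(tokens.get(i + 1)))) {
                out.add(tokens.get(i) + "_" + tokens.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("this", "is", "a"));
        List<String> result =
            commonGrams(Arrays.asList("this", "is", "a", "test"), stop);
        System.out.println(String.join(" ", result));
    }
}
```

Running this on "this is a test" reproduces the token sequence from the example above.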

On Sat, Jun 23, 2012 at 7:47 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-4.x/157/

 1 tests failed.
 REGRESSION:  org.apache.lucene.analysis.core.TestRandomChains.testRandomChains

 Error Message:
 last stage: inconsistent endOffset at pos=41: 7 vs 19; token=i_i i i i i u i 
 i u f i i u f d i i u f d s i i u f d s s i i u f d s s j i i u f d s s j g i 
 i u f d s s j g n i i u f d s s j g n 1 i i u i u f i u f d i u f d s i u f d 
 s s i u f d s s j i u f d s s j g i u f d s s j g n i u f d s s j g n 1 u u f 
 u f d u f d s u f d s s u f d s s j u f d s s j g u f d s s j g n u f d s s j 
 g n 1 f f d f d s f d s s f d s s j f d s s j g f d s s j g n f d s s j g n 1

 Stack Trace:
 java.lang.IllegalStateException: last stage: inconsistent endOffset at 
 pos=41: 7 vs 19; token=i_i i i i i u i i u f i i u f d i i u f d s i i u f d 
 s s i i u f d s s j i i u f d s s j g i i u f d s s j g n i i u f d s s j g n 
 1 i i u i u f i u f d i u f d s i u f d s s i u f d s s j i u f d s s j g i u 
 f d s s j g n i u f d s s j g n 1 u u f u f d u f d s u f d s s u f d s s j u 
 f d s s j g u f d s s j g n u f d s s j g n 1 f f d f d s f d s s f d s s j f 
 d s s j g f d s s j g n f d s s j g n 1
        at 
 __randomizedtesting.SeedInfo.seed([12635ABB4F789F2A:2F8273DA086A82EA]:0)
        at 
 org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:135)
        at 
 org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:644)
        at 
 org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:554)
        at 
 org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:450)
        at 
 org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:860)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
        at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
        at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at 
 org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
        at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
        at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
        at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
        at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at 
 org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
        at 
 

[jira] [Created] (LUCENE-4170) TestRandomChains fail with Shingle+CommonGrams

2012-06-27 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4170:
---

 Summary: TestRandomChains fail with Shingle+CommonGrams
 Key: LUCENE-4170
 URL: https://issues.apache.org/jira/browse/LUCENE-4170
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/analysis
Reporter: Robert Muir


ant test  -Dtestcase=TestRandomChains -Dtests.method=testRandomChains 
-Dtests.seed=12635ABB4F789F2A -Dtests.multiplier=3 -Dtests.locale=pt 
-Dtests.timezone=America/Argentina/Salta -Dargs=-Dfile.encoding=ISO8859-1

This test has two ShingleFilters, then a common-grams filter. I think the posLen 
impls in commongrams and/or shingle have a bug if the input is already a graph.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4170) TestRandomChains fail with Shingle+CommonGrams

2012-06-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4170:


Attachment: LUCENE-4170.patch

first stab at a patch for commongrams' posLen. But the test still fails, so 
either my patch is wrong or we need to fix shingle, too.

We could use some standalone tests here as well.





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4170) TestRandomChains fail with Shingle+CommonGrams

2012-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402256#comment-13402256
 ] 

Robert Muir commented on LUCENE-4170:
-

I think shingle has a similar bug: it doesn't look at the existing posLength 
of the input tokens at all; instead, it just fills posLength with the 
builtGramSize.





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation

2012-06-27 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402274#comment-13402274
 ] 

David Smiley commented on LUCENE-4167:
--

I agree with your complaint.  The only two supported operations are:
* Intersects -- equivalent to IsWithin when the indexed data is points
* BBoxIntersects -- again, equivalent to BBoxIsWithin when the indexed data is 
points.

The distinction between Overlaps and Intersects seems dubious.

The bbox handling is universally handled in SpatialArgs.getShape(), which checks 
the operation and returns the wrapping rectangle.  So effectively the 
strategies need not even bother with the whole SpatialOperation concept, at 
least not at the moment.

My concern with your suggestion to remove SpatialOperation is that I do think 
it will return.  I know I want to work on an IsWithin for when the indexed data 
is shapes with area.  And it is serving the purpose of letting SpatialArgsParser 
parse out the operation you want to do, which I don't think should go away (i.e. 
the query string shouldn't assume an intersect; it should include 
Intersects(...)).  Perhaps the unsupported operations could be commented out?

Separately, I think com.spatial4j.core.query.* belongs in Lucene spatial.  It's 
not used by the rest of Spatial4j, yet it's tightly related to the concept of 
querying, which is Lucene spatial's business and not Spatial4j's.

 Remove the use of SpatialOperation
 --

 Key: LUCENE-4167
 URL: https://issues.apache.org/jira/browse/LUCENE-4167
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male

 Looking at the code in TwoDoublesStrategy I noticed 
 SpatialOperations.BBoxWithin vs isWithin, which confused me.  Looking over the 
 other Strategys I see that really only isWithin and Intersects are supported.  
 Only TwoDoublesStrategy supports IsDisjointTo.  The remainder of 
 SpatialOperations are not supported.
 I don't think we should use SpatialOperation at this stage since it is not 
 clear which Operations are supported by which Strategys, many Operations are 
 not supported, and the code for handling the Operations is usually the same.  
 We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a 
 different Strategy.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen

2012-06-27 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402279#comment-13402279
 ] 

Toke Eskildsen commented on LUCENE-4062:


Making Packed64calc the new Packed64 seems like a safe bet. I'd be happy to 
create a patch for it. Should I open a new issue or add the patch here? If I do 
it here, how do we avoid confusing the original fine-grained-oriented patch 
with the Packed64 replacement?

I think it is hard to see a clear pattern as to which Mutable implementation 
should be selected for the different size & bpv requirements with the currently 
available measurements. I'll perform some more experiments with JRE 1.6/JRE 1.7 
on different hardware and see if the picture gets clearer.


[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen

2012-06-27 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402283#comment-13402283
 ] 

Adrien Grand commented on LUCENE-4062:
--

Yes, a new issue will make things clearer. Thanks, Toke!

 More fine-grained control over the packed integer implementation that is 
 chosen
 ---

 Key: LUCENE-4062
 URL: https://issues.apache.org/jira/browse/LUCENE-4062
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/other
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
  Labels: performance
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4062-2.patch, LUCENE-4062.patch, 
 LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, 
 LUCENE-4062.patch, LUCENE-4062.patch, Packed64calc.java, 
 PackedIntsBenchmark.java, PackedIntsBenchmark.java, 
 measurements_te_graphs.pdf, measurements_te_i7.txt, measurements_te_p4.txt, 
 measurements_te_xeon.txt


 In order to save space, Lucene has two main PackedInts.Mutable implementations, 
 one that is very fast and is based on a byte/short/integer/long array 
 (Direct*) and another one which packs bits in a memory-efficient manner 
 (Packed*).
 The packed implementation tends to be much slower than the direct one, which 
 discourages some Lucene components from using it. On the other hand, if you store 
 21-bit integers in a Direct32, this is a space loss of (32-21)/32≈34%.
 If you are willing to trade some space for speed, you could store 3 of these 
 21-bit integers in a long, resulting in an overhead of 1/3 bit per value. One 
 advantage of this approach is that you never need to read more than one block 
 to read or write a value, so this can be significantly faster than Packed32 
 and Packed64, which always need to read/write two blocks in order to avoid 
 costly branches.
 I ran some tests, and for 1000 21-bit values, this implementation takes 
 less than 2% more space and has 44% faster writes and 30% faster reads. The 
 12-bit version (5 values per block) has the same performance improvement and 
 a 6% memory overhead compared to the packed implementation.
 In order to select the best implementation for a given integer size, I wrote 
 the {{PackedInts.getMutable(valueCount, bitsPerValue, 
 acceptableOverheadPerValue)}} method. This method selects the fastest 
 implementation that has less than {{acceptableOverheadPerValue}} wasted bits 
 per value. For example, if you accept an overhead of 20% 
 ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty 
 reasonable, here is what implementations would be selected:
  * 1: Packed64SingleBlock1
  * 2: Packed64SingleBlock2
  * 3: Packed64SingleBlock3
  * 4: Packed64SingleBlock4
  * 5: Packed64SingleBlock5
  * 6: Packed64SingleBlock6
  * 7: Direct8
  * 8: Direct8
  * 9: Packed64SingleBlock9
  * 10: Packed64SingleBlock10
  * 11: Packed64SingleBlock12
  * 12: Packed64SingleBlock12
  * 13: Packed64
  * 14: Direct16
  * 15: Direct16
  * 16: Direct16
  * 17: Packed64
  * 18: Packed64SingleBlock21
  * 19: Packed64SingleBlock21
  * 20: Packed64SingleBlock21
  * 21: Packed64SingleBlock21
  * 22: Packed64
  * 23: Packed64
  * 24: Packed64
  * 25: Packed64
  * 26: Packed64
  * 27: Direct32
  * 28: Direct32
  * 29: Direct32
  * 30: Direct32
  * 31: Direct32
  * 32: Direct32
  * 33: Packed64
  * 34: Packed64
  * 35: Packed64
  * 36: Packed64
  * 37: Packed64
  * 38: Packed64
  * 39: Packed64
  * 40: Packed64
  * 41: Packed64
  * 42: Packed64
  * 43: Packed64
  * 44: Packed64
  * 45: Packed64
  * 46: Packed64
  * 47: Packed64
  * 48: Packed64
  * 49: Packed64
  * 50: Packed64
  * 51: Packed64
  * 52: Packed64
  * 53: Packed64
  * 54: Direct64
  * 55: Direct64
  * 56: Direct64
  * 57: Direct64
  * 58: Direct64
  * 59: Direct64
  * 60: Direct64
  * 61: Direct64
  * 62: Direct64
 Under 32 bits per value, only 13, 17 and 22-26 bits per value would still 
 choose the slower Packed64 implementation. Allowing a 50% overhead would 
 prevent the packed implementation from being selected for bits per value under 32. 
 Allowing an overhead of 32 bits per value would make sure that a Direct* 
 implementation is always selected.
 Next steps would be to:
  * make Lucene components use this {{getMutable}} method and let users decide 
 what trade-off better suits them,
  * write a Packed32SingleBlock implementation if necessary (I didn't do it 
 because I have no 32-bit computer to test the performance improvements).
 I think this would allow more fine-grained control over the speed/space 
 trade-off, what do you think?
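The single-block layout described above can be sketched in plain Java. This is an illustrative toy, not Lucene's actual Packed64SingleBlock code (class and method names are mine): each 21-bit value occupies a fixed slot inside exactly one 64-bit block, so every get/set touches a single long, at the cost of 1 wasted bit per block (1/3 bit per value).

```java
// Toy sketch of the Packed64SingleBlock21 idea: 3 values of 21 bits per
// 64-bit block. A value never straddles a block boundary, so reads and
// writes touch exactly one long.
public class SingleBlock21 {
    private static final int BITS = 21;
    private static final int VALUES_PER_BLOCK = 3;          // floor(64 / 21)
    private static final long MASK = (1L << BITS) - 1;

    private final long[] blocks;

    public SingleBlock21(int valueCount) {
        blocks = new long[(valueCount + VALUES_PER_BLOCK - 1) / VALUES_PER_BLOCK];
    }

    public void set(int index, long value) {
        int b = index / VALUES_PER_BLOCK;
        int shift = (index % VALUES_PER_BLOCK) * BITS;
        // clear the slot, then write the masked value into it
        blocks[b] = (blocks[b] & ~(MASK << shift)) | ((value & MASK) << shift);
    }

    public long get(int index) {
        int b = index / VALUES_PER_BLOCK;
        int shift = (index % VALUES_PER_BLOCK) * BITS;
        return (blocks[b] >>> shift) & MASK;
    }
}
```

The real Packed64 layout, by contrast, lets values straddle block boundaries, which saves the wasted bit but can require touching two longs per value.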

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 

[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation

2012-06-27 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402286#comment-13402286
 ] 

Chris Male commented on LUCENE-4167:


{quote}
Intersects – equivalent to IsWithin when the indexed data is points
BBoxIntersects – again, equivalent to BBoxIsWithin when the indexed data is 
points.
{quote}

I don't see the need to differentiate BBoxIntersects and Intersects.  If the 
user wants to find those Documents related to the bounding box of a Shape, then 
they can call shape.getBoundingBox() and pass that into the Strategy.  The 
Strategys shouldn't have to worry about the Shape (although TwoDoubles does but 
that needs to be re-thought separately).  The Strategys should just take the 
Shape given and roll with it.  Is that what you're suggesting?

{quote}
My concern with your suggestion to remove SpatialOperation is that I do think 
it will return. I know I want to work on an IsWithin when indexed data is 
shapes with area. And it is serving the purpose of SpatialArgsParser parsing 
out the operation you want to do, which I don't think should go away (i.e. the 
query string shouldn't assume an intersect, it should include Intersects(...)). 
Perhaps the unsupported operations could be commented out?
{quote}

I can see the need for different behaviour for different Shape relationships 
too.  But I think we should perhaps do that using method specialization.  We 
already have the PrefixTreeStrategy abstraction, so you could write a 
WithinRecursivePrefixTreeStrategy which specialized makeQuery differently.  
That way it is clear to the user what the Strategy does, we won't need the 
runtime checks and we won't have Strategys like TwoDoubles which has methods 
for each of the different behaviours in the same class.

So I think we can remove the need for SpatialOperation now and support the idea 
differently in the future.

(As a side note, this actually makes me think we should decouple the indexing 
code of Strategys from the querying code).

{quote}
Separately, I think com.spatial4j.core.query.* belongs in Lucene spatial. It's 
not used by any of the rest of Spatial4j, yet it's tightly related to the 
concept of querying which is Lucene spatial's business, and is not the business 
of Spatial4j.
{quote}

+1.  As a short term solution I think we just replicate the code that we need 
in Lucene now and then drop it from Spatial4J in the next release.

 Remove the use of SpatialOperation
 --

 Key: LUCENE-4167
 URL: https://issues.apache.org/jira/browse/LUCENE-4167
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male

 Looking at the code in TwoDoublesStrategy I noticed 
 SpatialOperations.BBoxWithin vs isWithin, which confused me.  Looking over the 
 other Strategys I see that really only isWithin and Intersects are supported.  
 Only TwoDoublesStrategy supports IsDisjointTo.  The remainder of 
 SpatialOperations are not supported.
 I don't think we should use SpatialOperation at this stage since it is not 
 clear what Operations are supported by what Strategys, many Operations are 
 not supported, and the code for handling the Operations is usually the same.  
 We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a 
 different Strategy.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters

2012-06-27 Thread Michael Dodsworth (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402287#comment-13402287
 ] 

Michael Dodsworth commented on SOLR-3467:
-

Thank you, Jan.

From what I can tell, '/' only became a reserved character in 4.0 - 
https://issues.apache.org/jira/browse/LUCENE-2604.

 ExtendedDismax escaping is missing several reserved characters
 --

 Key: SOLR-3467
 URL: https://issues.apache.org/jira/browse/SOLR-3467
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Michael Dodsworth
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 4.0, 3.6.1, 5.0

 Attachments: SOLR-3467-lucene_solr_3_6.patch, SOLR-3467.patch, 
 SOLR-3467.patch, SOLR-3467.patch


 When edismax is unable to parse the original user query, it retries using an 
 escaped version of that query (where all reserved chars have been escaped).
 Currently, the escaping done in {{splitIntoClauses}} appears to be missing 
 several chars from {{QueryParserBase#escape(String)}}, namely {{'\\', '|', 
 '&', '/'}}
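For reference, a sketch of escaping every character in the classic query-parser reserved set. This is a hypothetical standalone helper, not the actual patch; the character set below follows the escaping documented for the classic query parser and may differ slightly between versions:

```java
// Illustrative full-escape helper in the spirit of QueryParserBase#escape:
// prefix each reserved query-syntax character with a backslash.
public class QueryEscape {
    // reserved characters of the classic query syntax (version-dependent)
    private static final String RESERVED = "\\+-!():^[]\"{}~*?|&/";

    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (RESERVED.indexOf(c) >= 0) {
                sb.append('\\');   // escape the reserved character
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```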

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3580) In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled

2012-06-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402303#comment-13402303
 ] 

Yonik Seeley commented on SOLR-3580:


This is by design.  Treating 'and' and 'or' as operators when people may not 
realize they are is much less catastrophic than treating 'not' as an operator.

If someone searches for
to be or not to be
excluding all documents with 'to' in them is very bad.



 In ExtendedDismax, lowercase 'not' operator is not being treated as an 
 operator when 'lowercaseOperators' is enabled
 

 Key: SOLR-3580
 URL: https://issues.apache.org/jira/browse/SOLR-3580
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Michael Dodsworth
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3580.patch


 When lowercase operator support is enabled (for edismax), the lowercase 'not' 
 operator is being wrongly treated as a literal term (and not as an operator).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4170) TestRandomChains fail with Shingle+CommonGrams

2012-06-27 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402307#comment-13402307
 ] 

Steven Rowe commented on LUCENE-4170:
-

bq. I think shingles has a similar bug: it doesn't look at the existing 
posLength of the input tokens at all, instead it just fills posLength with the 
builtGramSize.

I agree.

However, the problem isn't just position length: ShingleFilter has never 
handled input position increments of zero, so real graph compatibility will 
mean fixing that too.

I think Karl Wettin's ShingleMatrixFilter (deprecated in 3.6, dropped in 4.0) 
is an attempt to permute all combinations of overlapping (poslength=1) terms to 
produce shingles.  ShingleMatrixFilter wouldn't handle poslength > 1, though.

I'm not even sure what token ngramming should mean over an input graph.  The 
trivial case where input tokens' poslength is always one and position 
increment is always one is obviously already handled.

I think both issues should be handled, since poslength > 1 will very likely be 
used with posincr = 0, e.g. synonyms and kuromoji de-compounding.


 TestRandomChains fail with Shingle+CommonGrams
 --

 Key: LUCENE-4170
 URL: https://issues.apache.org/jira/browse/LUCENE-4170
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/analysis
Reporter: Robert Muir
 Attachments: LUCENE-4170.patch


 ant test  -Dtestcase=TestRandomChains -Dtests.method=testRandomChains 
 -Dtests.seed=12635ABB4F789F2A -Dtests.multiplier=3 -Dtests.locale=pt 
 -Dtests.timezone=America/Argentina/Salta -Dargs=-Dfile.encoding=ISO8859-1
 This test has two shinglefilters, then a common-grams filter. I think posLen 
 impls in commongrams and/or shingle have a bug if the input is already a graph.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4170) TestRandomChains fail with Shingle+CommonGrams

2012-06-27 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402314#comment-13402314
 ] 

Steven Rowe commented on LUCENE-4170:
-

bq. I'm not even sure what token ngramming should mean over an input graph.

A thought problem: run ShingleFilter with mingramsize=2, maxgramsize=3, 
outputUnigrams=true over input {{\[a/1] \[b/1] \[c/1] \[d/1]}} (where {{/n}} 
indicates poslength = {{n}}, and {{\[a b]}} indicates tokens {{a}} and {{b}} 
are at the same position; I'll omit the {{\[]}}'s below when only one token is 
at a given position), then run ShingleFilter again with the same config over 
the first ShingleFilter's output:

{noformat}
shinglefilter(min:2,max:3,unigrams:true) with input:  a/1  b/1  c/1  d/1 

_ token sep: [a/1  a_b/2  a_b_c/3]  [b/1  b_c/2  b_c_d/3]  [c/1  c_d/2]  d/1

shinglefilter(2,3,unigrams) with shinglefilter output above as input:

= token sep: [a/1  a_b/2  a_b_c/3  a=b/2  a=b_c/3  a=b_c_d/4  a=b=c/3  
a=b=c_d/4  a=b_c=d/4  a_b=c/3  a_b=c_d/4  a_b=c=d/4  a_b_c=d/4]  
   [b/1  b_c/2  b_c_d/3  b=c/2  b=c_d/3  b_c=d/3]
   [c/1  c_d/2  c=d/2]
   d/1
{noformat}
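The first run of the thought problem above (flat input, min=2, max=3, unigrams) can be reproduced with a toy shingle generator. This sketch is not ShingleFilter itself and only handles the trivial poslength=1/posincr=1 case; graph inputs are exactly what it does not cover (underscore is used as the token separator, as in the example above):

```java
// Toy shingle generator over a flat token stream (every token has
// poslength=1 and posincr=1). For each start position it emits the
// unigram (optionally) and every n-gram with min <= n <= max.
import java.util.ArrayList;
import java.util.List;

public class Shingles {
    public static List<String> shingle(List<String> tokens, int min, int max,
                                       boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (outputUnigrams) {
                out.add(tokens.get(i));
            }
            StringBuilder gram = new StringBuilder(tokens.get(i));
            for (int n = 2; n <= max && i + n - 1 < tokens.size(); n++) {
                gram.append('_').append(tokens.get(i + n - 1));
                if (n >= min) {
                    out.add(gram.toString());
                }
            }
        }
        return out;
    }
}
```

Running it on [a, b, c, d] reproduces the first-pass output in the {{noformat}} block; deciding what the second pass should emit for graph input is the open question.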


 TestRandomChains fail with Shingle+CommonGrams
 --

 Key: LUCENE-4170
 URL: https://issues.apache.org/jira/browse/LUCENE-4170
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/analysis
Reporter: Robert Muir
 Attachments: LUCENE-4170.patch


 ant test  -Dtestcase=TestRandomChains -Dtests.method=testRandomChains 
 -Dtests.seed=12635ABB4F789F2A -Dtests.multiplier=3 -Dtests.locale=pt 
 -Dtests.timezone=America/Argentina/Salta -Dargs=-Dfile.encoding=ISO8859-1
 This test has two shinglefilters, then a common-grams filter. I think posLen 
 impls in commongrams and/or shingle have a bug if the input is already a graph.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3580) In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled

2012-06-27 Thread Michael Dodsworth (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402318#comment-13402318
 ] 

Michael Dodsworth commented on SOLR-3580:
-

Surely that's a more general hazard with supporting lowercase operators. It 
seems strange to give 'not' special treatment. There are likely examples 
where having 'and' or 'or' wrongly treated as an operator /is/ catastrophic, 
therefore the onus should be on the client to choose the correct 
'lowercaseOperators' option for their use-case.


 In ExtendedDismax, lowercase 'not' operator is not being treated as an 
 operator when 'lowercaseOperators' is enabled
 

 Key: SOLR-3580
 URL: https://issues.apache.org/jira/browse/SOLR-3580
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Michael Dodsworth
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3580.patch


 When lowercase operator support is enabled (for edismax), the lowercase 'not' 
 operator is being wrongly treated as a literal term (and not as an operator).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3580) In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled

2012-06-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402321#comment-13402321
 ] 

Yonik Seeley commented on SOLR-3580:


edismax is about heuristics and sometimes guessing user intent... if 
exact/strict syntax is desired, the Lucene query parser is a better fit.

 In ExtendedDismax, lowercase 'not' operator is not being treated as an 
 operator when 'lowercaseOperators' is enabled
 

 Key: SOLR-3580
 URL: https://issues.apache.org/jira/browse/SOLR-3580
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Michael Dodsworth
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3580.patch


 When lowercase operator support is enabled (for edismax), the lowercase 'not' 
 operator is being wrongly treated as a literal term (and not as an operator).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs

2012-06-27 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4080:
-

Attachment: LUCENE-4080.patch

Patch. The {{SegmentReader}} returned by {{getMergeReader}} now has a correct 
{{numDeletedDocs()}} and {{getLiveDocs()}}. Could someone familiar with 
Lucene merging internals review this patch?



 SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs
 --

 Key: LUCENE-4080
 URL: https://issues.apache.org/jira/browse/LUCENE-4080
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1
Reporter: Adrien Grand
Priority: Trivial
 Fix For: 4.1

 Attachments: LUCENE-4080.patch


 At merge time, SegmentReader sometimes gives an incorrect value for 
 numDeletedDocs.
 From LUCENE-2357:
 bq. As far as I know, [SegmentReader.numDeletedDocs() is] only unreliable 
 in this context (SegmentReader passed to SegmentMerger for merging); this is 
 because we allow newly marked deleted docs to happen concurrently up until 
 the moment we need to pass the SR instance to the merger (search for // Must 
 sync to ensure BufferedDeletesStream in IndexWriter.java) ... but it would 
 be nice to fix that, so I think open a new issue (it won't block this one)? 
 We should be able to make a new SR instance, sharing the same core as the 
 current one but using the correct delCount...
 bq. It would be cleaner (but I think hairier) to create a new SR for merging 
 that holds the correct delCount, but let's do that under the separate issue.
 bq.  it would be best if the SegmentReader's numDeletedDocs were always 
 correct, but, fixing that in IndexWriter is somewhat tricky. Ie, the fix 
 could be hairy but the end result (SegmentReader.numDeletedDocs can always 
 be trusted) would be cleaner...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3580) In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled

2012-06-27 Thread Michael Dodsworth (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402350#comment-13402350
 ] 

Michael Dodsworth commented on SOLR-3580:
-

Were we not allowing the user to explicitly *specify* that they want to support 
lowercase operators, I might agree.

That setting should (at the very least) come with a clear health warning so 
that more people aren't caught out by this.

 In ExtendedDismax, lowercase 'not' operator is not being treated as an 
 operator when 'lowercaseOperators' is enabled
 

 Key: SOLR-3580
 URL: https://issues.apache.org/jira/browse/SOLR-3580
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Michael Dodsworth
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3580.patch


 When lowercase operator support is enabled (for edismax), the lowercase 'not' 
 operator is being wrongly treated as a literal term (and not as an operator).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs

2012-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402353#comment-13402353
 ] 

Robert Muir commented on LUCENE-4080:
-

I think it's cleaner not to have the 'if numDocs >= 0' check in SegmentReader ctor #2.
Instead I think ctor #1 should just forward docCount - delCount like ctor #3.

 SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs
 --

 Key: LUCENE-4080
 URL: https://issues.apache.org/jira/browse/LUCENE-4080
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1
Reporter: Adrien Grand
Priority: Trivial
 Fix For: 4.1

 Attachments: LUCENE-4080.patch


 At merge time, SegmentReader sometimes gives an incorrect value for 
 numDeletedDocs.
 From LUCENE-2357:
 bq. As far as I know, [SegmentReader.numDeletedDocs() is] only unreliable 
 in this context (SegmentReader passed to SegmentMerger for merging); this is 
 because we allow newly marked deleted docs to happen concurrently up until 
 the moment we need to pass the SR instance to the merger (search for // Must 
 sync to ensure BufferedDeletesStream in IndexWriter.java) ... but it would 
 be nice to fix that, so I think open a new issue (it won't block this one)? 
 We should be able to make a new SR instance, sharing the same core as the 
 current one but using the correct delCount...
 bq. It would be cleaner (but I think hairier) to create a new SR for merging 
 that holds the correct delCount, but let's do that under the separate issue.
 bq.  it would be best if the SegmentReader's numDeletedDocs were always 
 correct, but, fixing that in IndexWriter is somewhat tricky. Ie, the fix 
 could be hairy but the end result (SegmentReader.numDeletedDocs can always 
 be trusted) would be cleaner...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3574) Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions

2012-06-27 Thread Despot Jakimovski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Despot Jakimovski updated SOLR-3574:


Fix Version/s: (was: 3.6)
   5.0
   4.1
   4.0
Affects Version/s: (was: 3.6)
   5.0
   4.1
   4.0

 Create a Compound Word Filter (and Factory) extension that will allow support 
 for (word) exceptions
 ---

 Key: SOLR-3574
 URL: https://issues.apache.org/jira/browse/SOLR-3574
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 4.0, 4.1, 5.0
Reporter: Despot Jakimovski
  Labels: compound-word, dictionary, feature, filter, 
 word-exception
 Fix For: 4.0, 4.1, 5.0

   Original Estimate: 72h
  Remaining Estimate: 72h

 When having the following use case:
 We have two words, 'penslot' and 'knoppen'. One of them is a compound word 
 ('penslot'); the other is a plural form of 'knop'.
 When using the compound word filter, if we place the words 'pen', 'slot' and 
 'knop' in the dictionary, then for a search containing 'knoppen' we get results 
 containing 'pen' also, which shouldn't be the case, because 'knoppen' is only 
 a plural form (not a compound word). 
 We need another dictionary to specify the words that are exceptions to the 
 filter (like 'knoppen' in this case). The filter would then still find 
 compound words built from 'pen', 'slot' and 'knop', but would skip 
 dividing 'knoppen' and searching on its parts.
 More info on the subject: 
 http://stackoverflow.com/questions/11159839/can-we-make-the-compound-word-filter-not-divide-some-words-in-solr

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs

2012-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402365#comment-13402365
 ] 

Robert Muir commented on LUCENE-4080:
-

Also is it ok in mergeMiddle that we call rld.getMergeReader inside the sync?

Previously, we never did actual i/o here...

 SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs
 --

 Key: LUCENE-4080
 URL: https://issues.apache.org/jira/browse/LUCENE-4080
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1
Reporter: Adrien Grand
Priority: Trivial
 Fix For: 4.1

 Attachments: LUCENE-4080.patch


 At merge time, SegmentReader sometimes gives an incorrect value for 
 numDeletedDocs.
 From LUCENE-2357:
 bq. As far as I know, [SegmentReader.numDeletedDocs() is] only unreliable 
 in this context (SegmentReader passed to SegmentMerger for merging); this is 
 because we allow newly marked deleted docs to happen concurrently up until 
 the moment we need to pass the SR instance to the merger (search for // Must 
 sync to ensure BufferedDeletesStream in IndexWriter.java) ... but it would 
 be nice to fix that, so I think open a new issue (it won't block this one)? 
 We should be able to make a new SR instance, sharing the same core as the 
 current one but using the correct delCount...
 bq. It would be cleaner (but I think hairier) to create a new SR for merging 
 that holds the correct delCount, but let's do that under the separate issue.
 bq.  it would be best if the SegmentReader's numDeletedDocs were always 
 correct, but, fixing that in IndexWriter is somewhat tricky. Ie, the fix 
 could be hairy but the end result (SegmentReader.numDeletedDocs can always 
 be trusted) would be cleaner...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



JIRA assignee options

2012-06-27 Thread Despot Jakimovski
Hi,

I would like to assign myself to the JIRA task 
https://issues.apache.org/jira/browse/SOLR-3574, 
but I cannot find the 'Assign to' button. Probably I am missing some 
privileges. Can someone help me fix this?

Cheers,
despot


[jira] [Created] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

2012-06-27 Thread Mark Miller (JIRA)
Mark Miller created SOLR-3582:
-

 Summary: Leader election zookeeper watcher is responding to 
con/discon notifications incorrectly.
 Key: SOLR-3582
 URL: https://issues.apache.org/jira/browse/SOLR-3582
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0


As brought up by Trym R. Møller on the mailing list, we are responding to 
watcher events about connection/disconnection as if they were notifications 
about node changes.

http://www.lucidimagination.com/search/document/e13ef390b882

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

2012-06-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402388#comment-13402388
 ] 

Mark Miller commented on SOLR-3582:
---

I'm unsure of the proposed solution on the mailing list.

On a connection event, the watch will fire - we will skip doing anything, but 
watches are one time events, so we will have no watch in place?

 Leader election zookeeper watcher is responding to con/discon notifications 
 incorrectly.
 

 Key: SOLR-3582
 URL: https://issues.apache.org/jira/browse/SOLR-3582
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0


 As brought up by Trym R. Møller on the mailing list, we are responding to 
 watcher events about connection/disconnection as if they were notifications 
 about node changes.
 http://www.lucidimagination.com/search/document/e13ef390b882




[jira] [Commented] (SOLR-3573) Data import does not free CLOB

2012-06-27 Thread Bjorn Hijmans (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402393#comment-13402393
 ] 

Bjorn Hijmans commented on SOLR-3573:
-

Some more information: for me, this started to happen after we started storing 
XMLTYPE data as binary instead of CLOB. I managed to fix it by casting the 
java.sql.Clob to an oracle.sql.CLOB so I could use freeTemporary() to free the 
clob. Not an acceptable solution to commit, though. Not sure if this is a Solr 
problem, a JDBC problem, or an Oracle problem.
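Both workarounds discussed here share one mechanical step: drain the CLOB's character stream into memory so the CLOB handle can be released right away. A minimal, runnable sketch of that copy (StringReader stands in for clob.getCharacterStream(); the actual release call, Clob.free() on JDBC 4 or the Oracle-specific freeTemporary() above, is omitted because it needs a live connection):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ClobCopy {
    // Copy a character stream fully into an in-memory String; once the
    // copy exists, the CLOB that produced the Reader can be freed.
    static String readFully(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for clob.getCharacterStream() here.
        System.out.println(readFully(new StringReader("clob payload")));
    }
}
```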

 Data import does not free CLOB
 --

 Key: SOLR-3573
 URL: https://issues.apache.org/jira/browse/SOLR-3573
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
 Environment: Java HotSpot(TM) Client VM (build 21.0-b17, mixed mode, 
 sharing), oracle 11.2.0.3.0, Solr-trunk
Reporter: Bjorn Hijmans
 Attachments: oracle_clob_freetemporary.diff


 When selecting a CLOB in the deltaImportQuery, the CLOB will not be freed 
 which will cause the Oracle process to use up all memory on the Oracle server.
 I'm not very good at java, but I think changes need to be made in 
 FieldReaderDataSource.java. In the getData method, the characterStream from 
 the Clob needs to be copied to a new stream, so the clob can be freed. 




[jira] [Updated] (SOLR-3573) Data import does not free CLOB

2012-06-27 Thread Bjorn Hijmans (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bjorn Hijmans updated SOLR-3573:


Attachment: oracle_clob_freetemporary.diff

 Data import does not free CLOB
 --

 Key: SOLR-3573
 URL: https://issues.apache.org/jira/browse/SOLR-3573
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
 Environment: Java HotSpot(TM) Client VM (build 21.0-b17, mixed mode, 
 sharing), oracle 11.2.0.3.0, Solr-trunk
Reporter: Bjorn Hijmans
 Attachments: oracle_clob_freetemporary.diff


 When selecting a CLOB in the deltaImportQuery, the CLOB will not be freed 
 which will cause the Oracle process to use up all memory on the Oracle server.
 I'm not very good at java, but I think changes need to be made in 
 FieldReaderDataSource.java. In the getData method, the characterStream from 
 the Clob needs to be copied to a new stream, so the clob can be freed. 




[jira] [Commented] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

2012-06-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402398#comment-13402398
 ] 

Mark Miller commented on SOLR-3582:
---

Never mind - found confirmation elsewhere that session events do not remove the 
watcher. The ZooKeeper programming guide does not appear very clear on this 
when it talks about watches being one-time triggers.
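The behavior confirmed above can be sketched with hypothetical stand-ins for ZooKeeper's Watcher API (the real types live in org.apache.zookeeper; only the control flow matters here): a session connect/disconnect arrives as an event of type None and leaves the existing watch in place, while a genuine node event consumes the one-time watch and requires re-registering it.

```java
// Hypothetical stand-in for org.apache.zookeeper.Watcher.Event.EventType.
enum EventType { None, NodeCreated, NodeDeleted, NodeDataChanged }

class LeaderWatcherSketch {
    int rewatches = 0;

    // Returns true when the event consumed the one-time watch (a new
    // watch must then be registered). Session events (type None) do not
    // remove the watcher, so they can simply be ignored.
    boolean process(EventType type) {
        if (type == EventType.None) {
            return false;  // connection/disconnection: watch still active
        }
        rewatches++;       // real node change: re-register the watch here
        return true;
    }
}
```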

 Leader election zookeeper watcher is responding to con/discon notifications 
 incorrectly.
 

 Key: SOLR-3582
 URL: https://issues.apache.org/jira/browse/SOLR-3582
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0


 As brought up by Trym R. Møller on the mailing list, we are responding to 
 watcher events about connection/disconnection as if they were notifications 
 about node changes.
 http://www.lucidimagination.com/search/document/e13ef390b882




[JENKINS] Lucene-Solr-4.x-Linux-Java7-64 - Build # 249 - Still Failing!

2012-06-27 Thread Policeman Jenkins Server
Build: 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux-Java7-64/249/

1 tests failed.
REGRESSION:  
org.apache.lucene.analysis.ngram.NGramTokenizerTest.testRandomStrings

Error Message:
some thread(s) failed

Stack Trace:
java.lang.RuntimeException: some thread(s) failed
at 
__randomizedtesting.SeedInfo.seed([50E33DEA43DF254D:D86A3D54E0DB7278]:0)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:463)
at 
org.apache.lucene.analysis.ngram.NGramTokenizerTest.testRandomStrings(NGramTokenizerTest.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 4560 lines...]
   [junit4] Suite: org.apache.lucene.analysis.ngram.NGramTokenizerTest
   [junit4] ERROR   645s J1 | NGramTokenizerTest.testRandomStrings
   [junit4] Throwable #1: java.lang.RuntimeException: some thread(s) failed
   [junit4]at 
__randomizedtesting.SeedInfo.seed([50E33DEA43DF254D:D86A3D54E0DB7278]:0)
   [junit4]at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:463)
   [junit4]at 
org.apache.lucene.analysis.ngram.NGramTokenizerTest.testRandomStrings(NGramTokenizerTest.java:106)
   [junit4]at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
   [junit4]at 

Re: JIRA assignee options

2012-06-27 Thread Erick Erickson
You should be able to do this if you're logged in; there should be a
button along the top titled "assign to me".

Sometimes I get logged out when I restart my computer or something.

If you're sure you're logged in and still can't assign it to yourself,
let us know.

Best
Erick

On Wed, Jun 27, 2012 at 1:19 PM, Despot Jakimovski
despot.jakimov...@gmail.com wrote:
 Hi,

 I would like to assign myself to the JIRA task, but I cannot find the
 Assign to button. Probably I am missing some privileges. Can someone help
 me fix this?

 Cheers,
 despot




Re: JIRA assignee options

2012-06-27 Thread Robert Muir
I added you to the 'contributor' role, so you should be able to do
this now, though I don't know if you need to log out and log back in, or
if it takes effect immediately.

Let me know if you have problems!

On Wed, Jun 27, 2012 at 1:19 PM, Despot Jakimovski
despot.jakimov...@gmail.com wrote:
 Hi,

 I would like to assign myself to the JIRA task, but I cannot find the
 Assign to button. Probably I am missing some privileges. Can someone help
 me fix this?

 Cheers,
 despot



-- 
lucidimagination.com




Re: [JENKINS] Lucene-Solr-4.x-Linux-Java6-64 - Build # 263 - Failure!

2012-06-27 Thread Robert Muir
RateLimiter + SerialMergeScheduler too, it seems?

Toning this thing down seems like a good idea because we also
sometimes use ThrottledIndexOutput: I bet if you are unlucky enough to get
both, it's really, really slow.

On Mon, Jun 25, 2012 at 8:08 AM, Uwe Schindler u...@thetaphi.de wrote:
 Hi,

 I killed that one after 3.5 hrs hanging in Kuromoji tests: 
 http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux-Java6-64/263/console
  - It looks like RateLimiter limited too much. What should we do? Tone 
 down the limiter generally, or how can we prevent such slowness?

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Policeman Jenkins Server [mailto:jenk...@sd-datasolutions.de]
 Sent: Monday, June 25, 2012 2:05 PM
 To: dev@lucene.apache.org
 Subject: [JENKINS] Lucene-Solr-4.x-Linux-Java6-64 - Build # 263 - Failure!

 Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux-Java6-
 64/263/

 All tests passed

 Build Log:
 [...truncated 3928 lines...]
    [junit4] 2012-06-25 12:05:09
    [junit4] Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.7-b02 
 mixed
 mode):
    [junit4]
    [junit4] Thread-4 prio=10 tid=0x7f6bfc0aa000 nid=0x4749 waiting on
 condition [0x7f6c08498000]
    [junit4]    java.lang.Thread.State: TIMED_WAITING (sleeping)
    [junit4]   at java.lang.Thread.sleep(Native Method)
    [junit4]   at java.lang.Thread.sleep(Thread.java:302)
    [junit4]   at
 org.apache.lucene.store.RateLimiter.pause(RateLimiter.java:83)
    [junit4]   at
 org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutp
 utWrapper.java:82)
    [junit4]   at
 org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:49)
    [junit4]   at
 org.apache.lucene.store.RAMOutputStream.writeTo(RAMOutputStream.java:65
 )
    [junit4]   at
 org.apache.lucene.codecs.BlockTermsWriter$TermsWriter.flushBlock(BlockTer
 msWriter.java:294)
    [junit4]   at
 org.apache.lucene.codecs.BlockTermsWriter$TermsWriter.finishTerm(BlockTer
 msWriter.java:212)
    [junit4]   at
 org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:163)
    [junit4]   at
 org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:65)
    [junit4]   at
 org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:3
 24)
    [junit4]   at
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:110)
    [junit4]   at
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3504)
    [junit4]   at
 org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3139)
    [junit4]   at
 org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.j
 ava:37)
    [junit4]   - locked 0xe0601d98 (a
 org.apache.lucene.index.SerialMergeScheduler)
    [junit4]   at
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1703)
    [junit4]   at
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1697)
    [junit4]   at
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1344)
    [junit4]   at
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1084)
    [junit4]   at
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWrit
 er.java:186)
    [junit4]   at
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWrit
 er.java:145)
    [junit4]   at
 org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(Base
 TokenStreamTestCase.java:562)
    [junit4]   at
 org.apache.lucene.analysis.BaseTokenStreamTestCase.access$000(BaseTokenSt
 reamTestCase.java:64)
    [junit4]   at
 org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(Bas
 eTokenStreamTestCase.java:421)
    [junit4]
    [junit4] Thread-3 prio=10 tid=0x7f6bfc0a8800 nid=0x4748 waiting for
 monitor entry [0x7f6c0859a000]
    [junit4]    java.lang.Thread.State: BLOCKED (on object monitor)
    [junit4]   at
 org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.j
 ava:34)
    [junit4]   - waiting to lock 0xe0601d98 (a
 org.apache.lucene.index.SerialMergeScheduler)
    [junit4]   at
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1703)
    [junit4]   at
 org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1697)
    [junit4]   at
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1344)
    [junit4]   at
 org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1084)
    [junit4]   at
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWrit
 er.java:186)
    [junit4]   at
 org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWrit
 er.java:145)
    [junit4]   at
 org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(Base
 TokenStreamTestCase.java:562)
    [junit4]   at
 org.apache.lucene.analysis.BaseTokenStreamTestCase.access$000(BaseTokenSt
 reamTestCase.java:64)
    [junit4]   at
 

[jira] [Updated] (LUCENE-4170) TestRandomChains fail with Shingle+CommonGrams

2012-06-27 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-4170:


Attachment: recursive.shinglefilter.output.png

This image is a (not pretty) word lattice representation of the output from the 
double ShingleFilter thought problem described above - should help to more 
easily visualize the graph.

(I wish I could make Graphviz line up the dots in a straight line, but couldn't 
figure out how to do that.)

 TestRandomChains fail with Shingle+CommonGrams
 --

 Key: LUCENE-4170
 URL: https://issues.apache.org/jira/browse/LUCENE-4170
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/analysis
Reporter: Robert Muir
 Attachments: LUCENE-4170.patch, recursive.shinglefilter.output.png


 ant test  -Dtestcase=TestRandomChains -Dtests.method=testRandomChains 
 -Dtests.seed=12635ABB4F789F2A -Dtests.multiplier=3 -Dtests.locale=pt 
 -Dtests.timezone=America/Argentina/Salta -Dargs=-Dfile.encoding=ISO8859-1
 This test has two shinglefilters, then a common-grams filter. I think posLen 
 impls in commongrams and/or shingle has a bug if the input is already a graph.




Test timing stats.

2012-06-27 Thread Dawid Weiss
Hi. Would somebody who has a physical machine running jenkins add the
following as a post-run step?

ant -f lucene/build.xml test-updatecache -Dtests.cachefile=XXX

where XXX is preferably a path to a file somewhere outside of the
build area (so that it's not cleaned/ removed between builds)? This
will update build times with a history of 20 builds per suite. Once in
a while this file should be copied to:

lucene/tools/junit4/cached-timehints.txt

These are hints for the test load balancer if multiple JVMs are used (just
a reminder -- the order of suites is still randomized within a single
JVM, and only a fraction of the suites are initially load-balanced;
the rest is delegated to job stealing to level out JVM times).

I'm currently running a few builds to update the stats, but for the
future it'd be a nice side effect of jenkins runs.

Dawid




[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation

2012-06-27 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402531#comment-13402531
 ] 

David Smiley commented on LUCENE-4167:
--

bq. I don't see the need to differentiate BBoxIntersects and Intersects. If the 
user wants to find those Documents related to the bounding box of a Shape, then 
they can call shape.getBoundingBox() and pass that into the Strategy. The 
Strategys shouldn't have to worry about the Shape (although TwoDoubles does but 
that needs to be re-thought separately). The Strategys should just take the 
Shape given and roll with it. Is that what you're suggesting?

The strategy shouldn't care about the bbox concept, I agree. I think the bbox 
capability should be decoupled from SpatialOperation.  It's not a simple matter 
of the client calling queryShape.getBoundingBox() since the expression of the 
query shape from client to server is a string.  So instead of 
BBoxIntersects(Circle(3,5 d=10)) I propose supporting 
INTERSECTS(BBOX(Circle(3,5 d=10))).  The actual set of operations I want to 
support are [E]CQL spatial predicates: 
http://docs.geoserver.org/latest/en/user/filter/ecql_reference.html#spatial-predicate
 but that perhaps deserves its own issue.

 Remove the use of SpatialOperation
 --

 Key: LUCENE-4167
 URL: https://issues.apache.org/jira/browse/LUCENE-4167
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male

 Looking at the code in TwoDoublesStrategy I noticed 
 SpatialOperations.BBoxWithin vs isWithin which confused me.  Looking over the 
 other Strategys I see that really only isWithin and Intersects is supported.  
 Only TwoDoublesStrategy supports IsDisjointTo.  The remainder of 
 SpatialOperations are not supported.
 I don't think we should use SpatialOperation as this stage since it is not 
 clear what Operations are supported by what Strategys, many Operations are 
 not supported, and the code for handling the Operations is usually the same.  
 We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a 
 different Strategy.




Re: Count of keys of an FST

2012-06-27 Thread Dawid Weiss
I don't think there is one that you could use out of the box... but
maybe I'm wrong and it's stored in the header somewhere (don't have
the source in front of me).

To calculate it by hand the worst case is that you'll need a recursive
traversal, which would mean O(number of stored states) with
intermediate count caches or O(number of keys) without any caches and
memory overhead (just recursive traversal).
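That traversal can be sketched on a hypothetical node type (Lucene's real FST states live in org.apache.lucene.util.fst and are represented quite differently): the key count is the number of accepting paths from the root, and memoizing the count per node (the "intermediate count caches") keeps the pass at O(number of stored states) even when minimized suffixes are shared.

```java
import java.util.HashMap;
import java.util.Map;

class KeyCounter {
    // Hypothetical automaton node: final flag plus labeled outgoing arcs.
    static class Node {
        boolean isFinal;
        Map<Character, Node> arcs = new HashMap<Character, Node>();
    }

    // Number of keys = number of accepting paths from this node; the
    // cache ensures each shared (minimized) node is visited only once.
    static long countKeys(Node node, Map<Node, Long> cache) {
        Long cached = cache.get(node);
        if (cached != null) {
            return cached;
        }
        long n = node.isFinal ? 1 : 0;
        for (Node child : node.arcs.values()) {
            n += countKeys(child, cache);
        }
        cache.put(node, n);
        return n;
    }
}
```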

Dawid

On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 The FST class has a number of methods that return counts, which one returns
 the total number of keys that have been encoded into the FST?




Re: Count of keys of an FST

2012-06-27 Thread Jason Rutherglen
Sounds like I should just count as the keys are added and store the count
separately.

On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss
dawid.we...@cs.put.poznan.plwrote:

 I don't think there is one that you could use out of the box... but
 maybe I'm wrong and it's stored in the header somewhere (don't have
 the source in front of me).

 To calculate it by hand the worst case is that you'll need a recursive
 traversal, which would mean O(number of stored states) with
 intermediate count caches or O(number of keys) without any caches and
 memory overhead (just recursive traversal).

 Dawid

 On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
  The FST class has a number of methods that return counts, which one
 returns
  the total number of keys that have been encoded into the FST?





Re: Count of keys of an FST

2012-06-27 Thread Dawid Weiss
If you need the count with constant time then yes, you should store it
separately. You could also make a transducer that would store it at
the root node as side-effect of values associated with keys, but it's
kind of ugly.

Please check the fst header though -- I'm not sure, maybe Mike wrote
it so that the node count/ keys count is in there.

Dawid

On Wed, Jun 27, 2012 at 10:50 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 Sounds like I should just count as the keys are added and store the count
 separately.

 On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl
 wrote:

 I don't think there is one that you could use out of the box... but
 maybe I'm wrong and it's stored in the header somewhere (don't have
 the source in front of me).

 To calculate it by hand the worst case is that you'll need a recursive
 traversal, which would mean O(number of stored states) with
 intermediate count caches or O(number of keys) without any caches and
 memory overhead (just recursive traversal).

 Dawid

 On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
  The FST class has a number of methods that return counts, which one
  returns
  the total number of keys that have been encoded into the FST?




[jira] [Created] (LUCENE-4171) Performance improvements to Packed64

2012-06-27 Thread Toke Eskildsen (JIRA)
Toke Eskildsen created LUCENE-4171:
--

 Summary: Performance improvements to Packed64
 Key: LUCENE-4171
 URL: https://issues.apache.org/jira/browse/LUCENE-4171
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/other
Affects Versions: 4.0, 5.0
 Environment: Tested with 4 different Intel machines
Reporter: Toke Eskildsen
Priority: Minor


Based on the performance measurements of PackedInts.Mutable's in LUCENE-4062, a 
new version of Packed64 has been created that is consistently faster than the 
old Packed64 for both get and set.
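For context, the operation being tuned can be sketched generically (this is illustrative bit-packing in the spirit of Packed64, not Lucene's actual implementation or the patch): n values of bitsPerValue bits each are stored contiguously in a long[], so get and set must handle a value straddling two 64-bit blocks. Assumes 1 &lt;= bitsPerValue &lt;= 63 and that values fit in bitsPerValue bits.

```java
class PackedLongs {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedLongs(int valueCount, int bitsPerValue) {
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        // Enough 64-bit blocks for valueCount * bitsPerValue bits.
        this.blocks = new long[(int) (((long) valueCount * bitsPerValue + 63) / 64)];
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);   // which long
        int offset = (int) (bitPos & 63);   // bit offset within it
        long value = blocks[block] >>> offset;
        int spill = offset + bitsPerValue - 64;  // bits in the next block
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }

    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        // Clear the target bits, then OR in the low part of the value.
        blocks[block] = (blocks[block] & ~(mask << offset)) | (value << offset);
        int spill = offset + bitsPerValue - 64;
        if (spill > 0) {
            // The top 'spill' bits of the value go into the next block.
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> (bitsPerValue - spill)))
                              | (value >>> (bitsPerValue - spill));
        }
    }
}
```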




[jira] [Updated] (LUCENE-4171) Performance improvements to Packed64

2012-06-27 Thread Toke Eskildsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toke Eskildsen updated LUCENE-4171:
---

Attachment: LUCENE-4171.patch

Finished implementation, ready for review & potential merge. TestPackedInts 
passes.

 Performance improvements to Packed64
 

 Key: LUCENE-4171
 URL: https://issues.apache.org/jira/browse/LUCENE-4171
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/other
Affects Versions: 4.0, 5.0
 Environment: Tested with 4 different Intel machines
Reporter: Toke Eskildsen
Priority: Minor
  Labels: performance
 Attachments: LUCENE-4171.patch

   Original Estimate: 4h
  Remaining Estimate: 4h

 Based on the performance measurements of PackedInts.Mutable's in LUCENE-4062, 
 a new version of Packed64 has been created that is consistently faster than 
 the old Packed64 for both get and set.




[jira] [Updated] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs

2012-06-27 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4080:
-

Attachment: LUCENE-4080.patch

New patch.

Only the {{liveDocs/numDeletedDocs}} copy needs to be protected by the 
{{IndexWriter}} lock. However, the whole method needs to be protected by the 
ReadersAndLiveDocs lock but we can't nest the former into the latter since 
other pieces of code do the opposite (potential deadlock). So I replaced the 
{{ReadersAndLiveDocs}} lock with a {{ReentrantLock}} so that it can overlap 
with the {{IndexWriter}} lock. Does it look better?
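The motivation for the switch can be sketched generically (illustrative names, not the patch itself): synchronized blocks must nest lexically, whereas explicit ReentrantLock acquire/release pairs may overlap (acquire A, acquire B, release A, release B), which is exactly what letting the ReadersAndLiveDocs lock overlap the IndexWriter lock requires.

```java
import java.util.concurrent.locks.ReentrantLock;

class OverlappingLocks {
    final ReentrantLock writerLock = new ReentrantLock();
    final ReentrantLock rldLock = new ReentrantLock();
    int steps = 0;

    void handOverHand() {
        writerLock.lock();
        steps++;              // work under the writer lock only
        rldLock.lock();
        steps++;              // work under both locks
        writerLock.unlock();  // release mid-sequence: impossible with
                              // lexically nested synchronized blocks
        steps++;              // work under the rld lock only
        rldLock.unlock();
    }
}
```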

 SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs
 --

 Key: LUCENE-4080
 URL: https://issues.apache.org/jira/browse/LUCENE-4080
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1
Reporter: Adrien Grand
Priority: Trivial
 Fix For: 4.1

 Attachments: LUCENE-4080.patch, LUCENE-4080.patch


 At merge time, SegmentReader sometimes gives an incorrect value for 
 numDeletedDocs.
 From LUCENE-2357:
 bq. As far as I know, [SegmenterReader.numDeletedDocs() is] only unreliable 
 in this context (SegmentReader passed to SegmentMerger for merging); this is 
 because we allow newly marked deleted docs to happen concurrently up until 
 the moment we need to pass the SR instance to the merger (search for // Must 
 sync to ensure BufferedDeletesStream in IndexWriter.java) ... but it would 
 be nice to fix that, so I think open a new issue (it won't block this one)? 
 We should be able to make a new SR instance, sharing the same core as the 
 current one but using the correct delCount...
 bq. It would be cleaner (but I think hairier) to create a new SR for merging 
 that holds the correct delCount, but let's do that under the separate issue.
 bq.  it would be best if the SegmentReader's numDeletedDocs were always 
 correct, but, fixing that in IndexWriter is somewhat tricky. Ie, the fix 
 could be hairy but the end result (SegmentReader.numDeletedDocs can always 
 be trusted) would be cleaner...




[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation

2012-06-27 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402591#comment-13402591
 ] 

David Smiley commented on LUCENE-4167:
--

bq. I can see the need for different behaviour for different Shape 
relationships too. But I think we should perhaps do that using method 
specialization. We already have the PrefixTreeStrategy abstraction, so you 
could write a WithinRecursivePrefixTreeStrategy which specialized makeQuery 
differently. That way it is clear to the user what the Strategy does, we won't 
need the runtime checks and we won't have Strategys like TwoDoubles which has 
methods for each of the different behaviours in the same class.

Sorry, but I disagree with your point of view.  The Strategy is supposed to be 
a single facade over the implementation details of how a query will work, 
including the various possible spatial predicates (i.e. spatial operations) 
that it supports.  If one Java class file shows that it has become too complicated 
and would be better separated because implementing different predicates is 
just so fundamentally different, then the operations could be decomposed 
into separate source files, but they would stay behind the facade of the 
Strategy.  I don't believe that TwoDoublesStrategy demonstrates the complexity 
of a class trying to do too many things.  I absolutely think TwoDoublesStrategy 
could be coded more clearly.  If it is as buggy/untested as I think it is and 
nobody wants to fix it (I don't), personally I think this strategy can go away.

 Remove the use of SpatialOperation
 --

 Key: LUCENE-4167
 URL: https://issues.apache.org/jira/browse/LUCENE-4167
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male

 Looking at the code in TwoDoublesStrategy I noticed 
 SpatialOperations.BBoxWithin vs isWithin, which confused me.  Looking over the 
 other Strategys I see that really only isWithin and Intersects are supported.  
 Only TwoDoublesStrategy supports IsDisjointTo.  The remainder of 
 SpatialOperations are not supported.
 I don't think we should use SpatialOperation at this stage since it is not 
 clear which Operations are supported by which Strategys, many Operations are 
 not supported, and the code for handling the Operations is usually the same.  
 We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a 
 different Strategy.




[jira] [Created] (LUCENE-4172) clean up redundant throws clauses

2012-06-27 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4172:
---

 Summary: clean up redundant throws clauses
 Key: LUCENE-4172
 URL: https://issues.apache.org/jira/browse/LUCENE-4172
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir


Examples are things like ctors that list throws XYZException but actually don't, 
and things like 'throws CorruptIndex, LockObtainedFailed, IOException' when all 
of these are actually IOException.





[jira] [Updated] (LUCENE-4172) clean up redundant throws clauses

2012-06-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4172:


Attachment: LUCENE-4172.patch

The start of a patch... Eclipse doesn't do well here, so it would be better to 
use something else to find these.

 clean up redundant throws clauses
 -

 Key: LUCENE-4172
 URL: https://issues.apache.org/jira/browse/LUCENE-4172
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4172.patch


 Examples are things like ctors that list throws XYZException but actually 
 don't, and things like 'throws CorruptIndex, LockObtainedFailed, IOException' 
 when all of these are actually IOException.




[jira] [Commented] (LUCENE-4172) clean up redundant throws clauses

2012-06-27 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402669#comment-13402669
 ] 

Steven Rowe commented on LUCENE-4172:
-

IntelliJ has two relevant inspections: 'Redundant throws clause' and 'Duplicate 
throws'.  I've applied your patch to trunk and I'm running these on the whole 
project to see what they find.

 clean up redundant throws clauses
 -

 Key: LUCENE-4172
 URL: https://issues.apache.org/jira/browse/LUCENE-4172
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4172.patch


 Examples are things like ctors that list throws XYZException but actually 
 don't, and things like 'throws CorruptIndex, LockObtainedFailed, IOException' 
 when all of these are actually IOException.




[jira] [Commented] (LUCENE-4172) clean up redundant throws clauses

2012-06-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402680#comment-13402680
 ] 

Robert Muir commented on LUCENE-4172:
-

That sounds nice: I think we always want to fix 'duplicate throws'.

But redundant throws requires some decisions... basically I looked at each one 
and:
* nuke the redundant throws if it's a static, private, package-private, or 
final method
* nuke the redundant throws if it's a ctor (a subclass can always declare its 
own)
* keep the redundant throws if it's a public/protected non-final method that 
can be overridden
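A hypothetical Java sketch of those rules (the class and method names are 
invented for illustration; this is not from the patch):

```java
import java.io.IOException;

// Hypothetical illustration of the cleanup rules above.
class ThrowsCleanupExample {

  // Static/private members cannot be overridden, so a "throws IOException"
  // clause that the body never exercises would be nuked from here.
  private static int parse(String s) {
    return Integer.parseInt(s.trim());
  }

  // Public, non-final method: a redundant clause is kept, because a subclass
  // overriding read() may legitimately need to throw IOException.
  public int read(String s) throws IOException {
    return parse(s);
  }
}
```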

 clean up redundant throws clauses
 -

 Key: LUCENE-4172
 URL: https://issues.apache.org/jira/browse/LUCENE-4172
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4172.patch


 Examples are things like ctors that list throws XYZException but actually 
 don't, and things like 'throws CorruptIndex, LockObtainedFailed, IOException' 
 when all of these are actually IOException.




[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-06-27 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402743#comment-13402743
 ] 

Hoss Man commented on SOLR-1725:


bq. My comment above still stands: Another TODO is to get this to work with a 
scripting language implementation JAR file being added as a plugin somehow.

I played around with this on the train today and confirmed that we can do 
runtime loading of jars that include script engines if we change the 
ScriptEngineManager instantiation to use the one-arg constructor and 
pass in resourceLoader.getClassLoader().
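A minimal sketch of that change (the helper name is hypothetical; in Solr the 
loader would come from resourceLoader.getClassLoader()):

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// Sketch only: the one-arg ScriptEngineManager constructor discovers
// ScriptEngineFactory services through the supplied ClassLoader, so engines
// packaged in runtime-loaded plugin jars become visible.
class EngineLookup {
  static ScriptEngine byExtension(String extension, ClassLoader pluginLoader) {
    ScriptEngineManager mgr = new ScriptEngineManager(pluginLoader);
    return mgr.getEngineByExtension(extension); // null if no engine matches
  }
}
```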

A few other notes based on reviewing the patch and playing around with it.  
Barring objections I'll probably take a stab at addressing these tomorrow or 
Friday...

* I don't see any mechanism for scripts to indicate that processing should 
stop -- i.e. the way a Java UpdateProcessor would just return without calling 
super.foo. We should add/test/document some functionality that looks at the 
result of the invokeFunction call to support this.
* The tests seem to assert that the side effects of the scripts happen (i.e. 
that the testcase records the function names), but I don't see any assertions 
that the expected modifications of the update commands are happening (i.e. that 
documents are being modified in processAdd).
* We need to test that request params are easily accessible (I'm not sure how 
well the SolrQueryRequest class works in various scripting languages, so we 
might need to pull out the SolrParams and expose them directly - either way we 
need to demonstrate doing it in a test).
* Whitespace/comma/pipe splitting of the script names is a bad meme.  We should 
stop doing that, and require that multiple scripts be specified as multiple 
{{str}} params.
** We can add convenience code to support the {{<arr 
name="scripts"><str>...</str></arr>}} style as well.
* ScriptFile and its extension parsing are very primitive and broken on any 
file with a . in its name.  We should just use the helper method for parsing 
filename extensions that already exists in commons-io.
* From what I can tell looking at the ScriptEngine javadocs, it's possible that 
a ScriptEngine might exist without a specific file extension, or that multiple 
engines could support the same extension(s), so we should offer an init param 
that lets the user specify a ScriptEngine by short name to override whatever 
extension might be found.
* Currently, problems with scripts not being found, or engines for scripts not 
being found, aren't reported until the first request tries to use them - we 
should error check all of this in init (or inform) and fail fast.
** Ditto for the assumption in invokeFunction that we can cast every 
ScriptEngine to Invocable -- we should at least check this on init/inform and 
fail fast.
* The way the various UpdateProcessor methods are implemented to be lenient 
about any scripts that don't explicitly implement a method seems kludgy -- 
isn't there any way we can introspect the engine to ask if a function exists?
** In particular, when I did some testing with JRuby, I found that it didn't 
work at all - I guess JRuby was throwing a ScriptException instead of 
NoSuchMethodException?

{noformat}
undefined method `processCommit' for main:Object (NoMethodError)
org.jruby.embed.InvokeFailedException: (NoMethodError) undefined method 
`processCommit' for main:Object
at 
org.jruby.embed.internal.EmbedRubyObjectAdapterImpl.call(EmbedRubyObjectAdapterImpl.java:403)
at 
org.jruby.embed.internal.EmbedRubyObjectAdapterImpl.callMethod(EmbedRubyObjectAdapterImpl.java:189)
at 
org.jruby.embed.ScriptingContainer.callMethod(ScriptingContainer.java:1386)
at 
org.jruby.embed.jsr223.JRubyEngine.invokeFunction(JRubyEngine.java:262)
at 
org.apache.solr.update.processor.ScriptUpdateProcessorFactory$ScriptUpdateProcessor.invokeFunction(ScriptUpdateProcessorFactory.java:221)
at 
org.apache.solr.update.processor.ScriptUpdateProcessorFactory$ScriptUpdateProcessor.processCommit(ScriptUpdateProcessorFactory.java:202)
{noformat}
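The fail-fast Invocable check suggested above could look roughly like this 
(hypothetical helper; the real patch may check this in init or inform):

```java
import javax.script.Invocable;
import javax.script.ScriptEngine;

// Sketch of the init-time check: verify the engine is Invocable up front
// instead of letting the cast fail on the first update request.
class InvocableCheck {
  static Invocable requireInvocable(ScriptEngine engine, String scriptName) {
    if (!(engine instanceof Invocable)) {
      throw new IllegalStateException(
          "ScriptEngine for " + scriptName + " does not implement Invocable");
    }
    return (Invocable) engine;
  }
}
```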




 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Assignee: Erik Hatcher
  Labels: UpdateProcessor
 Fix For: 4.1

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request 

Make ivy search maven repo1/repo2?

2012-06-27 Thread Lance Norskog
How can I get ivy to include the maven.org repo2 in the resolver list?
Is there a reason it is not in the list?

I ask because there is an artifact (extjwnl) which is only on repo2.

-- 
Lance Norskog
goks...@gmail.com




Full build target?

2012-06-27 Thread Lance Norskog
I would like to build Lucene and have the new jars be used by Solr.
Which top-level target does this?

-- 
Lance Norskog
goks...@gmail.com




[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation

2012-06-27 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402783#comment-13402783
 ] 

Chris Male commented on LUCENE-4167:


{quote}
The strategy shouldn't care about the bbox concept, I agree. I think the bbox 
capability should be decoupled from SpatialOperation. It's not a simple matter 
of the client calling queryShape.getBoundingBox() since the expression of the 
query shape from client to server is a string. So instead of 
BBoxIntersects(Circle(3,5 d=10)) I propose supporting 
INTERSECTS(BBOX(Circle(3,5 d=10))). The actual set of operations I want to 
support are [E]CQL spatial predicates: 
http://docs.geoserver.org/latest/en/user/filter/ecql_reference.html#spatial-predicate
 but that perhaps deserves its own issue.
{quote}

I think we need to be cautious here about exposing too much complexity in the 
Strategys.  Query language requirements shouldn't be passed on down to 
Strategy.  Instead, the Strategys should have a very controlled list of spatial 
operations they support and how they are connected to the query parser should 
be the parser's responsibility.  Requiring direct users of the Strategys to use 
queryShape.getBoundingBox() seems like a good way to mitigate complexity in the 
Strategys themselves and we can then do whatever we like in any parsers to make 
our query languages work.

{quote}
Sorry, but I disagree with your point of view. The Strategy is supposed to be a 
single facade over the implementation details of how a query will work, 
including the various possible spatial predicates (i.e. spatial operations) 
that it supports. If one Java class file shows that it becomes too complicated 
and would be better separated because implementing different predicates is just 
so fundamentally different, then the operations could be decomposed into 
separate source files, but it would be behind the facade of the Strategy.
{quote}

Okay fair enough.  I think we can come to a compromise.  My goal here is to 
make it clear to the user what operations our Strategys support at compile 
time, not through some undocumented runtime check.  That seems like a recipe 
for disaster.  Imagine someone who uses one of the Prefix Strategys and then 
tries 
to do a Disjoint operation.  At runtime they get an error and then after some 
reading through source code they discover they actually need to use TwoDoubles 
which requires a re-index.

Instead what I recommend is that we rename makeQuery to makeIntersectsQuery.  
Then all implementations of that method will only construct a Query for the 
intersects operation.  We can then add makeXXXQuery methods to the Strategy 
interface as we add support to all the implementations.  If a Strategy impl 
supports a particular operation that the rest don't, then that can just be a 
method on that specific Strategy and not added to the Strategy interface.  
Consequently TwoDoubles will get a makeDisjointQuery method.  This way we have 
more readable code, better compile time checking and less confused users.
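A rough sketch of the proposed shape of the API (stub types stand in for the 
real Lucene/Spatial4j classes; names are illustrative only):

```java
// Stub stand-ins for the real types, for illustration only.
interface Shape {}
interface Query {}

// Operations every implementation supports live on the shared interface...
interface SpatialStrategy {
  Query makeIntersectsQuery(Shape queryShape);
}

// ...while an operation only one impl supports stays a method on that class,
// so calling an unsupported operation simply doesn't compile.
class TwoDoublesStrategySketch implements SpatialStrategy {
  public Query makeIntersectsQuery(Shape queryShape) {
    return new Query() {};
  }
  public Query makeDisjointQuery(Shape queryShape) {
    return new Query() {};
  }
}
```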

How we map this into any Client / Server interaction or a query language should 
be the responsibility of those classes, not the Strategys.

I'm going to create a patch to this effect.

 Remove the use of SpatialOperation
 --

 Key: LUCENE-4167
 URL: https://issues.apache.org/jira/browse/LUCENE-4167
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male

 Looking at the code in TwoDoublesStrategy I noticed 
 SpatialOperations.BBoxWithin vs isWithin, which confused me.  Looking over the 
 other Strategys I see that really only isWithin and Intersects are supported.  
 Only TwoDoublesStrategy supports IsDisjointTo.  The remainder of 
 SpatialOperations are not supported.
 I don't think we should use SpatialOperation at this stage since it is not 
 clear which Operations are supported by which Strategys, many Operations are 
 not supported, and the code for handling the Operations is usually the same.  
 We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a 
 different Strategy.




[jira] [Updated] (LUCENE-4165) HunspellDictionary - AffixFile Reader closed, Dictionary Readers left unclosed

2012-06-27 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-4165:
---

Attachment: LUCENE-4156-trunk.patch

Updated version of trunk patch which closes the InputStreams created in Solr's 
HunspellStemFilterFactory.
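The ownership contract being documented here can be sketched generically 
(hypothetical classes, not the actual patch code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the contract: the caller opens the stream, hands it to the
// consumer, and closes it in a finally block; the consumer (like the
// HunspellDictionary ctor after the patch) must not close it itself.
class CallerOwnsStreams {
  static int countBytes(InputStream in) throws IOException {
    int count = 0;
    while (in.read() != -1) {
      count++; // reads the stream but deliberately never closes it
    }
    return count;
  }

  static int process(byte[] data) throws IOException {
    InputStream in = new ByteArrayInputStream(data);
    try {
      return countBytes(in);
    } finally {
      in.close(); // closed by the caller, even if countBytes throws
    }
  }
}
```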

 HunspellDictionary - AffixFile Reader closed, Dictionary Readers left unclosed
 --

 Key: LUCENE-4165
 URL: https://issues.apache.org/jira/browse/LUCENE-4165
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 3.6
 Environment: Linux, Java 1.6
Reporter: Torsten Krah
Priority: Minor
 Attachments: LUCENE-4156-trunk.patch, lucene_36.patch, 
 lucene_trunk.patch


 The HunspellDictionary takes an InputStream for the affix file and a List of 
 Streams for dictionaries.
 The Javadoc is not clear about whether I have to close those streams myself or 
 whether the Dictionary constructor does this already.
 Looking at the code, at least reader.close() is called when the affix file is 
 read via the readAffixFile() method (although closing streams is not done in a 
 finally block - so the constructor may fail to do so).
 The readDictionaryFile() method misses the call to close the reader, in 
 contrast to readAffixFile().
 So the question here is - do I have to close the streams myself after 
 instantiating the dictionary?
 Or is the close call only missing for the dictionary streams?
 Either way, please add the close calls in a safe manner or clarify the javadoc 
 so I know I have to do this myself.




[jira] [Resolved] (LUCENE-4166) TwoDoublesStrategy is broken for Circles

2012-06-27 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male resolved LUCENE-4166.


   Resolution: Fixed
Fix Version/s: 5.0
   4.0
 Assignee: Chris Male

Fixed, but we really need to look at this Strategy closely in another issue.

 TwoDoublesStrategy is broken for Circles
 

 Key: LUCENE-4166
 URL: https://issues.apache.org/jira/browse/LUCENE-4166
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
Assignee: Chris Male
Priority: Critical
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4166.patch


 TwoDoublesStrategy supports finding Documents that are within a Circle, yet 
 it is impossible to provide one due to the following code found at the start 
 of TwoDoublesStrategy.makeQuery():
 {code}
 Shape shape = args.getShape();
 if (!(shape instanceof Rectangle)) {
   throw new InvalidShapeException("A rectangle is the only supported " +
       "shape (so far), not " + shape.getClass()); //TODO
 }
 Rectangle bbox = (Rectangle) shape;
 {code}
 I think instead the code which handles Circles should ask for the bounding 
 box of the Shape and use that instead.
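The suggested fallback might look roughly like this (stub types replace the 
real Spatial4j Shape/Rectangle; this is a sketch, not the committed fix):

```java
// Stub stand-ins for the Spatial4j types, for illustration only.
interface Shape { Rectangle getBoundingBox(); }
interface Rectangle extends Shape {}

class BBoxFallback {
  // Instead of throwing InvalidShapeException for every non-Rectangle,
  // fall back to the shape's bounding box (e.g. the bbox of a Circle).
  static Rectangle toBBox(Shape shape) {
    if (shape instanceof Rectangle) {
      return (Rectangle) shape;
    }
    return shape.getBoundingBox();
  }
}
```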




[jira] [Created] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys

2012-06-27 Thread Chris Male (JIRA)
Chris Male created LUCENE-4173:
--

 Summary: Remove IgnoreIncompatibleGeometry for SpatialStrategys
 Key: LUCENE-4173
 URL: https://issues.apache.org/jira/browse/LUCENE-4173
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male


Silently not indexing anything for a Shape is not okay.  Users should get an 
Exception and then they can decide how to proceed.




[jira] [Updated] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys

2012-06-27 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-4173:
---

Attachment: LUCENE-4173.patch

Simple patch removing the option and improving how non-Point shapes are handled 
in TwoDoubles.

 Remove IgnoreIncompatibleGeometry for SpatialStrategys
 --

 Key: LUCENE-4173
 URL: https://issues.apache.org/jira/browse/LUCENE-4173
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
 Attachments: LUCENE-4173.patch


 Silently not indexing anything for a Shape is not okay.  Users should get an 
 Exception and then they can decide how to proceed.




[jira] [Updated] (LUCENE-4167) Remove the use of SpatialOperation

2012-06-27 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-4167:
---

Attachment: LUCENE-4167.patch

First shot at this.  

I completely removed SpatialArgs from the Strategy interface.  We don't have so 
many parameters that we can't force them to be defined.  

Changed makeQuery/makeFilter to makeIntersectsQuery/makeIntersectsFilter 
respectively.

I want to address the method javadocs before committing this.

 Remove the use of SpatialOperation
 --

 Key: LUCENE-4167
 URL: https://issues.apache.org/jira/browse/LUCENE-4167
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
 Attachments: LUCENE-4167.patch


 Looking at the code in TwoDoublesStrategy I noticed 
 SpatialOperations.BBoxWithin vs isWithin, which confused me.  Looking over the 
 other Strategys I see that really only isWithin and Intersects are supported.  
 Only TwoDoublesStrategy supports IsDisjointTo.  The remainder of 
 SpatialOperations are not supported.
 I don't think we should use SpatialOperation at this stage since it is not 
 clear which Operations are supported by which Strategys, many Operations are 
 not supported, and the code for handling the Operations is usually the same.  
 We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a 
 different Strategy.




[jira] [Commented] (SOLR-3580) In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled

2012-06-27 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402826#comment-13402826
 ] 

Jack Krupansky commented on SOLR-3580:
--

My recommendation is to have an additional option, lowercaseNotOperator, which 
defaults to false.  This would be the safe choice that Yonik recommends, but 
would allow you to override that decision as you see fit for your application.

 In ExtendedDismax, lowercase 'not' operator is not being treated as an 
 operator when 'lowercaseOperators' is enabled
 

 Key: SOLR-3580
 URL: https://issues.apache.org/jira/browse/SOLR-3580
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0
Reporter: Michael Dodsworth
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3580.patch


 When lowercase operator support is enabled (for edismax), the lowercase 'not' 
 operator is being wrongly treated as a literal term (and not as an operator).




[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation

2012-06-27 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402827#comment-13402827
 ] 

David Smiley commented on LUCENE-4167:
--

I agree that something could/should be done to improve awareness of exactly 
which operations a Strategy supports.  This is of course just one aspect of a 
Strategy's limitations; consider whether or not the Strategy supports 
multi-value data or whether it supports indexing non-point shapes.  Surely 
*that* is quite relevant to a potential client.  It seems very doubtful to me 
that compile-time type checks could be added for everything.  And even with 
spatial operations -- there are a lot of them to support, and wouldn't it be 
twice as many for both makeXXXQuery & makeXXXFilter?  I don't know where you 
would draw the line.  At least the current interface is fairly simple, and 
there are always the Javadocs.

That said, I look forward to seeing any patches you may have demonstrating 
what you have in mind.  Maybe I just won't get it until I see it.

bq. How we map this into any Client / Server interaction or a query language 
should be the responsibility of those classes, not the Strategies.

True.

 Remove the use of SpatialOperation
 --

 Key: LUCENE-4167
 URL: https://issues.apache.org/jira/browse/LUCENE-4167
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
 Attachments: LUCENE-4167.patch


 Looking at the code in TwoDoublesStrategy I noticed 
 SpatialOperations.BBoxWithin vs isWithin, which confused me.  Looking over the 
 other Strategys I see that really only isWithin and Intersects are supported.  
 Only TwoDoublesStrategy supports IsDisjointTo.  The remainder of 
 SpatialOperations are not supported.
 I don't think we should use SpatialOperation at this stage since it is not 
 clear which Operations are supported by which Strategys, many Operations are 
 not supported, and the code for handling the Operations is usually the same.  
 We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a 
 different Strategy.




[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation

2012-06-27 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402830#comment-13402830
 ] 

Chris Male commented on LUCENE-4167:


{quote}
This is of course just one aspect of a Strategy's limitations; consider whether 
or not the Strategy supports multi-value data or whether it supports indexing 
non-point shapes. Surely that is quite relevant to a potential client. It seems 
very doubtful to me that compile-time type checks could be added for 
everything.
{quote}

Quite right and we can tackle these issues on a case by case basis.  Having a 
check like supportsMultiValued() on Strategys seems like a good idea.  That way 
the user can consult this method before indexing.

{quote}
And even with spatial operations – there are a lot of them to support, and 
wouldn't it be twice as many for both makeXXXQuery & makeXXXFilter? I don't 
know where you would draw the line. At least the current interface is fairly 
simple, and there is always Javadocs.
{quote}

We don't have any useful Javadocs on this issue so I'm not going to rely on 
that.  I don't see any issue with having a makeXXXQuery/Filter for each 
operation.  Strategys are essentially factories so I think the ability to see 
at compile time what the factory can create is vitally important.  If we get to 
20 operations I'll start to worry.

 Remove the use of SpatialOperation
 --

 Key: LUCENE-4167
 URL: https://issues.apache.org/jira/browse/LUCENE-4167
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spatial
Reporter: Chris Male
 Attachments: LUCENE-4167.patch


 Looking at the code in TwoDoublesStrategy I noticed 
 SpatialOperations.BBoxWithin vs isWithin, which confused me.  Looking over the 
 other Strategys I see that really only isWithin and Intersects are supported.  
 Only TwoDoublesStrategy supports IsDisjointTo.  The remainder of 
 SpatialOperations are not supported.
 I don't think we should use SpatialOperation at this stage since it is not 
 clear which Operations are supported by which Strategys, many Operations are 
 not supported, and the code for handling the Operations is usually the same.  
 We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a 
 different Strategy.




[jira] [Commented] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

2012-06-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402851#comment-13402851
 ] 

Trym Møller commented on SOLR-3582:
---

Debugging the provided test shows this behaviour as well: the Watch is 
kept even though it is notified about disconnection and syncConnection, and the 
Watch will only stop after a node change occurs.

As Mark writes on the mailing list, there might be other ZooKeeper Watchers in 
Solr that add new watchers on reconnect.

If we agree about the ZooKeeper watcher behaviour, then I think that the 
provided bug fix solves the problem in the LeaderElector and it can be 
committed to svn independently of problems with other watchers.

Best regards Trym
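The behaviour Trym describes, reacting only when a real node change is notified, can be sketched as a small predicate. This is a hypothetical illustration, not Solr's actual LeaderElector code; the EventType enum below merely mirrors org.apache.zookeeper.Watcher.Event.EventType, whose connection-state callbacks carry EventType.None:

```java
// Hypothetical sketch, not Solr's actual LeaderElector: ZooKeeper delivers
// connection-state notifications and node-change notifications to the same
// Watcher, so the watcher must check the event type before reacting.
public class WatcherEventFilter {

    // Mirrors the values of org.apache.zookeeper.Watcher.Event.EventType.
    enum EventType { None, NodeCreated, NodeDeleted, NodeDataChanged, NodeChildrenChanged }

    // Connection/disconnection callbacks arrive with EventType.None;
    // only genuine node changes should trigger the election logic.
    static boolean isNodeChange(EventType type) {
        return type != EventType.None;
    }

    public static void main(String[] args) {
        System.out.println(isNodeChange(EventType.None));        // false: connection event
        System.out.println(isNodeChange(EventType.NodeDeleted)); // true: election node gone
    }
}
```

Treating a disconnect/sync-connect notification as a node change is exactly the bug described in this issue: the election process would run without the watched node having changed.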

 Leader election zookeeper watcher is responding to con/discon notifications 
 incorrectly.
 

 Key: SOLR-3582
 URL: https://issues.apache.org/jira/browse/SOLR-3582
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0


 As brought up by Trym R. Møller on the mailing list, we are responding to 
 watcher events about connection/disconnection as if they were notifications 
 about node changes.
 http://www.lucidimagination.com/search/document/e13ef390b882



[jira] [Comment Edited] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

2012-06-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402851#comment-13402851
 ] 

Trym Møller edited comment on SOLR-3582 at 6/28/12 5:10 AM:


Debugging the provided test shows this behaviour as well: the Watch is 
kept even though it is notified about disconnection and syncConnection, and the 
Watch will only stop after it has been notified about a node change.

As Mark writes on the mailing list, there might be other ZooKeeper Watchers in 
Solr that add new watchers on reconnect.

If we agree about the ZooKeeper watcher behaviour, then I think that the 
provided bug fix solves the problem in the LeaderElector and it can be 
committed to svn independently of problems with other watchers.

Best regards Trym

  was (Author: trym):
Debugging the provided test shows this behaviour as well: the 
Watch is kept even though it is notified about disconnection and syncConnection, 
and the Watch will only stop after a node change occurs.

As Mark writes on the mailing list, there might be other ZooKeeper Watchers in 
Solr that add new watchers on reconnect.

If we agree about the ZooKeeper watcher behaviour, then I think that the 
provided bug fix solves the problem in the LeaderElector and it can be 
committed to svn independently of problems with other watchers.

Best regards Trym
  
 Leader election zookeeper watcher is responding to con/discon notifications 
 incorrectly.
 

 Key: SOLR-3582
 URL: https://issues.apache.org/jira/browse/SOLR-3582
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0


 As brought up by Trym R. Møller on the mailing list, we are responding to 
 watcher events about connection/disconnection as if they were notifications 
 about node changes.
 http://www.lucidimagination.com/search/document/e13ef390b882


