[jira] [Updated] (LUCENE-4165) HunspellDictionary - AffixFile Reader closed, Dictionary Readers left unclosed
[ https://issues.apache.org/jira/browse/LUCENE-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Krah updated LUCENE-4165:

Attachment: lucene_36.patch, lucene_trunk.patch

Updated patches:
1. Removed the reader.close() call in the readAffixFile() method.
2. Added a comment at the constructors and their arguments to make clear that the caller has to close the streams and that the constructor does not close them.
3. Modified the test to check that the stream is actually not closed.
4. Added two close calls on the streams in the trunk patch for the test.

HunspellDictionary - AffixFile Reader closed, Dictionary Readers left unclosed
--
Key: LUCENE-4165
URL: https://issues.apache.org/jira/browse/LUCENE-4165
Project: Lucene - Java
Issue Type: Bug
Components: modules/analysis
Affects Versions: 3.6
Environment: Linux, Java 1.6
Reporter: Torsten Krah
Priority: Minor
Attachments: lucene_36.patch, lucene_36.patch, lucene_trunk.patch, lucene_trunk.patch

HunspellDictionary takes an InputStream for the affix file and a List of streams for the dictionaries. The Javadoc is not clear about whether I have to close those streams myself or whether the Dictionary constructor does this already. Looking at the code, at least reader.close() is called when the affix file is read via the readAffixFile() method (although closing the stream is not done in a finally block, so the constructor may fail to do so). The readDictionaryFile() method, in contrast to readAffixFile(), misses the call to close the reader. So the question here is: do I have to close the streams myself after instantiating the dictionary? Or is the close call only missing for the dictionary streams? Either way, please add the close calls in a safe manner, or clarify the Javadoc so that I know I have to do this myself.

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
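The safe-close pattern the report asks for can be sketched as follows. This is a hypothetical illustration, not the actual HunspellDictionary code (the class and method names are made up): the parsing method leaves the caller's reader open, and the caller releases it in a finally block so that a parse failure cannot leak the handle.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch of the ownership rule discussed above: the caller
// owns the reader, so the parsing method must not close it; the close
// belongs in the caller's finally block (Java 1.6, pre try-with-resources).
public class AffixReaderSketch {
    // Counts non-empty lines; deliberately does NOT close the caller's reader.
    static int readAffixLines(Reader affix) throws IOException {
        BufferedReader br = new BufferedReader(affix);
        int lines = 0;
        String line;
        while ((line = br.readLine()) != null) {
            if (line.length() > 0) lines++;
        }
        return lines; // the caller closes 'affix' afterwards
    }

    public static void main(String[] args) throws IOException {
        Reader r = new StringReader("SFX A Y 1\nSFX A 0 s .\n");
        try {
            System.out.println(readAffixLines(r)); // prints 2
        } finally {
            r.close(); // always reached, even if parsing throws
        }
    }
}
```

With this contract documented, a failed constructor can never strand a half-closed stream, which is exactly the ambiguity the issue complains about.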
[jira] [Updated] (LUCENE-4165) HunspellDictionary - AffixFile Reader closed, Dictionary Readers left unclosed
[ https://issues.apache.org/jira/browse/LUCENE-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Krah updated LUCENE-4165: Attachment: (was: lucene_trunk.patch) Key: LUCENE-4165 URL: https://issues.apache.org/jira/browse/LUCENE-4165 Attachments: lucene_36.patch, lucene_trunk.patch
[jira] [Updated] (LUCENE-4165) HunspellDictionary - AffixFile Reader closed, Dictionary Readers left unclosed
[ https://issues.apache.org/jira/browse/LUCENE-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Krah updated LUCENE-4165: Attachment: (was: lucene_36.patch) Key: LUCENE-4165 URL: https://issues.apache.org/jira/browse/LUCENE-4165 Attachments: lucene_36.patch, lucene_trunk.patch
[jira] [Updated] (LUCENE-4138) Update morfologik (polish stemming) to 1.5.3
[ https://issues.apache.org/jira/browse/LUCENE-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-4138:

Attachment: LUCENE-4138.patch

Updated patch. Not backwards compatible (intentionally): MorphosyntacticTagAttribute has been renamed to MorphosyntacticTagsAttribute (note the plural) and now carries a list of tags for the current stem.

Update morfologik (polish stemming) to 1.5.3
Key: LUCENE-4138
URL: https://issues.apache.org/jira/browse/LUCENE-4138
Project: Lucene - Java
Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
Fix For: 4.0, 5.0
Attachments: LUCENE-4138.patch, LUCENE-4138.patch

Just released. Updates to the dictionary but most of all -- it comes with a clean BSD license (including dictionary data).
[jira] [Commented] (LUCENE-4157) Improve Spatial Testing
[ https://issues.apache.org/jira/browse/LUCENE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402063#comment-13402063 ] Chris Male commented on LUCENE-4157:

{quote}RE QuadPrefixTree, I'll see if I can reproduce your test errors. I'm not surprised if QuadPrefixTree.MAX_LEVELS_POSSIBLE is perhaps too big (notice the comment at its declaration); I'm not really sure how big this should be. Assuming the default 12 levels pass, I think we can find a safer max number to use for the time being that is less than 50, and maybe one day when we have time we can confidently determine exactly what it can support. I venture to guess it might be similar to the mantissa of a double, which is 53, but perhaps not, or maybe it's half that or something. FYI, about 26 is needed for ~1 meter accuracy. If a non-geo scenario is needed, then who knows what your requirements might be.{quote}

Thanks for that explanation. I tried with the default of 12 and the tests still failed, but with no error this time. That could just be because quad trees are less precise than geohashes, or maybe there are some problems with the tests. I think we should try to come up with some tests for the trees themselves to verify that they work as expected. I see SpatialPrefixTreeTest does some testing of GeohashPrefixTree currently, but we should really spin that off into its own test class and take QuadTree separately.

{quote}RE Testing of TermQueryPrefixGridStrategy, I agree that its tests are too minimal in Lucene spatial. FWIW, I'm about to update a patch to SOLR-3304 that tests a variety of strategies against the same test code (based on test code from the Solr 3 spatial filter tests). TermQueryPrefixGridStrategy passes fine.{quote}

Good to know. I have confidence in TermQueryPrefixGridStrategy since it is extremely simple, but I think we need to come up with tests to ensure that any changes we make to the indexing process are compatible with the querying.
{quote}I definitely welcome any input on making the tests better overall. It's a bit of a challenge because there are a variety of strategies, and some, like TwoDoublesStrategy, are known to not yet support certain geo cases like the poles (if I recall). I'm not sure if the idea of a test file of query cases was your idea or Ryan's (e.g. cities-IsWithin-BBox), but instead, or in addition, I like the idea of automatically generating random data and queries, and then double-checking search results against a simple brute-force algorithm.{quote}

I don't really like the test file idea at all. Having such files for benchmarking is good, but we aren't at that stage yet. Instead, I think we should construct simple unit tests, indexing a few Shapes and querying for them. We should do that for each Strategy, obviously only indexing Points for TwoDoublesStrategy. Random data and query generation can come later, once we have enough crafted tests to be sure that this works. We should then randomize the use of QuadTree vs GeohashTree, or actually repeat the tests for both. We have a big question mark around testing with polygons. My concern is that users will rightly start using JTS Geometry objects and our Strategies will fail. We really need to think about how to handle this.

{quote}If you don't feel any better about these two classes, then I like your suggestion of not releasing them in 4.0 and leaving them in trunk.{quote}

QuadTree is my main concern, since I don't know whether it is working correctly and is just less precise than geohashes, or has a bug. If we can't quickly come up with a couple of tests and fix any broken behavior, then we should remove it from 4.0. We should also take this opportunity to remove any unused code and code that doesn't actually test anything. Here I see TruncateFilter, the current TestTermQueryPrefixGridStrategy, and TestSpatialPrefixField. I'll try to help out, especially with cleaning out the dead code, but any help with testing QuadTree would be great.
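The "random data double-checked against a brute-force algorithm" idea quoted above could start from something like the sketch below. It uses no Lucene APIs; the class and method names are made up for illustration, and in a real test the brute-force id set would be compared against the Strategy's actual search hits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of the randomized-testing idea: generate random points, run a
// "query" (here a simple bounding box), and compute the expected result
// set by brute force. A real test would assert that a spatial Strategy
// returns exactly this set for the same data and query.
public class BruteForceSpatialCheck {
    // Returns the indices of all points inside the box [minX,maxX]x[minY,maxY].
    static List<Integer> bboxMatches(double[][] pts, double minX, double maxX,
                                     double minY, double maxY) {
        List<Integer> ids = new ArrayList<Integer>();
        for (int i = 0; i < pts.length; i++) {
            if (pts[i][0] >= minX && pts[i][0] <= maxX
                && pts[i][1] >= minY && pts[i][1] <= maxY) {
                ids.add(i);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Seeded so a failing case is reproducible, mirroring Lucene's
        // random-seed test convention.
        Random rnd = new Random(42);
        double[][] pts = new double[100][2];
        for (double[] p : pts) {
            p[0] = rnd.nextDouble() * 360 - 180; // lon
            p[1] = rnd.nextDouble() * 180 - 90;  // lat
        }
        System.out.println(bboxMatches(pts, -10, 10, -10, 10).size());
    }
}
```

The brute-force oracle stays trivially correct, which is the point: any disagreement with the Strategy under test localizes the bug to the indexing/query side.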
Improve Spatial Testing --- Key: LUCENE-4157 URL: https://issues.apache.org/jira/browse/LUCENE-4157 Project: Lucene - Java Issue Type: Improvement Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Priority: Critical Fix For: 4.0 Attachments: LUCENE-4157_Improve_Lucene_Spatial_testing_p1.patch Looking back at the tests for the Lucene Spatial Module, they seem half-baked. (At least Spatial4j is well tested). I've started working on some improvements: * Some tests are in an abstract base class which have a subclass that provides a SpatialContext. The idea was that the same tests could test other contexts (such as geo vs not or different distance calculators (haversine vs vincenty) but this
[jira] [Commented] (LUCENE-4138) Update morfologik (polish stemming) to 1.5.3
[ https://issues.apache.org/jira/browse/LUCENE-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402069#comment-13402069 ] Dawid Weiss commented on LUCENE-4138: If there are no objections I'll commit this shortly.
[jira] [Updated] (LUCENE-4138) Update morfologik (polish stemming) to 1.5.3
[ https://issues.apache.org/jira/browse/LUCENE-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-4138: Attachment: LUCENE-4138.patch Updated patch with minor fixes (corrected module fileset, optimized buffer reuse for tags).
[jira] [Updated] (LUCENE-4138) Update morfologik (polish stemming) to 1.5.3
[ https://issues.apache.org/jira/browse/LUCENE-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-4138: Priority: Minor (was: Trivial)
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402075#comment-13402075 ] Adrien Grand commented on LUCENE-4062:

Thanks for your patch, Toke. All tests seem to pass; I'll try to generate graphs for your impl as soon as possible!

More fine-grained control over the packed integer implementation that is chosen
Key: LUCENE-4062
URL: https://issues.apache.org/jira/browse/LUCENE-4062
Project: Lucene - Java
Issue Type: Improvement
Components: core/other
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Labels: performance
Fix For: 4.0, 5.0
Attachments: LUCENE-4062-2.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, Packed64calc.java, PackedIntsBenchmark.java, PackedIntsBenchmark.java

In order to save space, Lucene has two main PackedInts.Mutable implementations: one that is very fast and is based on a byte/short/integer/long array (Direct*), and another which packs bits in a memory-efficient manner (Packed*). The packed implementation tends to be much slower than the direct one, which discourages some Lucene components from using it. On the other hand, if you store 21-bit integers in a Direct32, this is a space loss of (32-21)/32≈34%. If you accept trading some space for speed, you could store 3 of these 21-bit integers in a long, resulting in an overhead of 1/3 bit per value. One advantage of this approach is that you never need to read more than one block to read or write a value, so this can be significantly faster than Packed32 and Packed64, which always need to read/write two blocks in order to avoid costly branches. I ran some tests, and for 1000 21-bit values, this implementation takes less than 2% more space and has 44% faster writes and 30% faster reads.
The 12-bit version (5 values per block) has the same performance improvement and a 6% memory overhead compared to the packed implementation. In order to select the best implementation for a given integer size, I wrote the {{PackedInts.getMutable(valueCount, bitsPerValue, acceptableOverheadPerValue)}} method. This method selects the fastest implementation that has less than {{acceptableOverheadPerValue}} wasted bits per value. For example, if you accept an overhead of 20% ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty reasonable, here is what implementations would be selected (bits per value: implementation):

* 1: Packed64SingleBlock1
* 2: Packed64SingleBlock2
* 3: Packed64SingleBlock3
* 4: Packed64SingleBlock4
* 5: Packed64SingleBlock5
* 6: Packed64SingleBlock6
* 7: Direct8
* 8: Direct8
* 9: Packed64SingleBlock9
* 10: Packed64SingleBlock10
* 11: Packed64SingleBlock12
* 12: Packed64SingleBlock12
* 13: Packed64
* 14: Direct16
* 15: Direct16
* 16: Direct16
* 17: Packed64
* 18: Packed64SingleBlock21
* 19: Packed64SingleBlock21
* 20: Packed64SingleBlock21
* 21: Packed64SingleBlock21
* 22-26: Packed64
* 27-32: Direct32
* 33-53: Packed64
* 54-62: Direct64

Under 32 bits per value, only 13, 17 and 22-26 bits per value would still choose the slower Packed64 implementation. Allowing a 50% overhead would prevent the packed implementation from being selected for bits per value under 32.
Allowing an overhead of 32 bits per value would make sure that a Direct* implementation is always selected. Next steps would be to:
* make Lucene components use this {{getMutable}} method and let users decide what trade-off better suits them,
* write a Packed32SingleBlock implementation if necessary (I didn't do it because I have no 32-bit computer to test the performance improvements on).

I think this would allow more fine-grained control over the speed/space trade-off. What do you think?
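The single-block layout described above (three 21-bit values per 64-bit long, so 64 - 3*21 = 1 wasted bit per block, i.e. 1/3 bit per value) can be sketched as follows. This is an illustrative reimplementation, not the actual Lucene class; the class name and fields are made up for the example.

```java
// Sketch of the Packed64SingleBlock idea: every get/set touches exactly
// one long (one shift plus one mask), unlike a straddling packed layout
// which may need to read/write two blocks per value.
public class Packed64SingleBlock21Sketch {
    private static final int BITS = 21;
    private static final int VALUES_PER_BLOCK = 64 / BITS; // 3 values share a long
    private static final long MASK = (1L << BITS) - 1;     // low 21 bits set
    private final long[] blocks;

    Packed64SingleBlock21Sketch(int valueCount) {
        blocks = new long[(valueCount + VALUES_PER_BLOCK - 1) / VALUES_PER_BLOCK];
    }

    long get(int index) {
        int block = index / VALUES_PER_BLOCK;
        int shift = (index % VALUES_PER_BLOCK) * BITS;
        return (blocks[block] >>> shift) & MASK; // single block read
    }

    void set(int index, long value) {
        int block = index / VALUES_PER_BLOCK;
        int shift = (index % VALUES_PER_BLOCK) * BITS;
        // clear the slot, then OR in the new value: single block write
        blocks[block] = (blocks[block] & ~(MASK << shift)) | ((value & MASK) << shift);
    }

    public static void main(String[] args) {
        Packed64SingleBlock21Sketch p = new Packed64SingleBlock21Sketch(1000);
        p.set(0, 123456);
        p.set(1, (1L << 21) - 1); // max 21-bit value, 2097151
        System.out.println(p.get(0)); // 123456
        System.out.println(p.get(1)); // 2097151
    }
}
```

For 1000 values this allocates 334 longs (2672 bytes) versus 2625 bytes for a perfectly packed layout, matching the "less than 2% more space" figure quoted in the issue.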
[jira] [Updated] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toke Eskildsen updated LUCENE-4062:

Attachment: measurements_te_xeon.txt, measurements_te_p4.txt, measurements_te_i7.txt, measurements_te_graphs.pdf

I ran the test on three different machines. Results are attached as measurements*.txt, along with a PDF of graphs generated from iteration #6 (which should probably instead be the mean or max of runs 2-5). The setter graph for the P4 looks extremely strange for Direct, but I tried generating a graph for iteration #5 instead and it looked the same. In the same vein, the Direct performance for the Xeon is suspiciously low, so I wonder if there's some freaky JITting happening to the test code. Unfortunately I did not find an AMD machine to test on. For the three tested Intels, it seems that Packed64calc does perform very well.
Re: VOTE: 4.0 alpha (take two)
I was actually using a solrconfig.xml that is too old for this version. catalina.out gave me some errors on indexDefaults and mainIndex, so I took the solrconfig.xml from your alpha package and it worked fine. I haven't been able to totally check everything yet because I am using a SolrJ 3.6 indexing client and I had some content-type issues in catalina.out. I am working on it. On Wed, Jun 27, 2012 at 7:29 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: On Tuesday, June 26, 2012 at 8:30 PM, Simon Willnauer wrote: that seems worth an issue. is there one already? not yet, there was one comment on SOLR-3238 but no further comment. On Tuesday, June 26, 2012 at 8:17 PM, Antoine LE FLOC'H wrote: Are you aware of this error ? Thanks again. Antoine, would you mind to open one and provide some infos? This error will show up if there's a problem accessing /admin/system, i guess that should be our starting point Stefan
[jira] [Assigned] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters
[ https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl reassigned SOLR-3467: Assignee: Jan Høydahl

ExtendedDismax escaping is missing several reserved characters
--
Key: SOLR-3467
URL: https://issues.apache.org/jira/browse/SOLR-3467
Project: Solr
Issue Type: Bug
Components: query parsers
Affects Versions: 3.6
Reporter: Michael Dodsworth
Assignee: Jan Høydahl
Priority: Minor
Fix For: 4.0, 5.0
Attachments: SOLR-3467.patch, SOLR-3467.patch

When edismax is unable to parse the original user query, it retries using an escaped version of that query (where all reserved chars have been escaped). Currently, the escaping done in {{splitIntoClauses}} appears to be missing several chars from {{QueryParserBase#escape(String)}}, namely {{'\\', '|', '', '/'}}
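As an illustration of the escaping under discussion, a minimal escape routine modeled on (but not copied from) {{QueryParserBase#escape}} might look like the sketch below; the exact reserved-character set is an assumption for this example, and the bug report is that {{splitIntoClauses}} covers only a subset of it.

```java
// Hedged sketch: prefix every reserved query-syntax character with a
// backslash before the query is re-parsed. The reserved set below is an
// assumption modeled on Lucene's classic query parser, not copied from
// the Solr source.
public class EscapeSketch {
    private static final String RESERVED = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (RESERVED.indexOf(c) >= 0) {
                sb.append('\\'); // escape the reserved character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a/b|c")); // a\/b\|c
    }
}
```

If the retry path escapes a smaller set than this, queries containing the missed characters (e.g. '/' or '|') can still fail to parse on the second attempt, which is the behavior the issue describes.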
[jira] [Updated] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters
[ https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3467: Affects Version/s: (was: 4.0) 3.6 Fix Version/s: 5.0
version checkout
Hi, I created an issue in JIRA (https://issues.apache.org/jira/browse/SOLR-3574) and now I want to develop/contribute. I would first like to create a patch for Solr version 3.6.0, then also include a patch for versions 4 and 5. Is this possible, or can I only create a patch for the last version? As much as I read from How To Contribute (http://wiki.apache.org/solr/HowToContribute) and from what I can see here (https://svn.apache.org/repos/asf/lucene/dev/), there are trunk, tags, branches and nightly available. What version is the trunk? I think I shouldn't touch tags (those are final versions), nor branches (those are big pieces of branched functionality whose code might differ from the trunk). Can someone please help me get up to speed with this? Cheers, despot
Re: VOTE: 4.0 alpha (take two)
Using SolrJ from the alpha, everything works. Go for it! On Wed, Jun 27, 2012 at 12:03 PM, Antoine LE FLOC'H lefl...@gmail.com wrote: I was actually using a solrconfig.xml that is too old for this version. catalina.out gave me some errors on indexDefaults and mainIndex, so I took the solrconfig.xml from your alpha package and it worked fine. I haven't been able to totally check everything yet because I am using a SolrJ 3.6 indexing client and I had some content-type issues in catalina.out. I am working on it. On Wed, Jun 27, 2012 at 7:29 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: On Tuesday, June 26, 2012 at 8:30 PM, Simon Willnauer wrote: that seems worth an issue. is there one already? not yet, there was one comment on SOLR-3238 but no further comment. On Tuesday, June 26, 2012 at 8:17 PM, Antoine LE FLOC'H wrote: Are you aware of this error ? Thanks again. Antoine, would you mind to open one and provide some infos? This error will show up if there's a problem accessing /admin/system, i guess that should be our starting point Stefan
Re: post alpha 1 entries in CHANGES.txt
A lightweight method is to create a TAG in SVN for each RC and also for ALPHA and BETA, and then cut a branch when releasing 4.0.0. Regarding CHANGES, it should have full-blown sections for 4.0.0-ALPHA and 4.0.0-BETA, but for e.g. 4.0.0-BETA-RC2 it should be enough with a marker line in CHANGES.txt to indicate which issues are included before and after the RC. That way we can build and tag RCs often with little effort, and when the final release is done, these marker lines can be removed if we wish. For ALPHA/BETA I think we should also add a new section to CHANGES.txt to highlight known major issues which are still blockers for the final release. This is how it could look:

== 4.0.0-ALPHA ==

IMPORTANT: This is not a final release and we encourage you to use the latest stable release for production use.

Known critical issues in this ALPHA
---
* SOLR-: Index gets corrupted every monday :-)

Detailed Change List
--

New Features
--
* SOLR-: Foo bar (myself)

 4.0.0-ALPHA-RC1 includes changes above this line 

* SOLR-: Foo bar (myself)
* SOLR-: Foo bar (myself)

 4.0.0-ALPHA-RC2 includes changes above this line 

* SOLR-: Foo bar (myself)

-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

On 27. juni 2012, at 04:10, Mark Miller wrote: I put a new changes entry after the alpha under the 4-Alpha release in CHANGES.txt for Solr. I missed the discussion if there was one, but if we plan to have a CHANGES section for alphas and betas, let me know, and I'll move that entry when we start a new section. We should add the next section soon if we are going to, so it's clear what direction we are taking. - Mark Miller lucidimagination.com
[jira] [Updated] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters
[ https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3467: -- Attachment: SOLR-3467.patch Updated trunk patch with extended test and CHANGES entry. Looks good to me. Any other comments before commit? ExtendedDismax escaping is missing several reserved characters -- Key: SOLR-3467 URL: https://issues.apache.org/jira/browse/SOLR-3467 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 3.6 Reporter: Michael Dodsworth Assignee: Jan Høydahl Priority: Minor Fix For: 4.0, 5.0 Attachments: SOLR-3467.patch, SOLR-3467.patch, SOLR-3467.patch When edismax is unable to parse the original user query, it retries using an escaped version of that query (where all reserved chars have been escaped). Currently, the escaping done in {{splitIntoClauses}} appears to be missing several chars from {{QueryParserBase#escape(String)}}, namely {{'\\', '|', '', '/'}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
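The fix amounts to escaping the same reserved-character set that `QueryParserBase#escape(String)` covers before edismax re-parses the query. A minimal sketch of that kind of escaping follows; the class name and exact character set are illustrative, not the actual patch:

```java
// Illustrative sketch of escaping Lucene query-syntax characters before
// re-parsing, in the spirit of QueryParserBase#escape. The real patch's
// character set may differ; this is not the committed code.
public class QueryEscaper {
    // Reserved characters to prefix with a backslash (assumption: modelled
    // on the set QueryParserBase#escape handles).
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?|&/";

    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\'); // backslash-escape every reserved character
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

The bug described above is exactly a mismatch between two such character sets: `splitIntoClauses` escaped a subset of `SPECIALS`, so characters like `'\\'`, `'|'` and `'/'` slipped through unescaped.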
Re: version checkout
Hi, currently the trunk is going to be version 5.0. Version 4.0 hasn't been released yet, but there is a branch (branch_4x) created for it. There is also a branch for 3.6.1 (lucene_solr_3_6 I think) that's only for bug fixes. The general way of working is providing patches for the different versions where the patch should be applied. You almost always want to apply the patch to the trunk. Some of them should also be applied to 4.0 (everything but big changes, I would say) and bug fixes to 3.6. Tomás On Wed, Jun 27, 2012 at 7:12 AM, Despot Jakimovski despot.jakimov...@gmail.com wrote: Hi, I created an issue in JIRA https://issues.apache.org/jira/browse/SOLR-3574 and now I want to develop/contribute. I would first like to create a patch for Solr version 3.6.0, then also include a patch for versions 4 and 5. Is this possible, or can I only create a patch for the latest version? As far as I can tell from How To Contribute http://wiki.apache.org/solr/HowToContribute and from what I can see here https://svn.apache.org/repos/asf/lucene/dev/, there are trunk, tags, branches and nightly available. What version is the trunk? I think I shouldn't touch tags (those are final versions), nor branches (those are big pieces of branched functionality which might differ from the trunk). Can someone please help me get up to speed with this? Cheers, despot
[jira] [Updated] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters
[ https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3467: -- Fix Version/s: 3.6.1 ExtendedDismax escaping is missing several reserved characters -- Key: SOLR-3467 URL: https://issues.apache.org/jira/browse/SOLR-3467 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 3.6 Reporter: Michael Dodsworth Assignee: Jan Høydahl Priority: Minor Fix For: 4.0, 3.6.1, 5.0 Attachments: SOLR-3467-lucene_solr_3_6.patch, SOLR-3467.patch, SOLR-3467.patch, SOLR-3467.patch When edismax is unable to parse the original user query, it retries using an escaped version of that query (where all reserved chars have been escaped). Currently, the escaping done in {{splitIntoClauses}} appears to be missing several chars from {{QueryParserBase#escape(String)}}, namely {{'\\', '|', '', '/'}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters
[ https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3467: -- Attachment: SOLR-3467-lucene_solr_3_6.patch This was an easy backport for 3.6.1 ExtendedDismax escaping is missing several reserved characters -- Key: SOLR-3467 URL: https://issues.apache.org/jira/browse/SOLR-3467 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 3.6 Reporter: Michael Dodsworth Assignee: Jan Høydahl Priority: Minor Fix For: 4.0, 3.6.1, 5.0 Attachments: SOLR-3467-lucene_solr_3_6.patch, SOLR-3467.patch, SOLR-3467.patch, SOLR-3467.patch When edismax is unable to parse the original user query, it retries using an escaped version of that query (where all reserved chars have been escaped). Currently, the escaping done in {{splitIntoClauses}} appears to be missing several chars from {{QueryParserBase#escape(String)}}, namely {{'\\', '|', '', '/'}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4166) TwoDoublesStrategy is broken for Circles
Chris Male created LUCENE-4166: -- Summary: TwoDoublesStrategy is broken for Circles Key: LUCENE-4166 URL: https://issues.apache.org/jira/browse/LUCENE-4166 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Priority: Critical TwoDoublesStrategy supports finding Documents that are within a Circle, yet it is impossible to provide one due to the following code found at the start of TwoDoublesStrategy.makeQuery():
{code}
Shape shape = args.getShape();
if (!(shape instanceof Rectangle)) {
  throw new InvalidShapeException("A rectangle is the only supported shape (so far), not " + shape.getClass()); //TODO
}
Rectangle bbox = (Rectangle) shape;
{code}
I think the code which handles Circles should instead ask for the bounding box of the Shape and use that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
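The proposed fix — fall back to the shape's bounding box instead of rejecting anything that isn't a Rectangle — can be sketched with stand-in types. The real strategy works with the Spatial4j Shape/Rectangle/Circle classes; everything below is illustrative, only the dispatch logic is the point:

```java
// Stand-in shape types for illustration; not the Spatial4j API.
interface Shape { Rectangle getBoundingBox(); }

class Rectangle implements Shape {
    final double minX, maxX, minY, maxY;
    Rectangle(double minX, double maxX, double minY, double maxY) {
        this.minX = minX; this.maxX = maxX; this.minY = minY; this.maxY = maxY;
    }
    public Rectangle getBoundingBox() { return this; }
}

class Circle implements Shape {
    final double x, y, radius;
    Circle(double x, double y, double radius) { this.x = x; this.y = y; this.radius = radius; }
    // A circle's bounding box is the axis-aligned square of side 2*radius centred on it.
    public Rectangle getBoundingBox() {
        return new Rectangle(x - radius, x + radius, y - radius, y + radius);
    }
}

class BBoxQueryHelper {
    // Instead of throwing for non-Rectangle shapes, ask the shape for its
    // bounding box and build the range query from that.
    static Rectangle toQueryBox(Shape shape) {
        return (shape instanceof Rectangle) ? (Rectangle) shape : shape.getBoundingBox();
    }
}
```

This is deliberately lossy for circles (the box contains the corner regions outside the circle), so a strategy taking this route would typically need a second, exact distance check on the matched documents.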
[jira] [Created] (LUCENE-4167) Remove the use of SpatialOperation
Chris Male created LUCENE-4167: -- Summary: Remove the use of SpatialOperation Key: LUCENE-4167 URL: https://issues.apache.org/jira/browse/LUCENE-4167 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Looking at the code in TwoDoublesStrategy I noticed SpatialOperations.BBoxWithin vs isWithin which confused me. Looking over the other Strategys I see that really only isWithin and Intersects is supported. Only TwoDoublesStrategy supports IsDisjointTo. The remainder of SpatialOperations are not supported. I don't think we should use SpatialOperation as this stage since it is not clear what Operations are supported by what Strategys, many Operations are not supported, and the code for handling the Operations is usually the same. We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a different Strategy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3581) Search Component configuration via solrconfig.xml is not working
Karl Wright created SOLR-3581: - Summary: Search Component configuration via solrconfig.xml is not working Key: SOLR-3581 URL: https://issues.apache.org/jira/browse/SOLR-3581 Project: Solr Issue Type: Bug Affects Versions: 4.0 Environment: Checkout and build of branches/branch_4x Reporter: Karl Wright See CONNECTORS-485. ManifoldCF search component tests that pass on 3.6 and used to pass on 4.0 fail on branches_4x Solr because the configuration information from solrconfig.xml is not being properly passed to the search component via the init() method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1763) Integrate Solr Cell/Tika as an UpdateRequestProcessor
[ https://issues.apache.org/jira/browse/SOLR-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402149#comment-13402149 ] Jan Høydahl commented on SOLR-1763: --- I won't have time to look at this before October-ish, so anyone feel free to give it a shot :) Integrate Solr Cell/Tika as an UpdateRequestProcessor - Key: SOLR-1763 URL: https://issues.apache.org/jira/browse/SOLR-1763 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Jan Høydahl Labels: extracting_request_handler, solr_cell, tika, update_request_handler From Chris Hostetter's original post in solr-dev: As someone with very little knowledge of Solr Cell and/or Tika, I find myself wondering if ExtractingRequestHandler would make more sense as an extractingUpdateProcessor -- where it could be configured to take either binary fields (or string fields containing URLs) out of the Documents, parse them with Tika, and add the various XPath-matching hunks of text back into the document as new fields. Then ExtractingRequestHandler just becomes a handler that slurps up its ContentStreams and adds them as binary data fields and adds the other literal params as fields. Wouldn't that make things like SOLR-1358, and using Tika with URLs/filepaths in XML- and CSV-based updates, fairly trivial? -Hoss I couldn't agree more, so I decided to add it as an issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
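The core transformation the proposal describes — pull a binary/URL field out of the document, run it through an extractor, and add the extracted text back as new fields — can be sketched independently of the Solr and Tika APIs. All names below are stand-ins (the document is modelled as a plain field map, the Tika parse as a pluggable function); this is not actual Solr code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the proposed "extracting" update processor's core
// step. A real implementation would subclass Solr's UpdateRequestProcessor
// and call Tika; here both are replaced by simple stand-ins.
public class ExtractingProcessorSketch {
    private final String sourceField;                  // field holding raw content (or a URL)
    private final Function<String, String> extractor;  // stand-in for the Tika parse

    public ExtractingProcessorSketch(String sourceField, Function<String, String> extractor) {
        this.sourceField = sourceField;
        this.extractor = extractor;
    }

    // Remove the raw field from the document, parse it, and add the
    // result back as a new text field, leaving other fields untouched.
    public Map<String, String> process(Map<String, String> doc) {
        Map<String, String> out = new HashMap<>(doc);
        String raw = out.remove(sourceField);
        if (raw != null) {
            out.put("extracted_text", extractor.apply(raw));
        }
        return out;
    }
}
```

Under this shape, ExtractingRequestHandler would only need to add the content streams as raw fields; the chain above would do the parsing, which is what would make XML/CSV-based updates with Tika content trivial.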
[jira] [Updated] (SOLR-3581) Search Component configuration via solrconfig.xml is not working
[ https://issues.apache.org/jira/browse/SOLR-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated SOLR-3581: -- Description: See CONNECTORS-485. ManifoldCF search component tests that pass on 3.6 and used to pass on trunk fail on branches_4x Solr because the configuration information from solrconfig.xml is not being properly passed to the search component via the init() method. was: See CONNECTORS-485. ManifoldCF search component tests that pass on 3.6 and used to pass on 4.0 fail on branches_4x Solr because the configuration information from solrconfig.xml is not being properly passed to the search component via the init() method. Search Component configuration via solrconfig.xml is not working Key: SOLR-3581 URL: https://issues.apache.org/jira/browse/SOLR-3581 Project: Solr Issue Type: Bug Affects Versions: 4.0 Environment: Checkout and build of branches/branch_4x Reporter: Karl Wright See CONNECTORS-485. ManifoldCF search component tests that pass on 3.6 and used to pass on trunk fail on branches_4x Solr because the configuration information from solrconfig.xml is not being properly passed to the search component via the init() method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3581) Search Component configuration via solrconfig.xml is not working
[ https://issues.apache.org/jira/browse/SOLR-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402173#comment-13402173 ] Karl Wright commented on SOLR-3581: --- Found the problem; closing this issue. Search Component configuration via solrconfig.xml is not working Key: SOLR-3581 URL: https://issues.apache.org/jira/browse/SOLR-3581 Project: Solr Issue Type: Bug Affects Versions: 4.0 Environment: Checkout and build of branches/branch_4x Reporter: Karl Wright See CONNECTORS-485. ManifoldCF search component tests that pass on 3.6 and used to pass on trunk fail on branches_4x Solr because the configuration information from solrconfig.xml is not being properly passed to the search component via the init() method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-3581) Search Component configuration via solrconfig.xml is not working
[ https://issues.apache.org/jira/browse/SOLR-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright closed SOLR-3581. - Resolution: Not A Problem Fix Version/s: 4.0 Operator error; inadvertent change in the failing test Search Component configuration via solrconfig.xml is not working Key: SOLR-3581 URL: https://issues.apache.org/jira/browse/SOLR-3581 Project: Solr Issue Type: Bug Affects Versions: 4.0 Environment: Checkout and build of branches/branch_4x Reporter: Karl Wright Fix For: 4.0 See CONNECTORS-485. ManifoldCF search component tests that pass on 3.6 and used to pass on trunk fail on branches_4x Solr because the configuration information from solrconfig.xml is not being properly passed to the search component via the init() method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4168) Allow storing test execution statistics in an external file
Dawid Weiss created LUCENE-4168: --- Summary: Allow storing test execution statistics in an external file Key: LUCENE-4168 URL: https://issues.apache.org/jira/browse/LUCENE-4168 Project: Lucene - Java Issue Type: Test Components: general/test Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Trivial Fix For: 4.0, 5.0 Override on the build server to calculate stats during runs, then update the cache in the repo from time to time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-1856) In Solr Cell, literals should override Tika-parsed values
[ https://issues.apache.org/jira/browse/SOLR-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-1856. --- Resolution: Fixed Committed to trunk r1354455 and branch_4x r1354460 In Solr Cell, literals should override Tika-parsed values - Key: SOLR-1856 URL: https://issues.apache.org/jira/browse/SOLR-1856 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Chris Harris Assignee: Jan Høydahl Fix For: 4.0, 5.0 Attachments: SOLR-1856.patch, SOLR-1856.patch I propose that ExtractingRequestHandler / SolrCell literals should take precedence over Tika-parsed metadata in all situations, including where multiValued=true. (Compare SOLR-1633?) My personal motivation is that I have several fields (e.g. title, date) where my own metadata is much superior to what Tika offers, and I want to throw those Tika values away. (I actually wouldn't mind throwing away _all_ Tika-parsed values, but let's set that aside.) SOLR-1634 is one potential approach to this, but the fix here might be simpler. I'll attach a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-1634) change order of field operations in SolrCell
[ https://issues.apache.org/jira/browse/SOLR-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl closed SOLR-1634. - Resolution: Duplicate Marking as duplicate of SOLR-1856 which is fixed. Also, note that as a workaround this works: fmap.title=tika_title&literal.title=HelloWorld - where the Tika-parsed title will first be moved to a new field and then accept the literal one. change order of field operations in SolrCell Key: SOLR-1634 URL: https://issues.apache.org/jira/browse/SOLR-1634 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Hoss Man As noted on the mailing list, SolrCell evaluates fmap.* params AFTER literal.* params. This makes it impossible for users to map Tika-produced fields to other names (possibly for the purpose of ignoring them completely) and then using literal to provide explicit values for those fields. At first glance this seems like a bug, except that it is explicitly documented... http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations ...so I'm opening this as an Improvement. We should either consider changing the order of operations, or find some other way to support what seems like a very common use case... http://old.nabble.com/Re%3A-WELCOME-to-solr-user%40lucene.apache.org-to26650071.html#a26650071 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-1633) Solr Cell should be smarter about literal and multiValued=false
[ https://issues.apache.org/jira/browse/SOLR-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl closed SOLR-1633. - Resolution: Duplicate Solved in SOLR-1856 Solr Cell should be smarter about literal and multiValued=false - Key: SOLR-1633 URL: https://issues.apache.org/jira/browse/SOLR-1633 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Hoss Man As noted on solr-user, SolrCell has less than ideal behavior when foo is a single-valued field, and literal.foo=bar is specified in the request, but Tika also produces a value for the foo field from the document. It seems like a possible improvement here would be for SolrCell to ignore the value from Tika if it already has one that was explicitly provided (as opposed to the current behavior of letting the add fail because of multiple values in a single-valued field). It seems pretty clear that in cases like this, the user's intention is to have their one literal field used as the value. http://old.nabble.com/Re%3A-WELCOME-to-solr-user%40lucene.apache.org-to26650071.html#a26650071 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4168) Allow storing test execution statistics in an external file
[ https://issues.apache.org/jira/browse/LUCENE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-4168. - Resolution: Fixed Allow storing test execution statistics in an external file --- Key: LUCENE-4168 URL: https://issues.apache.org/jira/browse/LUCENE-4168 Project: Lucene - Java Issue Type: Test Components: general/test Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Trivial Fix For: 4.0, 5.0 Override on the build server to calculate stats during runs, then update the cache in the repo from time to time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: version checkout
Thanks a bunch for the input! So I guess there is no way of me doing a patch for 3.6.0, since the issue in JIRA https://issues.apache.org/jira/browse/SOLR-3574 I reported is not a bug (but a new feature). Ok. Then I'll do a patch for 4.0 (branch_4x https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/) and 5.0 (trunk). This probably will require me to change the Affected version and Fix version on the issue https://issues.apache.org/jira/browse/SOLR-3574 . Thanks again, despot On Wed, Jun 27, 2012 at 12:59 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: Hi, currently the trunk is going to be version 5.0. Version 4.0 hasn't been released yet, but there is a branch (branch_4x) created for it. There is also a branch for 3.6.1 (lucene_solr_3_6 I think) that's only for bug fixes. The general way of working is providing patches for the different versions where the patch should be applied. You almost always want to apply the patch to the trunk. Some of them should also be applied to 4.0 (everything but big changes, I would say) and bug fixes to 3.6. Tomás On Wed, Jun 27, 2012 at 7:12 AM, Despot Jakimovski despot.jakimov...@gmail.com wrote: Hi, I created an issue in JIRA https://issues.apache.org/jira/browse/SOLR-3574 and now I want to develop/contribute. I would first like to create a patch for Solr version 3.6.0, then also include a patch for versions 4 and 5. Is this possible, or can I only create a patch for the latest version? As far as I can tell from How To Contribute http://wiki.apache.org/solr/HowToContribute and from what I can see here https://svn.apache.org/repos/asf/lucene/dev/, there are trunk, tags, branches and nightly available. What version is the trunk? I think I shouldn't touch tags (those are final versions), nor branches (those are big pieces of branched functionality which might differ from the trunk). Can someone please help me get up to speed with this? Cheers, despot
[jira] [Updated] (LUCENE-4166) TwoDoublesStrategy is broken for Circles
[ https://issues.apache.org/jira/browse/LUCENE-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-4166: --- Attachment: LUCENE-4166.patch Simple patch extending runtime type checking to Circle and using the Shape bounding box. TwoDoublesStrategy is broken for Circles Key: LUCENE-4166 URL: https://issues.apache.org/jira/browse/LUCENE-4166 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Priority: Critical Attachments: LUCENE-4166.patch TwoDoublesStrategy supports finding Documents that are within a Circle, yet it is impossible to provide one due to the following code found at the start of TwoDoublesStrategy.makeQuery():
{code}
Shape shape = args.getShape();
if (!(shape instanceof Rectangle)) {
  throw new InvalidShapeException("A rectangle is the only supported shape (so far), not " + shape.getClass()); //TODO
}
Rectangle bbox = (Rectangle) shape;
{code}
I think the code which handles Circles should instead ask for the bounding box of the Shape and use that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: 4.0 alpha (take two)
But what about feature all my favorites here. Which means I haven't gotten off my butt and moved them forward. Waiting for the elves isn't working G +1, go for it. We can't continue waiting for the elves, they unionized and are demanding some time off for sleep. Erick On Wed, Jun 27, 2012 at 6:40 AM, Antoine LE FLOC'H lefl...@gmail.com wrote: Using SolrJ from the alpha, everything works. Go for it! On Wed, Jun 27, 2012 at 12:03 PM, Antoine LE FLOC'H lefl...@gmail.com wrote: I was actually using a solrconfig.xml that is too old for this version. catalina.out gave me some errors on indexDefaults and mainIndex, so I took the solrconfig.xml from your alpha package and it worked fine. I haven't been able to totally check everything yet because I am using a SolrJ 3.6 indexing client and I had some content-type issues in catalina.out. I am working on it. On Wed, Jun 27, 2012 at 7:29 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: On Tuesday, June 26, 2012 at 8:30 PM, Simon Willnauer wrote: that seems worth an issue. is there one already? not yet, there was one comment on SOLR-3238 but no further comment. On Tuesday, June 26, 2012 at 8:17 PM, Antoine LE FLOC'H wrote: Are you aware of this error? Thanks again. Antoine, would you mind opening one and providing some info? This error will show up if there's a problem accessing /admin/system; I guess that should be our starting point. Stefan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402201#comment-13402201 ] Adrien Grand commented on LUCENE-4062: -- Thanks for sharing your results. Here are mine: http://people.apache.org/~jpountz/packed_ints_calc.html (E5500 @ 2.80GHz, java 1.7.0_02, hotspot build 22.0-b10). Funny to see those little bumps when the number of bits per value is 8, 16, 32 or 64 (24 as well, although it is smaller)! It is not clear whether this impl is faster or slower than the single-block impl (or even the 3 blocks impl, I am happily surprised by the read throughput on the intel 4 machine) depending on the hardware. However, this new impl seems to be consistently better than the actual Packed64 class so I think we should replace it with your new impl. What do you think? Can you write a patch? More fine-grained control over the packed integer implementation that is chosen --- Key: LUCENE-4062 URL: https://issues.apache.org/jira/browse/LUCENE-4062 Project: Lucene - Java Issue Type: Improvement Components: core/other Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Labels: performance Fix For: 4.0, 5.0 Attachments: LUCENE-4062-2.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, Packed64calc.java, PackedIntsBenchmark.java, PackedIntsBenchmark.java, measurements_te_graphs.pdf, measurements_te_i7.txt, measurements_te_p4.txt, measurements_te_xeon.txt In order to save space, Lucene has two main PackedInts.Mutable implementations, one that is very fast and is based on a byte/short/integer/long array (Direct*) and another one which packs bits in a memory-efficient manner (Packed*). The packed implementation tends to be much slower than the direct one, which discourages some Lucene components from using it. 
On the other hand, if you store 21 bits integers in a Direct32, this is a space loss of (32-21)/32=35%. If you accept to trade some space for speed, you could store 3 of these 21 bits integers in a long, resulting in an overhead of 1/3 bit per value. One advantage of this approach is that you never need to read more than one block to read or write a value, so this can be significantly faster than Packed32 and Packed64 which always need to read/write two blocks in order to avoid costly branches. I ran some tests, and for 1000 21 bits values, this implementation takes less than 2% more space and has 44% faster writes and 30% faster reads. The 12 bits version (5 values per block) has the same performance improvement and a 6% memory overhead compared to the packed implementation. In order to select the best implementation for a given integer size, I wrote the {{PackedInts.getMutable(valueCount, bitsPerValue, acceptableOverheadPerValue)}} method. This method select the fastest implementation that has less than {{acceptableOverheadPerValue}} wasted bits per value. 
For example, if you accept an overhead of 20% ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty reasonable, here is what implementations would be selected:
* 1: Packed64SingleBlock1
* 2: Packed64SingleBlock2
* 3: Packed64SingleBlock3
* 4: Packed64SingleBlock4
* 5: Packed64SingleBlock5
* 6: Packed64SingleBlock6
* 7: Direct8
* 8: Direct8
* 9: Packed64SingleBlock9
* 10: Packed64SingleBlock10
* 11: Packed64SingleBlock12
* 12: Packed64SingleBlock12
* 13: Packed64
* 14: Direct16
* 15: Direct16
* 16: Direct16
* 17: Packed64
* 18: Packed64SingleBlock21
* 19: Packed64SingleBlock21
* 20: Packed64SingleBlock21
* 21: Packed64SingleBlock21
* 22: Packed64
* 23: Packed64
* 24: Packed64
* 25: Packed64
* 26: Packed64
* 27: Direct32
* 28: Direct32
* 29: Direct32
* 30: Direct32
* 31: Direct32
* 32: Direct32
* 33: Packed64
* 34: Packed64
* 35: Packed64
* 36: Packed64
* 37: Packed64
* 38: Packed64
* 39: Packed64
* 40: Packed64
* 41: Packed64
* 42: Packed64
* 43: Packed64
* 44: Packed64
* 45: Packed64
* 46: Packed64
* 47: Packed64
* 48: Packed64
* 49: Packed64
* 50: Packed64
* 51: Packed64
* 52: Packed64
* 53: Packed64
* 54: Direct64
* 55: Direct64
* 56: Direct64
* 57: Direct64
* 58: Direct64
* 59: Direct64
* 60: Direct64
* 61: Direct64
* 62: Direct64
Under 32 bits per value, only 13, 17 and 22-26 bits per value would still choose the slower Packed64 implementation. Allowing a 50% overhead would prevent the packed implementation from being selected for bits per value under 32. Allowing an overhead of 32 bits per value
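The 21-bit case described above — three values per 64-bit block (3 × 21 = 63 bits used, 1 bit wasted), so a value never straddles two blocks — can be sketched as follows. The class and method names are illustrative, not the actual Lucene Packed64SingleBlock API:

```java
// Sketch of the single-block idea for bitsPerValue = 21: each long holds
// 3 values, so get/set only ever touch one block and need none of the
// cross-block branches that make Packed32/Packed64 slower.
public class Packed21Sketch {
    private static final int BITS = 21;
    private static final int VALUES_PER_BLOCK = 64 / BITS;   // 3
    private static final long MASK = (1L << BITS) - 1;       // low 21 bits
    private final long[] blocks;

    public Packed21Sketch(int valueCount) {
        blocks = new long[(valueCount + VALUES_PER_BLOCK - 1) / VALUES_PER_BLOCK];
    }

    public long get(int index) {
        int block = index / VALUES_PER_BLOCK;
        int shift = (index % VALUES_PER_BLOCK) * BITS;       // 0, 21 or 42
        return (blocks[block] >>> shift) & MASK;
    }

    public void set(int index, long value) {
        int block = index / VALUES_PER_BLOCK;
        int shift = (index % VALUES_PER_BLOCK) * BITS;
        // Clear the slot, then OR in the new value.
        blocks[block] = (blocks[block] & ~(MASK << shift)) | ((value & MASK) << shift);
    }
}
```

The proposed {{PackedInts.getMutable(valueCount, bitsPerValue, acceptableOverheadPerValue)}} would then pick between such single-block layouts, the Direct* arrays, and plain Packed64, depending on how many wasted bits per value the caller tolerates.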
[jira] [Comment Edited] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402201#comment-13402201 ] Adrien Grand edited comment on LUCENE-4062 at 6/27/12 12:58 PM: Thanks for sharing your results. Here are mine: http://people.apache.org/~jpountz/packed_ints_calc.html (E5500 @ 2.80GHz, java 1.7.0_02, hotspot build 22.0-b10). Funny to see those little bumps when the number of bits per value is 8, 16, 32 or 64 (24 as well, although it is smaller)! It is not clear whether this impl is faster or slower than the single-block impl (or even the 3-blocks impl; I am happily surprised by the read throughput on the Intel P4 machine) depending on the hardware. However, this new impl seems to be consistently better than the current Packed64 class, so I think we should replace it with your new impl. What do you think? Can you write a patch? 
More fine-grained control over the packed integer implementation that is chosen --- Key: LUCENE-4062 URL: https://issues.apache.org/jira/browse/LUCENE-4062 Project: Lucene - Java Issue Type: Improvement Components: core/other Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Labels: performance Fix For: 4.0, 5.0 Attachments: LUCENE-4062-2.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, Packed64calc.java, PackedIntsBenchmark.java, PackedIntsBenchmark.java, measurements_te_graphs.pdf, measurements_te_i7.txt, measurements_te_p4.txt, measurements_te_xeon.txt In order to save space, Lucene has two main PackedInts.Mutable implementations, one that is very fast and is based on a byte/short/integer/long array (Direct*) and another one which packs bits in a memory-efficient manner (Packed*). The packed implementation tends to be much slower than the direct one, which discourages some Lucene components from using it.
Re: VOTE: 4.0 alpha (take two)
Yeah, I've got a bunch of outstanding ideas/JIRA's myself, but none of which affect the index format or anything low-level. I'm assuming that we'll allow a whole bunch of non-index-breaking things to occur after alpha is released. So here's my +1: I've been using trunk for quite a while now in a number of scenarios. Erik On Jun 27, 2012, at 08:53 , Erick Erickson wrote: But what about feature all my favorites here. Which means I haven't gotten off my butt and moved them forward. Waiting for the elves isn't working G +1, go for it. We can't continue waiting for the elves, they unionized and are demanding some time off for sleep Erick On Wed, Jun 27, 2012 at 6:40 AM, Antoine LE FLOC'H lefl...@gmail.com wrote: Using SolrJ from the alpha, everything works. Go for it ! On Wed, Jun 27, 2012 at 12:03 PM, Antoine LE FLOC'H lefl...@gmail.com wrote: I was actually using a solrconfig.xml that is too old for this version. catalina.out gave me some errors on indexDefaults and mainIndex, so I took the solrconfig.xml from your alpha package and it worked fine. I haven't been able to totally check everything yet because I am using a Solrj 3.6 indexing client and I had some content type issues in catalina.out. I am working on it. On Wed, Jun 27, 2012 at 7:29 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: On Tuesday, June 26, 2012 at 8:30 PM, Simon Willnauer wrote: that seems worth an issue. is there one already? not yet, there was one comment on SOLR-3238 but no further comment. On Tuesday, June 26, 2012 at 8:17 PM, Antoine LE FLOC'H wrote: Are you aware of this error ? Thanks again. Antoine, would you mind opening one and providing some info? 
This error will show up if there's a problem accessing /admin/system, i guess that should be our starting point Stefan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: 4.0 alpha (take two)
On Jun 27, 2012, at 9:12 AM, Erik Hatcher wrote: I'm assuming that we'll allow a whole bunch of non-index-breaking things to occur after alpha is released. Yup - I certainly have a few things to do for a Solr 4 still. - Mark Miller lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: 4.0 alpha (take two)
What new features are added in the 4.0 alpha? Which ones are not finished for the 4.0 final release? On 2012-6-26, at 5:28 AM, Robert Muir rcm...@gmail.com wrote: artifacts are here: http://people.apache.org/~rmuir/staging_area/lucene-solr-4.0aRC1-rev1353699/ Here is my +1 -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4169) Mark Spatial module classes as experimental
Chris Male created LUCENE-4169: -- Summary: Mark Spatial module classes as experimental Key: LUCENE-4169 URL: https://issues.apache.org/jira/browse/LUCENE-4169 Project: Lucene - Java Issue Type: Task Components: modules/spatial Reporter: Chris Male The more I dive into this code the more I worry about it, so I think we should give ourselves some leeway to make API changes as part of improvements. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1770) move default example core config/data into a collection1 folder
[ https://issues.apache.org/jira/browse/SOLR-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402217#comment-13402217 ] Mark Miller commented on SOLR-1770: --- I just did my best to backport this to the 4 branch. move default example core config/data into a collection1 folder --- Key: SOLR-1770 URL: https://issues.apache.org/jira/browse/SOLR-1770 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Mark Miller Assignee: Mark Miller Priority: Critical Fix For: 4.0, 5.0 Attachments: SOLR-1770.patch This is a better starting point for adding more cores - perhaps we can also get rid of multi-core example -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: VOTE: 4.0 alpha (take two)
On Wed, Jun 27, 2012 at 9:14 AM, Mark Miller markrmil...@gmail.com wrote: On Jun 27, 2012, at 9:12 AM, Erik Hatcher wrote: I'm assuming that we'll allow a whole bunch of non-index-breaking things to occur after alpha is released. Yup - I certainly have a few things to do for a Solr 4 still. This is absolutely the intent here: supporting the lucene index format like a real release might be enough for many folks that would otherwise be scared of trunk to try this out. So we should keep adding features and breaking apis without fear. We should even continue making improvements to the index format (but in a backwards compatible way) -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-tests-only-4.x - Build # 157 - Failure
Hmm, I suspect this is a bug in the position length implementation of CommonGramsFilter. This filter inserts additional tokens (bigrams) around stopwords, so if you have this is a test it will create this this_is is is_a a a_test and so on, so it can be viewed as a conditional shinglefilter. But it hardcodes the length as posLenAttribute.setPositionLength(2); // bigram If the input is already a graph (posLen != 1), then this will be incorrect. How does ShingleFilter handle this situation? Would be nice if we can fix this without capturing state or slowing it down On Sat, Jun 23, 2012 at 7:47 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-tests-only-4.x/157/ 1 tests failed. REGRESSION: org.apache.lucene.analysis.core.TestRandomChains.testRandomChains Error Message: last stage: inconsistent endOffset at pos=41: 7 vs 19; token=i_i i i i i u i i u f i i u f d i i u f d s i i u f d s s i i u f d s s j i i u f d s s j g i i u f d s s j g n i i u f d s s j g n 1 i i u i u f i u f d i u f d s i u f d s s i u f d s s j i u f d s s j g i u f d s s j g n i u f d s s j g n 1 u u f u f d u f d s u f d s s u f d s s j u f d s s j g u f d s s j g n u f d s s j g n 1 f f d f d s f d s s f d s s j f d s s j g f d s s j g n f d s s j g n 1 Stack Trace: java.lang.IllegalStateException: last stage: inconsistent endOffset at pos=41: 7 vs 19; token=i_i i i i i u i i u f i i u f d i i u f d s i i u f d s s i i u f d s s j i i u f d s s j g i i u f d s s j g n i i u f d s s j g n 1 i i u i u f i u f d i u f d s i u f d s s i u f d s s j i u f d s s j g i u f d s s j g n i u f d s s j g n 1 u u f u f d u f d s u f d s s u f d s s j u f d s s j g u f d s s j g n u f d s s j g n 1 f f d f d s f d s s f d s s j f d s s j g f d s s j g n f d s s j g n 1 at __randomizedtesting.SeedInfo.seed([12635ABB4F789F2A:2F8273DA086A82EA]:0) at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:135) at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:644) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:554) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:450) at org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:860) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at
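The bigram insertion Robert describes can be modeled with plain strings. This is an illustrative sketch, not Lucene's actual CommonGramsFilter; each emitted token carries a "(posInc,posLen)" suffix encoding positionIncrement and positionLength. The hardcoded posLen=2 on the bigram is only correct while every input token has posLength 1 (a "sausage"), which is exactly the assumption the random chain breaks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model of common-grams bigram insertion around stopwords.
class CommonGramsModel {
    static List<String> commonGrams(List<String> terms, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            out.add(terms.get(i) + "(1,1)"); // unigram: posInc=1, posLen=1
            if (i + 1 < terms.size()
                    && (stopwords.contains(terms.get(i)) || stopwords.contains(terms.get(i + 1)))) {
                // inserted bigram: same position as its first word (posInc=0),
                // positionLength hardcoded to 2 -- wrong if either component
                // token itself spans more than one position
                out.add(terms.get(i) + "_" + terms.get(i + 1) + "(0,2)");
            }
        }
        return out;
    }
}
```

With stopwords {is, a}, "this is a test" yields this, this_is, is, is_a, a, a_test, test, matching the conditional-shingle behavior described above.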
[jira] [Created] (LUCENE-4170) TestRandomChains fail with Shingle+CommonGrams
Robert Muir created LUCENE-4170: --- Summary: TestRandomChains fail with Shingle+CommonGrams Key: LUCENE-4170 URL: https://issues.apache.org/jira/browse/LUCENE-4170 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Reporter: Robert Muir ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChains -Dtests.seed=12635ABB4F789F2A -Dtests.multiplier=3 -Dtests.locale=pt -Dtests.timezone=America/Argentina/Salta -Dargs=-Dfile.encoding=ISO8859-1 This test has two shinglefilters, then a common-grams filter. I think the posLen impls in commongrams and/or shingle have a bug if the input is already a graph. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4170) TestRandomChains fail with Shingle+CommonGrams
[ https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4170: Attachment: LUCENE-4170.patch first stab at a patch for commongrams' posLen. But the test still fails. So either my patch is wrong, or we need to fix shingle, too. We could use some standalone tests here as well. TestRandomChains fail with Shingle+CommonGrams -- Key: LUCENE-4170 URL: https://issues.apache.org/jira/browse/LUCENE-4170 Attachments: LUCENE-4170.patch
[jira] [Commented] (LUCENE-4170) TestRandomChains fail with Shingle+CommonGrams
[ https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402256#comment-13402256 ] Robert Muir commented on LUCENE-4170: - I think shingles has a similar bug: it doesn't look at the existing posLength of the input tokens at all; instead it just fills posLength with the builtGramSize. TestRandomChains fail with Shingle+CommonGrams -- Key: LUCENE-4170 URL: https://issues.apache.org/jira/browse/LUCENE-4170
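One plausible reading of the fix: a shingle spanning several input tokens should report a positionLength equal to the sum of its components' positionLengths rather than the hardcoded builtGramSize. This is a hypothetical helper for illustration, not ShingleFilter's actual code:

```java
// A shingle covers the positions its component tokens span. If every
// component has posLength 1, this degenerates to the gram size; if a
// component is itself a graph token (posLength > 1), the sum differs
// from builtGramSize, which is the bug described above.
class ShinglePosLen {
    static int shinglePositionLength(int[] componentPosLengths) {
        int sum = 0;
        for (int posLen : componentPosLengths) {
            sum += posLen;
        }
        return sum;
    }
}
```

For a plain bigram of two ordinary tokens this returns 2, agreeing with builtGramSize; for a bigram whose first component already spans two positions it returns 3.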
[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation
[ https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402274#comment-13402274 ] David Smiley commented on LUCENE-4167: -- I agree with your complaint. The only two supported operations are: * Intersects -- equivalent to IsWithin when the indexed data is points * BBoxWIntersects -- again, equivalent to BBoxIsWithin when the indexed data is points. The distinction of overlaps with intersects seems dubious. Bounding-box handling is done universally in SpatialArgs.getShape(), which checks the operation and returns the wrapping rectangle. So effectively the strategies need not even bother with the whole SpatialOperation concept, at least not at the moment. My concern with your suggestion to remove SpatialOperation is that I do think it will return. I know I want to work on an IsWithin when the indexed data is shapes with area. And it is serving the purpose of SpatialArgsParser parsing out the operation you want to do, which I don't think should go away (i.e. the query string shouldn't assume an intersect; it should include Intersects(...)). Perhaps the unsupported operations could be commented out? Separately, I think com.spatial4j.core.query.* belongs in Lucene spatial. It's not used by any of the rest of Spatial4j, yet it's tightly related to the concept of querying, which is Lucene spatial's business and not the business of Spatial4j. Remove the use of SpatialOperation -- Key: LUCENE-4167 URL: https://issues.apache.org/jira/browse/LUCENE-4167 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Looking at the code in TwoDoublesStrategy I noticed SpatialOperations.BBoxWithin vs isWithin which confused me. Looking over the other Strategys I see that really only isWithin and Intersects is supported. Only TwoDoublesStrategy supports IsDisjointTo. The remainder of SpatialOperations are not supported. 
I don't think we should use SpatialOperation at this stage since it is not clear what Operations are supported by what Strategys, many Operations are not supported, and the code for handling the Operations is usually the same. We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a different Strategy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402279#comment-13402279 ] Toke Eskildsen commented on LUCENE-4062: Making Packed64calc the new Packed64 seems like a safe bet. I'd be happy to create a patch for it. Should I open a new issue or add the patch here? If I do it here, how do we avoid confusing the original fine-grained-oriented patch with the Packed64 replacement? I think it is hard to see a clear pattern, with the currently available measurements, as to which Mutable implementation should be selected for the different size/bpv requirements. I'll perform some more experiments with JRE1.6/JRE1.7 on different hardware and see if the picture gets clearer. More fine-grained control over the packed integer implementation that is chosen --- Key: LUCENE-4062 URL: https://issues.apache.org/jira/browse/LUCENE-4062
[jira] [Commented] (LUCENE-4062) More fine-grained control over the packed integer implementation that is chosen
[ https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402283#comment-13402283 ] Adrien Grand commented on LUCENE-4062: -- Yes, a new issue will make things clearer. Thanks, Toke! More fine-grained control over the packed integer implementation that is chosen --- Key: LUCENE-4062 URL: https://issues.apache.org/jira/browse/LUCENE-4062 Allowing an overhead of 32 bits per value would make sure that a Direct* implementation is always selected. Next steps would be to: * make lucene components use this {{getMutable}} method and let users decide what trade-off better suits them, * write a Packed32SingleBlock implementation if necessary (I didn't do it because I have no 32-bit computer to test the performance improvements). I think this would allow more fine-grained control over the speed/space trade-off, what do you think? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation
[ https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402286#comment-13402286 ] Chris Male commented on LUCENE-4167: {quote} Intersects – equivalent to IsWithin when index data is points BBoxWIntersects – again, equivalent to BBoxIsWithin when the indexed data is points. {quote} I don't see the need to differentiate BBoxIntersects and Intersects. If the user wants to find those Documents related to the bounding box of a Shape, then they can call shape.getBoundingBox() and pass that into the Strategy. The Strategys shouldn't have to worry about the Shape (although TwoDoubles does, but that needs to be re-thought separately). The Strategys should just take the Shape given and roll with it. Is that what you're suggesting? {quote} My concern with your suggestion to remove SpatialOperation is that I do think it will return. I know I want to work on an IsWithin when indexed data is shapes with area. And it is serving the purpose of SpatialArgsParser parsing out the operation you want to do, which I don't think should go away (i.e. the query string shouldn't assume an intersect, it should include Intersects(...) Perhaps the unsupported operations could be commented out? {quote} I can see the need for different behaviour for different Shape relationships, too. But I think we should perhaps do that using method specialization. We already have the PrefixTreeStrategy abstraction, so you could write a WithinRecursivePrefixTreeStrategy which specializes makeQuery differently. That way it is clear to the user what the Strategy does, we won't need the runtime checks, and we won't have Strategys like TwoDoubles which have methods for each of the different behaviours in the same class. So I think we can remove the need for SpatialOperation now and support the idea differently in the future. 
(As a side note, this actually makes me think we should decouple the indexing code of Strategys from the querying code.)
{quote} Separately, I think com.spatial4j.core.query.* belongs in Lucene spatial. It's not used by any of the rest of Spatial4j, yet it's tightly related to the concept of querying, which is Lucene spatial's business, and is not the business of Spatial4j. {quote}
+1. As a short-term solution I think we just replicate the code that we need in Lucene now and then drop it from Spatial4J in the next release.
Remove the use of SpatialOperation
--
Key: LUCENE-4167
URL: https://issues.apache.org/jira/browse/LUCENE-4167
Project: Lucene - Java
Issue Type: Bug
Components: modules/spatial
Reporter: Chris Male
Looking at the code in TwoDoublesStrategy I noticed SpatialOperations.BBoxWithin vs isWithin, which confused me. Looking over the other Strategys I see that really only isWithin and Intersects are supported. Only TwoDoublesStrategy supports IsDisjointTo. The remainder of SpatialOperations are not supported. I don't think we should use SpatialOperation at this stage since it is not clear what Operations are supported by what Strategys, many Operations are not supported, and the code for handling the Operations is usually the same. We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a different Strategy.
[jira] [Commented] (SOLR-3467) ExtendedDismax escaping is missing several reserved characters
[ https://issues.apache.org/jira/browse/SOLR-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402287#comment-13402287 ] Michael Dodsworth commented on SOLR-3467: - Thank you, Jan. From what I can tell, '/' only became a reserved character in 4.0 - https://issues.apache.org/jira/browse/LUCENE-2604.
ExtendedDismax escaping is missing several reserved characters
--
Key: SOLR-3467
URL: https://issues.apache.org/jira/browse/SOLR-3467
Project: Solr
Issue Type: Bug
Components: query parsers
Affects Versions: 3.6
Reporter: Michael Dodsworth
Assignee: Jan Høydahl
Priority: Minor
Fix For: 4.0, 3.6.1, 5.0
Attachments: SOLR-3467-lucene_solr_3_6.patch, SOLR-3467.patch, SOLR-3467.patch, SOLR-3467.patch
When edismax is unable to parse the original user query, it retries using an escaped version of that query (where all reserved chars have been escaped). Currently, the escaping done in {{splitIntoClauses}} appears to be missing several chars from {{QueryParserBase#escape(String)}}, namely {{'\\', '|', '&', '/'}}.
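The kind of escaping discussed here can be sketched as below. This is an illustrative helper, not Solr's actual {{splitIntoClauses}} or {{QueryParserBase#escape}} code; the character set is the classic query-parser special set, to which '/' was added when it became reserved.

```java
// Hypothetical escaping helper in the spirit of QueryParserBase#escape:
// backslash-escapes every character that is syntactically significant
// to the Lucene query parser.
public class QueryEscape {
    // Backslash, + - ! ( ) : ^ [ ] " { } ~ * ? | & and the 4.0-era '/'.
    static final String SPECIALS = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\'); // prefix special chars with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a/b|c")); // a\/b\|c
    }
}
```

The bug report is that edismax's retry path escaped only a subset of this list, so queries containing the missing characters still failed to parse.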
[jira] [Commented] (SOLR-3580) In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled
[ https://issues.apache.org/jira/browse/SOLR-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402303#comment-13402303 ] Yonik Seeley commented on SOLR-3580:
This is by design. Treating 'and' and 'or' as operators when people may not realize they are is much less catastrophic than treating 'not' as an operator. If someone searches for "to be or not to be", excluding all documents with "to" in them is very bad.
In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled
--
Key: SOLR-3580
URL: https://issues.apache.org/jira/browse/SOLR-3580
Project: Solr
Issue Type: Bug
Components: query parsers
Affects Versions: 4.0
Reporter: Michael Dodsworth
Priority: Minor
Fix For: 4.0
Attachments: SOLR-3580.patch
When lowercase operator support is enabled (for edismax), the lowercase 'not' operator is being wrongly treated as a literal term (and not as an operator).
[jira] [Commented] (LUCENE-4170) TestRandomChains fail with Shingle+CommonGrams
[ https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402307#comment-13402307 ] Steven Rowe commented on LUCENE-4170:
bq. I think shingles has a similar bug: it doesn't look at the existing posLength of the input tokens at all, instead it just fills posLength with the builtGramSize.
I agree. However, the problem isn't just position length: ShingleFilter has never handled input position increments of zero, so real graph compatibility will mean fixing that too. I think Karl Wettin's ShingleMatrixFilter (deprecated in 3.6, dropped in 4.0) is an attempt to permute all combinations of overlapping (poslength=1) terms to produce shingles. ShingleMatrixFilter wouldn't handle poslength > 1, though. I'm not even sure what token ngramming should mean over an input graph. The trivial case where input tokens' poslength is always one and position increment is always one is obviously already handled. I think both issues should be handled, since poslength > 1 will very likely be used with posincr = 0, e.g. synonyms and kuromoji de-compounding.
TestRandomChains fail with Shingle+CommonGrams
--
Key: LUCENE-4170
URL: https://issues.apache.org/jira/browse/LUCENE-4170
Project: Lucene - Java
Issue Type: Bug
Components: modules/analysis
Reporter: Robert Muir
Attachments: LUCENE-4170.patch
ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChains -Dtests.seed=12635ABB4F789F2A -Dtests.multiplier=3 -Dtests.locale=pt -Dtests.timezone=America/Argentina/Salta -Dargs=-Dfile.encoding=ISO8859-1
This test has two ShingleFilters, then a CommonGrams filter. I think the posLen impls in CommonGrams and/or Shingle have a bug if the input is already a graph.
[jira] [Commented] (LUCENE-4170) TestRandomChains fail with Shingle+CommonGrams
[ https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402314#comment-13402314 ] Steven Rowe commented on LUCENE-4170:
bq. I'm not even sure what token ngramming should mean over an input graph.
A thought problem: run ShingleFilter with mingramsize=2, maxgramsize=3, outputUnigrams=true over input {{\[a/1] \[b/1] \[c/1] \[d/1]}} (where {{/n}} indicates poslength = {{n}}, and {{\[a b]}} indicates tokens {{a}} and {{b}} are at the same position; I'll omit the {{\[]}}'s below when only one token is at a given position), then run ShingleFilter again with the same config over the first ShingleFilter's output:
{noformat}
shinglefilter(min:2, max:3, unigrams:true) with input: a/1 b/1 c/1 d/1
'_' token separator:
  [a/1 a_b/2 a_b_c/3] [b/1 b_c/2 b_c_d/3] [c/1 c_d/2] d/1

shinglefilter(2,3,unigrams) with the shinglefilter output above as input,
'=' token separator:
  [a/1 a_b/2 a_b_c/3 a=b/2 a=b_c/3 a=b_c_d/4 a=b=c/3 a=b=c_d/4 a=b_c=d/4 a_b=c/3 a_b=c_d/4 a_b=c=d/4 a_b_c=d/4]
  [b/1 b_c/2 b_c_d/3 b=c/2 b=c_d/3 b_c=d/3]
  [c/1 c_d/2 c=d/2]
  d/1
{noformat}
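The first half of the thought problem, the trivial non-graph case where every input token has posinc=1 and poslength=1, can be sketched like this. It is a hypothetical helper, not ShingleFilter's actual implementation; each shingle's position length is simply the number of tokens it spans, matching the {{a_b/2}}, {{a_b_c/3}} notation above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative shingling over a flat token stream (posinc=1, poslength=1
// everywhere). Not Lucene's ShingleFilter; a sketch of the trivial case.
public class Shingles {
    /** Returns tokens as "text/posLength", grouped by starting position. */
    static List<List<String>> shingle(List<String> tokens, int min, int max,
                                      boolean outputUnigrams) {
        List<List<String>> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            List<String> atPos = new ArrayList<>();
            if (outputUnigrams) {
                atPos.add(tokens.get(start) + "/1"); // the original token
            }
            // emit every n-gram of size min..max that fits from this start
            for (int n = min; n <= max && start + n <= tokens.size(); n++) {
                String text = String.join("_", tokens.subList(start, start + n));
                atPos.add(text + "/" + n); // poslength = tokens spanned
            }
            out.add(atPos);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingle(List.of("a", "b", "c", "d"), 2, 3, true));
        // [[a/1, a_b/2, a_b_c/3], [b/1, b_c/2, b_c_d/3], [c/1, c_d/2], [d/1]]
    }
}
```

The hard part of the issue is precisely what this sketch ignores: when the input is already a graph (posinc=0 or poslength>1 tokens), the spanned-token count no longer equals the position length, which is the bug the thread is circling.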
[jira] [Commented] (SOLR-3580) In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled
[ https://issues.apache.org/jira/browse/SOLR-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402318#comment-13402318 ] Michael Dodsworth commented on SOLR-3580: - Surely that's a more general hazard with supporting lowercase operators. It seems strange to give 'not' special treatment. There are likely examples where having 'and' or 'or' wrongly treated as an operator /is/ catastrophic; therefore the onus should be on the client to choose the correct 'lowercaseOperators' option for their use-case.
[jira] [Commented] (SOLR-3580) In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled
[ https://issues.apache.org/jira/browse/SOLR-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402321#comment-13402321 ] Yonik Seeley commented on SOLR-3580: edismax is about heuristics and sometimes guessing user intent... if exact/strict syntax is desired, the lucene query parser is a better fit.
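The behavior being debated can be sketched as follows. This is a hypothetical helper, not Solr's actual edismax parser: lowercase 'and'/'or' are promoted to operators while 'not' is deliberately left as a literal term, since wrongly negating a term (dropping every document containing "to" from "to be or not to be") is far more destructive than wrongly conjoining two terms.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the "lowercaseOperators" heuristic discussed
// in this thread (hypothetical helper, not Solr's ExtendedDismaxQParser).
public class LowercaseOps {
    static String promoteOperators(String query, boolean alsoNot) {
        List<String> out = new ArrayList<>();
        for (String word : query.split("\\s+")) {
            if (word.equals("and") || word.equals("or")
                    || (alsoNot && word.equals("not"))) {
                out.add(word.toUpperCase()); // treat as a query operator
            } else {
                out.add(word);               // leave as a literal term
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Current behavior: 'not' stays a literal term.
        System.out.println(promoteOperators("to be or not to be", false));
        // prints: to be OR not to be
    }
}
```

Passing `true` for `alsoNot` shows the behavior the reporter expected; the thread's disagreement is over which default is safer.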
[jira] [Updated] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs
[ https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-4080:
Attachment: LUCENE-4080.patch
Patch. The {{SegmentReader}} returned by {{getMergeReader}} now has a correct {{numDeletedDocs()}} and {{getLiveDocs()}}. Could someone familiar with Lucene merging internals review this patch?
SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs
--
Key: LUCENE-4080
URL: https://issues.apache.org/jira/browse/LUCENE-4080
Project: Lucene - Java
Issue Type: Bug
Components: core/index
Affects Versions: 4.0, 4.1
Reporter: Adrien Grand
Priority: Trivial
Fix For: 4.1
Attachments: LUCENE-4080.patch
At merge time, SegmentReader sometimes gives an incorrect value for numDeletedDocs. From LUCENE-2357:
bq. As far as I know, [SegmentReader.numDeletedDocs() is] only unreliable in this context (SegmentReader passed to SegmentMerger for merging); this is because we allow newly marked deleted docs to happen concurrently up until the moment we need to pass the SR instance to the merger (search for // Must sync to ensure BufferedDeletesStream in IndexWriter.java) ... but it would be nice to fix that, so I think open a new issue (it won't block this one)? We should be able to make a new SR instance, sharing the same core as the current one but using the correct delCount...
bq. It would be cleaner (but I think hairier) to create a new SR for merging that holds the correct delCount, but let's do that under the separate issue.
bq. it would be best if the SegmentReader's numDeletedDocs were always correct, but, fixing that in IndexWriter is somewhat tricky. Ie, the fix could be hairy but the end result (SegmentReader.numDeletedDocs can always be trusted) would be cleaner...
[jira] [Commented] (SOLR-3580) In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled
[ https://issues.apache.org/jira/browse/SOLR-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402350#comment-13402350 ] Michael Dodsworth commented on SOLR-3580: - Were we not allowing the user to explicitly *specify* that they want to support lowercase operators, I might agree. That setting should (at the very least) come with a clear health warning so that more people aren't caught out by this.
[jira] [Commented] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs
[ https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402353#comment-13402353 ] Robert Muir commented on LUCENE-4080: - I think it's cleaner not to have the 'if numDocs = 0' in SegmentReader ctor #2. Instead I think ctor #1 should just forward docCount - delCount, like ctor #3.
[jira] [Updated] (SOLR-3574) Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions
[ https://issues.apache.org/jira/browse/SOLR-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Despot Jakimovski updated SOLR-3574:
Fix Version/s: (was: 3.6) 5.0, 4.1, 4.0
Affects Version/s: (was: 3.6) 5.0, 4.1, 4.0
Create a Compound Word Filter (and Factory) extension that will allow support for (word) exceptions
--
Key: SOLR-3574
URL: https://issues.apache.org/jira/browse/SOLR-3574
Project: Solr
Issue Type: New Feature
Components: SearchComponents - other
Affects Versions: 4.0, 4.1, 5.0
Reporter: Despot Jakimovski
Labels: compound-word, dictionary, feature, filter, word-exception
Fix For: 4.0, 4.1, 5.0
Original Estimate: 72h
Remaining Estimate: 72h
Consider the following use case: we have two words, penslot and knoppen. One of them is a compound word (penslot); the other is a plural form of knop. When using the compound word filter, if we place the words pen, slot and knop in the dictionary, then for a search containing knoppen we also get results containing pen, which shouldn't be the case, because knoppen is only a plural form (not a compound word). We need another dictionary to specify the words that are exceptions to the filter (like knoppen in this case). The filter would then still find compound words containing pen, slot and knop, but would leave knoppen undivided instead of searching on its parts. More info on the subject: http://stackoverflow.com/questions/11159839/can-we-make-the-compound-word-filter-not-divide-some-words-in-solr
[jira] [Commented] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs
[ https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402365#comment-13402365 ] Robert Muir commented on LUCENE-4080: - Also, is it OK in mergeMiddle that we call rld.getMergeReader inside the sync? Previously, we never did actual I/O here...
JIRA assignee options
Hi, I would like to assign myself to the JIRA task https://issues.apache.org/jira/browse/SOLR-3574, but I cannot find the Assign to button. Probably I am missing some privileges. Can someone help me fix this? Cheers, despot
[jira] [Created] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.
Mark Miller created SOLR-3582:
Summary: Leader election zookeeper watcher is responding to con/discon notifications incorrectly.
Key: SOLR-3582
URL: https://issues.apache.org/jira/browse/SOLR-3582
Project: Solr
Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
Fix For: 4.0, 5.0
As brought up by Trym R. Møller on the mailing list, we are responding to watcher events about connection/disconnection as if they were notifications about node changes. http://www.lucidimagination.com/search/document/e13ef390b882
[jira] [Commented] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.
[ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402388#comment-13402388 ] Mark Miller commented on SOLR-3582: --- I'm unsure of the proposed solution on the mailing list. On a connection event, the watch will fire and we will skip doing anything, but watches are one-time events, so we will have no watch in place?
[jira] [Commented] (SOLR-3573) Data import does not free CLOB
[ https://issues.apache.org/jira/browse/SOLR-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402393#comment-13402393 ] Bjorn Hijmans commented on SOLR-3573: - Some more information: for me this started to happen after we started storing XMLTYPE data as binary instead of CLOB. I managed to fix it by casting the java.sql.Clob to an oracle.sql.CLOB so I could use freeTemporary() to free the clob. Not an acceptable solution to commit, though. I'm not sure if this is a Solr problem, a JDBC problem or an Oracle problem.
Data import does not free CLOB
--
Key: SOLR-3573
URL: https://issues.apache.org/jira/browse/SOLR-3573
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Environment: Java HotSpot(TM) Client VM (build 21.0-b17, mixed mode, sharing), Oracle 11.2.0.3.0, Solr trunk
Reporter: Bjorn Hijmans
Attachments: oracle_clob_freetemporary.diff
When selecting a CLOB in the deltaImportQuery, the CLOB will not be freed, which will cause the Oracle process to use up all memory on the Oracle server. I'm not very good at java, but I think changes need to be made in FieldReaderDataSource.java. In the getData method, the characterStream from the Clob needs to be copied to a new stream, so the clob can be freed.
[jira] [Updated] (SOLR-3573) Data import does not free CLOB
[ https://issues.apache.org/jira/browse/SOLR-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bjorn Hijmans updated SOLR-3573:
Attachment: oracle_clob_freetemporary.diff
[jira] [Commented] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.
[ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402398#comment-13402398 ] Mark Miller commented on SOLR-3582: --- Never mind - found confirmation elsewhere that session events do not remove the watcher. The ZooKeeper programming guide does not appear very clear on this when it talks about watches being one-time triggers.
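The pattern the fix converges on can be sketched as below. This is a hypothetical stand-in, not Solr's actual LeaderElector code, and the enum mimics (rather than uses) ZooKeeper's Watcher event types: session/connection notifications arrive with event type None and do not consume the one-time node watch, so they should be ignored instead of being handled as node changes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative watcher-dispatch sketch for the bug discussed above.
// EventType is a local stand-in for ZooKeeper's Watcher.Event.EventType.
public class ElectionWatcher {
    enum EventType { None, NodeDeleted, NodeDataChanged }

    final List<String> actions = new ArrayList<>();

    void process(EventType type) {
        if (type == EventType.None) {
            // Connection/disconnection notification: the node watch is
            // still registered, so there is nothing to do here.
            actions.add("ignored");
            return;
        }
        // A real node event consumed the one-time watch: react to the
        // change and re-register the watch.
        actions.add("checkLeader+rewatch");
    }

    public static void main(String[] args) {
        ElectionWatcher w = new ElectionWatcher();
        w.process(EventType.None);        // e.g. a disconnect notification
        w.process(EventType.NodeDeleted); // e.g. the leader node went away
        System.out.println(w.actions);    // [ignored, checkLeader+rewatch]
    }
}
```

The subtlety Mark Miller confirms in the thread is exactly the None case: because session events do not remove the watch, skipping them is safe and does not leave the election without a watcher.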
[JENKINS] Lucene-Solr-4.x-Linux-Java7-64 - Build # 249 - Still Failing!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux-Java7-64/249/ 1 tests failed. REGRESSION: org.apache.lucene.analysis.ngram.NGramTokenizerTest.testRandomStrings Error Message: some thread(s) failed Stack Trace: java.lang.RuntimeException: some thread(s) failed at __randomizedtesting.SeedInfo.seed([50E33DEA43DF254D:D86A3D54E0DB7278]:0) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:463) at org.apache.lucene.analysis.ngram.NGramTokenizerTest.testRandomStrings(NGramTokenizerTest.java:106) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 4560 lines...] 
[junit4] Suite: org.apache.lucene.analysis.ngram.NGramTokenizerTest [junit4] ERROR 645s J1 | NGramTokenizerTest.testRandomStrings [junit4] Throwable #1: java.lang.RuntimeException: some thread(s) failed [junit4] at __randomizedtesting.SeedInfo.seed([50E33DEA43DF254D:D86A3D54E0DB7278]:0) [junit4] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:463) [junit4] at org.apache.lucene.analysis.ngram.NGramTokenizerTest.testRandomStrings(NGramTokenizerTest.java:106) [junit4] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit4] at
Re: JIRA assignee options
You should be able to do this if you're logged in; there should be a button along the top titled "Assign to me". Sometimes I get logged out when I restart my computer or something. If you're sure you're logged in and still can't assign it to yourself, let us know. Best, Erick On Wed, Jun 27, 2012 at 1:19 PM, Despot Jakimovski despot.jakimov...@gmail.com wrote: Hi, I would like to assign myself to the JIRA task, but I cannot find the Assign to button. Probably I am missing some privileges. Can someone help me fix this? Cheers, despot
Re: JIRA assignee options
I added you to the 'contributor' role, so you should be able to do this now, though I don't know if you need to log out and log back in, or if it takes effect immediately. Let me know if you have problems! On Wed, Jun 27, 2012 at 1:19 PM, Despot Jakimovski despot.jakimov...@gmail.com wrote: Hi, I would like to assign myself to the JIRA task, but I cannot find the Assign to button. Probably I am missing some privileges. Can someone help me fix this? Cheers, despot -- lucidimagination.com
Re: [JENKINS] Lucene-Solr-4.x-Linux-Java6-64 - Build # 263 - Failure!
RateLimiter + Serial Merge Scheduler too, it seems? Toning this thing down seems like a good idea, because we also sometimes use ThrottledIndexOutput: I bet if you are unlucky enough to get both, it's really, really slow. On Mon, Jun 25, 2012 at 8:08 AM, Uwe Schindler u...@thetaphi.de wrote: Hi, I killed that one after 3.5 hrs hanging in Kuromoji tests: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux-Java6-64/263/console - It looks like RateLimiter limited too much. What should we do? Tone down the limiter generally, or how can we prevent such slowness? Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Policeman Jenkins Server [mailto:jenk...@sd-datasolutions.de] Sent: Monday, June 25, 2012 2:05 PM To: dev@lucene.apache.org Subject: [JENKINS] Lucene-Solr-4.x-Linux-Java6-64 - Build # 263 - Failure! Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux-Java6-64/263/ All tests passed Build Log: [...truncated 3928 lines...]
[junit4] 2012-06-25 12:05:09 [junit4] Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.7-b02 mixed mode): [junit4] [junit4] Thread-4 prio=10 tid=0x7f6bfc0aa000 nid=0x4749 waiting on condition [0x7f6c08498000] [junit4] java.lang.Thread.State: TIMED_WAITING (sleeping) [junit4] at java.lang.Thread.sleep(Native Method) [junit4] at java.lang.Thread.sleep(Thread.java:302) [junit4] at org.apache.lucene.store.RateLimiter.pause(RateLimiter.java:83) [junit4] at org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:82) [junit4] at org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:49) [junit4] at org.apache.lucene.store.RAMOutputStream.writeTo(RAMOutputStream.java:65) [junit4] at org.apache.lucene.codecs.BlockTermsWriter$TermsWriter.flushBlock(BlockTermsWriter.java:294) [junit4] at org.apache.lucene.codecs.BlockTermsWriter$TermsWriter.finishTerm(BlockTermsWriter.java:212) [junit4] at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:163) [junit4] at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:65) [junit4] at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:324) [junit4] at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:110) [junit4] at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3504) [junit4] at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3139) [junit4] at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) [junit4] - locked 0xe0601d98 (a org.apache.lucene.index.SerialMergeScheduler) [junit4] at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1703) [junit4] at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1697) [junit4] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1344) [junit4] at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1084) [junit4] at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:186) [junit4] at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:145) [junit4] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:562) [junit4] at org.apache.lucene.analysis.BaseTokenStreamTestCase.access$000(BaseTokenStreamTestCase.java:64) [junit4] at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:421) [junit4] [junit4] Thread-3 prio=10 tid=0x7f6bfc0a8800 nid=0x4748 waiting for monitor entry [0x7f6c0859a000] [junit4] java.lang.Thread.State: BLOCKED (on object monitor) [junit4] at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:34) [junit4] - waiting to lock 0xe0601d98 (a org.apache.lucene.index.SerialMergeScheduler) [junit4] at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1703) [junit4] at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1697) [junit4] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1344) [junit4] at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1084) [junit4] at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:186) [junit4] at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:145) [junit4] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:562) [junit4] at org.apache.lucene.analysis.BaseTokenStreamTestCase.access$000(BaseTokenStreamTestCase.java:64) [junit4] at
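The dump above shows one merge thread sleeping inside RateLimiter.pause while a second thread blocks on the SerialMergeScheduler monitor. For readers unfamiliar with the mechanism being discussed, here is a minimal, self-contained sketch of pause-style byte-rate throttling in the spirit of Lucene's RateLimiter; the class name and fields below are hypothetical simplifications, not the real implementation:

```java
// Sketch of byte-rate throttling: writers call pause(bytes) and are put to
// sleep long enough to keep the average write rate near mbPerSec.
class SimpleRateLimiter {
    private final double bytesPerSec;
    private long lastNs = System.nanoTime();

    SimpleRateLimiter(double mbPerSec) {
        this.bytesPerSec = mbPerSec * 1024 * 1024;
    }

    /** Returns how long the caller slept, in nanoseconds (0 if no sleep was needed). */
    synchronized long pause(long bytes) {
        // The time by which these bytes "should" have been written:
        long targetNs = lastNs + (long) (bytes / bytesPerSec * 1e9);
        long curNs = System.nanoTime();
        lastNs = Math.max(curNs, targetNs);
        long sleepNs = targetNs - curNs;
        if (sleepNs <= 0) {
            return 0; // we are under the budget; no throttling needed
        }
        try {
            Thread.sleep(sleepNs / 1000000L, (int) (sleepNs % 1000000L));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sleepNs;
    }
}
```

This makes the failure mode in the dump easy to see: if the configured rate is very low (or combined with ThrottledIndexOutput), every writeBytes call can end up inside Thread.sleep, and a serial merge scheduler then holds its monitor across all that sleeping.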
[jira] [Updated] (LUCENE-4170) TestRandomChains fail with Shingle+CommonGrams
[ https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-4170: Attachment: recursive.shinglefilter.output.png This image is a (not pretty) word lattice representation of the output from the double ShingleFilter thought problem described above - it should help to visualize the graph more easily. (I wish I could make Graphviz line up the dots in a straight line, but couldn't figure out how to do that.) TestRandomChains fail with Shingle+CommonGrams -- Key: LUCENE-4170 URL: https://issues.apache.org/jira/browse/LUCENE-4170 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Reporter: Robert Muir Attachments: LUCENE-4170.patch, recursive.shinglefilter.output.png ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChains -Dtests.seed=12635ABB4F789F2A -Dtests.multiplier=3 -Dtests.locale=pt -Dtests.timezone=America/Argentina/Salta -Dargs=-Dfile.encoding=ISO8859-1 This test has two shinglefilters, then a common-grams filter. I think the posLen impls in commongrams and/or shingle have a bug if the input is already a graph.
Test timing stats.
Hi. Would somebody who has a physical machine running Jenkins add the following as a post-run step? ant -f lucene/build.xml test-updatecache -Dtests.cachefile=XXX where XXX is preferably a path to a file somewhere outside of the build area (so that it's not cleaned/removed between builds). This will update build times with a history of 20 builds per suite. Once in a while this file should be copied to: lucene/tools/junit4/cached-timehints.txt These are hints for the test load balancer if multiple JVMs are used (just a reminder -- the order of suites is still randomized within a single JVM, and only a fraction of the suites are initially load-balanced; the rest is delegated to job stealing to level JVM times). I'm currently running a few builds to update the stats, but for the future it'd be a nice side effect of Jenkins runs. Dawid
[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation
[ https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402531#comment-13402531 ] David Smiley commented on LUCENE-4167: -- bq. I don't see the need to differentiate BBoxIntersects and Intersects. If the user wants to find those Documents related to the bounding box of a Shape, then they can call shape.getBoundingBox() and pass that into the Strategy. The Strategys shouldn't have to worry about the Shape (although TwoDoubles does but that needs to be re-thought separately). The Strategys should just take the Shape given and roll with it. Is that what you're suggesting? The strategy shouldn't care about the bbox concept, I agree. I think the bbox capability should be decoupled from SpatialOperation. It's not a simple matter of the client calling queryShape.getBoundingBox(), since the expression of the query shape from client to server is a string. So instead of BBoxIntersects(Circle(3,5 d=10)) I propose supporting INTERSECTS(BBOX(Circle(3,5 d=10))). The actual set of operations I want to support are [E]CQL spatial predicates: http://docs.geoserver.org/latest/en/user/filter/ecql_reference.html#spatial-predicate but that perhaps deserves its own issue. Remove the use of SpatialOperation -- Key: LUCENE-4167 URL: https://issues.apache.org/jira/browse/LUCENE-4167 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Looking at the code in TwoDoublesStrategy I noticed SpatialOperations.BBoxWithin vs isWithin which confused me. Looking over the other Strategys I see that really only isWithin and Intersects is supported. Only TwoDoublesStrategy supports IsDisjointTo. The remainder of SpatialOperations are not supported. I don't think we should use SpatialOperation at this stage since it is not clear what Operations are supported by what Strategys, many Operations are not supported, and the code for handling the Operations is usually the same.
We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a different Strategy.
Re: Count of keys of an FST
I don't think there is one that you could use out of the box... but maybe I'm wrong and it's stored in the header somewhere (don't have the source in front of me). To calculate it by hand the worst case is that you'll need a recursive traversal, which would mean O(number of stored states) with intermediate count caches or O(number of keys) without any caches and memory overhead (just recursive traversal). Dawid On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: The FST class has a number of methods that return counts, which one returns the total number of keys that have been encoded into the FST?
Re: Count of keys of an FST
Sounds like I should just count as the keys are added and store the count separately. On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: I don't think there is one that you could use out of the box... but maybe I'm wrong and it's stored in the header somewhere (don't have the source in front of me). To calculate it by hand the worst case is that you'll need a recursive traversal, which would mean O(number of stored states) with intermediate count caches or O(number of keys) without any caches and memory overhead (just recursive traversal). Dawid On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: The FST class has a number of methods that return counts, which one returns the total number of keys that have been encoded into the FST?
Re: Count of keys of an FST
If you need the count with constant time then yes, you should store it separately. You could also make a transducer that would store it at the root node as a side effect of the values associated with keys, but it's kind of ugly. Please check the FST header though -- I'm not sure; maybe Mike wrote it so that the node count / key count is in there. Dawid On Wed, Jun 27, 2012 at 10:50 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Sounds like I should just count as the keys are added and store the count separately. On Wed, Jun 27, 2012 at 3:48 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: I don't think there is one that you could use out of the box... but maybe I'm wrong and it's stored in the header somewhere (don't have the source in front of me). To calculate it by hand the worst case is that you'll need a recursive traversal, which would mean O(number of stored states) with intermediate count caches or O(number of keys) without any caches and memory overhead (just recursive traversal). Dawid On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: The FST class has a number of methods that return counts, which one returns the total number of keys that have been encoded into the FST?
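As the thread concludes, the simplest reliable approach is to track the count on the application side while keys are added. A tiny hypothetical sketch of that bookkeeping; the class below is illustrative only (a real FST would be built with org.apache.lucene.util.fst.Builder, and keys must be added in sorted order, which the TreeMap stand-in mimics):

```java
import java.util.TreeMap;

// Hypothetical wrapper that counts distinct keys as they are staged,
// giving a constant-time key count without traversing the FST later.
class CountingFstBuilder<T> {
    // Stand-in for the real FST builder: sorted keys -> outputs.
    private final TreeMap<String, T> pending = new TreeMap<>();
    private long keyCount = 0;

    void add(String key, T output) {
        // Count each distinct key exactly once, even if re-added.
        if (pending.put(key, output) == null) {
            keyCount++;
        }
    }

    long getKeyCount() {
        return keyCount;
    }
}
```

The same counter could live next to whatever loop feeds the real Builder.add call, then be serialized alongside the FST if the count is needed after reload.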
[jira] [Created] (LUCENE-4171) Performance improvements to Packed64
Toke Eskildsen created LUCENE-4171: -- Summary: Performance improvements to Packed64 Key: LUCENE-4171 URL: https://issues.apache.org/jira/browse/LUCENE-4171 Project: Lucene - Java Issue Type: Sub-task Components: core/other Affects Versions: 4.0, 5.0 Environment: Tested with 4 different Intel machines Reporter: Toke Eskildsen Priority: Minor Based on the performance measurements of PackedInts.Mutable's in LUCENE-4062, a new version of Packed64 has been created that is consistently faster than the old Packed64 for both get and set.
[jira] [Updated] (LUCENE-4171) Performance improvements to Packed64
[ https://issues.apache.org/jira/browse/LUCENE-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toke Eskildsen updated LUCENE-4171: --- Attachment: LUCENE-4171.patch Finished implementation, ready for review / potential merge. TestPackedInts passes. Performance improvements to Packed64 Key: LUCENE-4171 URL: https://issues.apache.org/jira/browse/LUCENE-4171 Project: Lucene - Java Issue Type: Sub-task Components: core/other Affects Versions: 4.0, 5.0 Environment: Tested with 4 different Intel machines Reporter: Toke Eskildsen Priority: Minor Labels: performance Attachments: LUCENE-4171.patch Original Estimate: 4h Remaining Estimate: 4h Based on the performance measurements of PackedInts.Mutable's in LUCENE-4062, a new version of Packed64 has been created that is consistently faster than the old Packed64 for both get and set.
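For readers unfamiliar with the structure being optimized: Packed64 stores n values of b bits each contiguously in a long[], so a value may straddle two longs. A simplified, self-contained sketch of that get/set bit arithmetic (illustrative only; not the patched implementation):

```java
// Minimal sketch of the Packed64 storage scheme: bitsPerValue-bit values
// packed back-to-back in a long[], values allowed to span two longs.
class PackedSketch {
    private final long[] blocks;
    private final int bpv;
    private final long mask;

    PackedSketch(int valueCount, int bitsPerValue) {
        this.bpv = bitsPerValue;
        this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        this.blocks = new long[(int) (((long) valueCount * bitsPerValue + 63) / 64)];
    }

    long get(int index) {
        long bitPos = (long) index * bpv;
        int block = (int) (bitPos >>> 6);   // which long
        int offset = (int) (bitPos & 63);   // bit offset within it
        long value = blocks[block] >>> offset;
        int read = 64 - offset;
        if (read < bpv) {
            // the value continues into the next long
            value |= blocks[block + 1] << read;
        }
        return value & mask;
    }

    void set(int index, long value) {
        long bitPos = (long) index * bpv;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        blocks[block] = (blocks[block] & ~(mask << offset)) | (value << offset);
        int written = 64 - offset;
        if (written < bpv) {
            // write the remaining high bits into the next long
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> written)) | (value >>> written);
        }
    }
}
```

The straddling branch is exactly where implementations differ in speed; the attached patch's improvements presumably come from reorganizing this kind of shifting and masking, but the exact changes are in the patch itself.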
[jira] [Updated] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs
[ https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-4080: - Attachment: LUCENE-4080.patch New patch. Only the {{liveDocs/numDeletedDocs}} copy needs to be protected by the {{IndexWriter}} lock. However, the whole method needs to be protected by the ReadersAndLiveDocs lock but we can't nest the former into the latter since other pieces of code do the opposite (potential deadlock). So I replaced the {{ReadersAndLiveDocs}} lock with a {{ReentrantLock}} so that it can overlap with the {{IndexWriter}} lock. Does it look better? SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs -- Key: LUCENE-4080 URL: https://issues.apache.org/jira/browse/LUCENE-4080 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 4.0, 4.1 Reporter: Adrien Grand Priority: Trivial Fix For: 4.1 Attachments: LUCENE-4080.patch, LUCENE-4080.patch At merge time, SegmentReader sometimes gives an incorrect value for numDeletedDocs. From LUCENE-2357: bq. As far as I know, [SegmenterReader.numDeletedDocs() is] only unreliable in this context (SegmentReader passed to SegmentMerger for merging); this is because we allow newly marked deleted docs to happen concurrently up until the moment we need to pass the SR instance to the merger (search for // Must sync to ensure BufferedDeletesStream in IndexWriter.java) ... but it would be nice to fix that, so I think open a new issue (it won't block this one)? We should be able to make a new SR instance, sharing the same core as the current one but using the correct delCount... bq. It would be cleaner (but I think hairier) to create a new SR for merging that holds the correct delCount, but let's do that under the separate issue. bq. it would be best if the SegmentReader's numDeletedDocs were always correct, but, fixing that in IndexWriter is somewhat tricky. 
I.e., the fix could be hairy, but the end result (SegmentReader.numDeletedDocs can always be trusted) would be cleaner...
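The locking change described above, replacing a synchronized monitor with an explicit java.util.concurrent.locks.ReentrantLock, can be sketched as follows. The class and fields here are illustrative stand-ins, not the actual ReadersAndLiveDocs code; the relevant property is that an explicit lock's acquire/release points are not tied to a single lexical block, which makes it possible to overlap its hold time with another lock instead of strictly nesting inside it:

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch: guarding a mutable counter with an explicit
// ReentrantLock instead of synchronized(this).
class ReadersAndLiveDocsSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private int numDeletedDocs;

    void delete() {
        lock.lock();
        try {
            numDeletedDocs++;
        } finally {
            lock.unlock(); // always release, even on exception
        }
    }

    int numDeletedDocs() {
        lock.lock();
        try {
            return numDeletedDocs;
        } finally {
            lock.unlock();
        }
    }
}
```

Note the try/finally idiom: unlike synchronized, an explicit lock is not released automatically, so forgetting the finally block would leak the lock.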
[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation
[ https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402591#comment-13402591 ] David Smiley commented on LUCENE-4167: -- bq. I can see the need for different behaviour for different Shape relationships too. But I think we should perhaps do that using method specialization. We already have the PrefixTreeStrategy abstraction, so you could write a WithinRecursivePrefixTreeStrategy which specialized makeQuery differently. That way it is clear to the user what the Strategy does, we won't need the runtime checks and we won't have Strategys like TwoDoubles which has methods for each of the different behaviours in the same class. Sorry, but I disagree with your point of view. The Strategy is supposed to be a single facade over the implementation details of how a query will work, including the various possible spatial predicates (i.e. spatial operations) that it supports. If one Java class file shows that it has become too complicated and would be better separated because implementing different predicates is just so fundamentally different, then the operations could be decomposed into separate source files, but it would still be behind the facade of the Strategy. I don't believe that TwoDoublesStrategy demonstrates the complexity of a class trying to do too many things. I absolutely think TwoDoublesStrategy could be coded to be more clear. If it is as buggy/untested as I think it is and nobody wants to fix it (I don't), personally I think this strategy can go away. Remove the use of SpatialOperation -- Key: LUCENE-4167 URL: https://issues.apache.org/jira/browse/LUCENE-4167 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Looking at the code in TwoDoublesStrategy I noticed SpatialOperations.BBoxWithin vs isWithin which confused me. Looking over the other Strategys I see that really only isWithin and Intersects is supported.
Only TwoDoublesStrategy supports IsDisjointTo. The remainder of SpatialOperations are not supported. I don't think we should use SpatialOperation at this stage since it is not clear what Operations are supported by what Strategys, many Operations are not supported, and the code for handling the Operations is usually the same. We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a different Strategy.
[jira] [Created] (LUCENE-4172) clean up redundant throws clauses
Robert Muir created LUCENE-4172: --- Summary: clean up redundant throws clauses Key: LUCENE-4172 URL: https://issues.apache.org/jira/browse/LUCENE-4172 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Examples are things like ctors that list throws XYZException but actually don't, and things like 'throws CorruptIndex, LockObtainedFailed, IOException' when all of these are actually IOException.
[jira] [Updated] (LUCENE-4172) clean up redundant throws clauses
[ https://issues.apache.org/jira/browse/LUCENE-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4172: Attachment: LUCENE-4172.patch The start of a patch... Eclipse doesn't do well here, so it would be better to use something else to find these. clean up redundant throws clauses - Key: LUCENE-4172 URL: https://issues.apache.org/jira/browse/LUCENE-4172 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4172.patch Examples are things like ctors that list throws XYZException but actually don't, and things like 'throws CorruptIndex, LockObtainedFailed, IOException' when all of these are actually IOException.
[jira] [Commented] (LUCENE-4172) clean up redundant throws clauses
[ https://issues.apache.org/jira/browse/LUCENE-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402669#comment-13402669 ] Steven Rowe commented on LUCENE-4172: - IntelliJ has two relevant inspections: "Redundant throws clause" and "Duplicate throws". I've applied your patch to trunk and I'm running these on the whole project to see what they find. clean up redundant throws clauses - Key: LUCENE-4172 URL: https://issues.apache.org/jira/browse/LUCENE-4172 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4172.patch Examples are things like ctors that list throws XYZException but actually don't, and things like 'throws CorruptIndex, LockObtainedFailed, IOException' when all of these are actually IOException.
[jira] [Commented] (LUCENE-4172) clean up redundant throws clauses
[ https://issues.apache.org/jira/browse/LUCENE-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402680#comment-13402680 ] Robert Muir commented on LUCENE-4172: - That sounds nice: I think we always want to fix 'duplicate throws'. But 'redundant throws' requires some decisions... basically I looked at each one and: * nuke the redundant throws if it's a static, private, package-private, or final method * nuke the redundant throws if it's a ctor (a subclass can always declare its own) * keep the redundant throws if it's a public/protected non-final method that can be overridden clean up redundant throws clauses - Key: LUCENE-4172 URL: https://issues.apache.org/jira/browse/LUCENE-4172 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4172.patch Examples are things like ctors that list throws XYZException but actually don't, and things like 'throws CorruptIndex, LockObtainedFailed, IOException' when all of these are actually IOException.
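To make the issue concrete, here is a small illustrative sketch. The exception classes below are local stand-ins defined for the example; in Lucene, CorruptIndexException and LockObtainFailedException really do extend IOException, which is what makes listing them alongside IOException redundant:

```java
import java.io.IOException;

class ThrowsCleanup {
    // Stand-ins for Lucene's CorruptIndexException / LockObtainFailedException,
    // both of which extend IOException.
    static class CorruptIndexException extends IOException {}
    static class LockObtainFailedException extends IOException {}

    // Redundant form: every listed type is already an IOException, so the
    // extra clauses add nothing to what callers must handle.
    static void openVerbose()
            throws CorruptIndexException, LockObtainFailedException, IOException {
    }

    // Cleaned-up form: a single IOException clause covers all three.
    static void openClean() throws IOException {
    }
}
```

Callers of either method write the identical catch (IOException e) block, which is why the cleanup is behavior-preserving; the distinction Robert draws above is only about overridable methods, where a broader declared clause leaves room for subclasses.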
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402743#comment-13402743 ] Hoss Man commented on SOLR-1725: bq. My comment above still stands: Another TODO is to get this to work with a scripting language implementation JAR file being added as a plugin somehow. I played around with this on the train today and confirmed that we can do runtime loading of jars that include script engines if we change the ScriptEngineManager instantiation so that we use the one-arg constructor and pass in resourceLoader.getClassLoader(). A few other notes based on reviewing the patch and playing around with it. Barring objections, I'll probably take a stab at addressing these tomorrow or Friday... * I don't see any mechanism for scripts to indicate that processing should stop -- i.e., the way a Java UpdateProcessor would just return w/o calling super.foo. We should add/test/doc some functionality to look at the result of the invokeFunction call to support this. * The tests seem to assert that the side effects of the scripts happen (i.e., that the testcase records the function names), but I don't see any assertions that the expected modifications of the update commands are happening (i.e., that documents are being modified in processAdd). * We need to test that request params are easily accessible (I'm not sure how well the SolrQueryRequest class works in various scripting languages, so we might need to pull out the SolrParams and expose them directly - either way we need to demonstrate doing it in a test). * Whitespace/comma/pipe splitting of the script names is a bad meme. We should stop doing that, and require that multiple scripts be specified as multiple {{str}} params. ** We can add convenience code to support {{arr name=scriptstrstr/arr}} style as well. * ScriptFile and its extension parsing is very primitive and broken on any file with . in its name.
We should just use the helper method for parsing filename extensions that already exists in commons-io. * From what I can tell looking at the ScriptEngine javadocs, it's possible that a ScriptEngine might exist w/o a specific file extension, or that multiple engines could support the same extension(s); we should offer an init param that lets the user specify a ScriptEngine by short name to override whatever extension might be found. * Currently, problems with scripts not being found, or engines for scripts not being found, aren't reported until the first request tries to use them - we should error-check all of this in init (or inform) and fail fast. ** Ditto for the assumption in invokeFunction that we can cast every ScriptEngine to Invocable -- we should at least check this on init/inform and fail fast. * The way the various UpdateProcessor methods are implemented to be lenient about any scripts that don't explicitly implement a method seems kludgy -- isn't there any way we can introspect the engine to ask if a function exists? ** In particular, when I did some testing with JRuby, I found that it didn't work at all - I guess JRuby was throwing a ScriptException instead of NoSuchMethodException?
{noformat} undefined method `processCommit' for main:Object (NoMethodError) org.jruby.embed.InvokeFailedException: (NoMethodError) undefined method `processCommit' for main:Object at org.jruby.embed.internal.EmbedRubyObjectAdapterImpl.call(EmbedRubyObjectAdapterImpl.java:403) at org.jruby.embed.internal.EmbedRubyObjectAdapterImpl.callMethod(EmbedRubyObjectAdapterImpl.java:189) at org.jruby.embed.ScriptingContainer.callMethod(ScriptingContainer.java:1386) at org.jruby.embed.jsr223.JRubyEngine.invokeFunction(JRubyEngine.java:262) at org.apache.solr.update.processor.ScriptUpdateProcessorFactory$ScriptUpdateProcessor.invokeFunction(ScriptUpdateProcessorFactory.java:221) at org.apache.solr.update.processor.ScriptUpdateProcessorFactory$ScriptUpdateProcessor.processCommit(ScriptUpdateProcessorFactory.java:202) {noformat} Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Assignee: Erik Hatcher Labels: UpdateProcessor Fix For: 4.1 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code. The update request
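The classloader and fail-fast points above can be sketched with the standard javax.script API. The compile() helper below is hypothetical illustration, not the patch's actual code; it shows the one-arg ScriptEngineManager constructor (which discovers engines through the supplied classloader) and the init-time checks Hoss suggests, namely that an engine exists for the extension and that it supports Invocable before any request uses it:

```java
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class ScriptCheck {
    // Hypothetical fail-fast helper: resolve and validate the engine up
    // front, so misconfiguration surfaces at init time, not on the first
    // update request.
    static Invocable compile(String extension, String source) throws ScriptException {
        // One-arg constructor: engines are discovered via this classloader,
        // which is what enables runtime-loaded plugin jars to contribute them.
        ScriptEngineManager mgr =
            new ScriptEngineManager(ScriptCheck.class.getClassLoader());
        ScriptEngine engine = mgr.getEngineByExtension(extension);
        if (engine == null) {
            throw new IllegalStateException("no script engine for extension: " + extension);
        }
        if (!(engine instanceof Invocable)) {
            throw new IllegalStateException("script engine is not Invocable: " + extension);
        }
        engine.eval(source); // evaluate top-level definitions once
        return (Invocable) engine;
    }
}
```

A by-name override (getEngineByName) could sit alongside this for the init-param idea; which engines are actually present depends on the JVM and classpath, which is exactly why the null check matters.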
Make ivy search maven repo1/repo2?
How can I get ivy to include the maven.org repo2 in the resolver list? Is there a reason it is not in the list? I ask because there is an artifact (extjwnl) which is only on repo2. -- Lance Norskog goks...@gmail.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
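One way to do this (an untested sketch; the resolver names and chain layout here are illustrative, and any real change would have to fit the Lucene build's own ivysettings) is to declare an additional m2-compatible ibiblio resolver pointing at repo2 in ivysettings.xml:

```xml
<!-- Hypothetical ivysettings.xml fragment: adds repo2.maven.org as an extra
     Maven-compatible repository in a resolver chain. -->
<ivysettings>
  <settings defaultResolver="chained"/>
  <resolvers>
    <chain name="chained">
      <ibiblio name="central" m2compatible="true"/>
      <ibiblio name="repo2" m2compatible="true"
               root="http://repo2.maven.org/maven2/"/>
    </chain>
  </resolvers>
</ivysettings>
```

Ivy tries the resolvers in chain order, so artifacts only on repo2 (like extjwnl) would be found after central misses.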
Full build target?
I would like to build Lucene and have the new jars be used by Solr. Which top-level target does this? -- Lance Norskog goks...@gmail.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation
[ https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402783#comment-13402783 ] Chris Male commented on LUCENE-4167: {quote} The strategy shouldn't care about the bbox concept, I agree. I think the bbox capability should be decoupled from SpatialOperation. It's not a simple matter of the client calling queryShape.getBoundingBox() since the expression of the query shape from client to server is a string. So instead of BBoxIntersects(Circle(3,5 d=10)) I propose supporting INTERSECTS(BBOX(Circle(3,5 d=10))). The actual set of operations I want to support are [E]CQL spatial predicates: http://docs.geoserver.org/latest/en/user/filter/ecql_reference.html#spatial-predicate but that perhaps deserves its own issue. {quote} I think we need to be cautious here about exposing too much complexity in the Strategys. Query language requirements shouldn't be passed on down to Strategy. Instead, the Strategys should have a very controlled list of spatial operations they support, and how they are connected to the query parser should be the parser's responsibility. Requiring direct users of the Strategys to use queryShape.getBoundingBox() seems like a good way to mitigate complexity in the Strategys themselves, and we can then do whatever we like in any parsers to make our query languages work. {quote} Sorry, but I disagree with your point of view. The Strategy is supposed to be a single facade to the implementation details of how a query will work, including the various possible spatial predicates (i.e. spatial operations) that it supports. If one Java class file shows that it becomes too complicated and it would be better separated because implementing different predicates is just so fundamentally different, then the operations could be decomposed into separate source files, but it would be behind the facade of the Strategy. {quote} Okay, fair enough. I think we can come to a compromise. 
My goal here is to make it clear to the user what operations our Strategys support at compile time, not through some undocumented runtime check. That seems like a recipe for disaster. Imagine someone who uses one of the Prefix Strategys and then tries to do a Disjoint operation. At runtime they get an error, and then after some reading through source code they discover they actually need to use TwoDoubles, which requires a re-index. Instead what I recommend is that we rename makeQuery to makeIntersectsQuery. Then all implementations of that method will only construct a Query for the intersects operation. We can then add makeXXXQuery methods to the Strategy interface as we add support to all the implementations. If a Strategy impl supports a particular operation that the rest don't, then that can just be a method on that specific Strategy and not added to the Strategy interface. Consequently TwoDoubles will get a makeDisjointQuery method. This way we have more readable code, better compile time checking and less confused users. How we map this into any Client / Server interaction or a query language should be the responsibility of those classes, not the Strategys. I'm going to create a patch to this effect. Remove the use of SpatialOperation -- Key: LUCENE-4167 URL: https://issues.apache.org/jira/browse/LUCENE-4167 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Looking at the code in TwoDoublesStrategy I noticed SpatialOperations.BBoxWithin vs isWithin which confused me. Looking over the other Strategys I see that really only isWithin and Intersects are supported. Only TwoDoublesStrategy supports IsDisjointTo. The remainder of SpatialOperations are not supported. I don't think we should use SpatialOperation at this stage since it is not clear what Operations are supported by what Strategys, many Operations are not supported, and the code for handling the Operations is usually the same. 
We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a different Strategy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
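The per-operation factory methods proposed above could look roughly like this. This is a sketch, not the attached patch: the stub Query/Filter/Shape types exist only so the example compiles on its own, and method bodies are placeholders.

```java
// Stub types standing in for Lucene's Query/Filter and Spatial4j's Shape,
// only so this sketch is self-contained.
class Query {}
class Filter {}
class Shape {}

// Sketch of the proposed Strategy surface: one make<Operation>Query/Filter
// pair per operation that every implementation supports, so an unsupported
// operation is a compile error rather than a runtime surprise.
interface SpatialStrategy {
    Query makeIntersectsQuery(Shape queryShape);
    Filter makeIntersectsFilter(Shape queryShape);
}

class TwoDoublesStrategy implements SpatialStrategy {
    public Query makeIntersectsQuery(Shape s) { return new Query(); }
    public Filter makeIntersectsFilter(Shape s) { return new Filter(); }
    // Only this impl supports disjoint, so the method lives here,
    // not on the shared interface:
    public Query makeDisjointQuery(Shape s) { return new Query(); }
}
```

A caller holding the interface type can only reach the operations every Strategy supports; disjoint requires the concrete TwoDoublesStrategy type, which makes the limitation visible at compile time.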
[jira] [Updated] (LUCENE-4165) HunspellDictionary - AffixFile Reader closed, Dictionary Readers left unclosed
[ https://issues.apache.org/jira/browse/LUCENE-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-4165: --- Attachment: LUCENE-4156-trunk.patch Updated version of trunk patch which closes the InputStreams created in Solr's HunspellStemFilterFactory. HunspellDictionary - AffixFile Reader closed, Dictionary Readers left unclosed -- Key: LUCENE-4165 URL: https://issues.apache.org/jira/browse/LUCENE-4165 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 3.6 Environment: Linux, Java 1.6 Reporter: Torsten Krah Priority: Minor Attachments: LUCENE-4156-trunk.patch, lucene_36.patch, lucene_trunk.patch The HunspellDictionary takes an InputStream for the affix file and a List of streams for dictionaries. The Javadoc is not clear about whether I have to close those streams myself or whether the Dictionary constructor does this already. Looking at the code, at least reader.close() is called when the affix file is read via the readAffixFile() method (although closing streams is not done in a finally block - so the constructor may fail to do so). The readDictionaryFile() method is missing the call to close the reader, in contrast to readAffixFile(). So the question here is - do I have to close the streams myself after instantiating the dictionary? Or is the close call only missing for the dictionary streams? Either way, please add the close calls in a safe manner or clarify the javadoc so I know I have to do this myself. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
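The "safe manner" of closing asked for in the issue is the close-in-finally idiom (Java 1.6 style, matching the affected version, since try-with-resources isn't available). This is a caller-side sketch, not the attached patch; the class and method names are illustrative, and the HunspellDictionary construction is shown only as a comment.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: the caller opens the streams, so the caller closes them in
// finally blocks, even if the dictionary constructor throws partway.
public class StreamCleanup {
    static void loadDictionary(File affix, File dic) throws IOException {
        InputStream affixStream = new FileInputStream(affix);
        try {
            InputStream dicStream = new FileInputStream(dic);
            try {
                // new HunspellDictionary(affixStream, dicStream, ...);
                // per this issue, the ctor must NOT close these streams
            } finally {
                dicStream.close();
            }
        } finally {
            affixStream.close();
        }
    }
}
```

Nesting the try blocks guarantees the affix stream is closed even when opening the dictionary stream fails, and each stream's close runs regardless of what happens to the other.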
[jira] [Resolved] (LUCENE-4166) TwoDoublesStrategy is broken for Circles
[ https://issues.apache.org/jira/browse/LUCENE-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male resolved LUCENE-4166. Resolution: Fixed Fix Version/s: 5.0 4.0 Assignee: Chris Male Fixed, but we really need to look at this Strategy closely in another issue. TwoDoublesStrategy is broken for Circles Key: LUCENE-4166 URL: https://issues.apache.org/jira/browse/LUCENE-4166 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Assignee: Chris Male Priority: Critical Fix For: 4.0, 5.0 Attachments: LUCENE-4166.patch TwoDoublesStrategy supports finding Documents that are within a Circle, yet it is impossible to provide one due to the following code found at the start of TwoDoublesStrategy.makeQuery(): {code} Shape shape = args.getShape(); if (!(shape instanceof Rectangle)) { throw new InvalidShapeException("A rectangle is the only supported shape (so far), not " + shape.getClass());//TODO } Rectangle bbox = (Rectangle) shape; {code} I think instead the code which handles Circles should ask for the bounding box of the Shape and use that instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
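The bounding-box fallback described in the issue can be sketched as below. The stub Shape/Rectangle/Circle types stand in for Spatial4j's and exist only so the example compiles; the helper name toBBox is illustrative.

```java
// Stubs standing in for Spatial4j shapes, to keep the sketch self-contained.
interface Shape { Rectangle getBoundingBox(); }
class Rectangle implements Shape {
    public Rectangle getBoundingBox() { return this; }
}
class Circle implements Shape {
    public Rectangle getBoundingBox() { return new Rectangle(); }
}

public class BBoxFallback {
    // Suggested fix: reduce any non-Rectangle shape (e.g. a Circle) to its
    // bounding box instead of throwing InvalidShapeException.
    static Rectangle toBBox(Shape shape) {
        return (shape instanceof Rectangle) ? (Rectangle) shape : shape.getBoundingBox();
    }
}
```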
[jira] [Created] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys
Chris Male created LUCENE-4173: -- Summary: Remove IgnoreIncompatibleGeometry for SpatialStrategys Key: LUCENE-4173 URL: https://issues.apache.org/jira/browse/LUCENE-4173 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Silently not indexing anything for a Shape is not okay. Users should get an Exception and then they can decide how to proceed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys
[ https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-4173: --- Attachment: LUCENE-4173.patch Simple patch removing the option and improving how non-Point shapes are handled in TwoDoubles. Remove IgnoreIncompatibleGeometry for SpatialStrategys -- Key: LUCENE-4173 URL: https://issues.apache.org/jira/browse/LUCENE-4173 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Attachments: LUCENE-4173.patch Silently not indexing anything for a Shape is not okay. Users should get an Exception and then they can decide how to proceed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4167) Remove the use of SpatialOperation
[ https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-4167: --- Attachment: LUCENE-4167.patch First shot at this. I completely removed SpatialArgs from the Strategy interface. We don't have so many parameters that we can't force them to be defined. Changed makeQuery/makeFilter to makeIntersectsQuery/makeIntersectsFilter respectively. I want to address the method javadocs before committing this. Remove the use of SpatialOperation -- Key: LUCENE-4167 URL: https://issues.apache.org/jira/browse/LUCENE-4167 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Attachments: LUCENE-4167.patch Looking at the code in TwoDoublesStrategy I noticed SpatialOperations.BBoxWithin vs isWithin which confused me. Looking over the other Strategys I see that really only isWithin and Intersects are supported. Only TwoDoublesStrategy supports IsDisjointTo. The remainder of SpatialOperations are not supported. I don't think we should use SpatialOperation at this stage since it is not clear what Operations are supported by what Strategys, many Operations are not supported, and the code for handling the Operations is usually the same. We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a different Strategy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3580) In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled
[ https://issues.apache.org/jira/browse/SOLR-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402826#comment-13402826 ] Jack Krupansky commented on SOLR-3580: -- My recommendation is to have an additional option, lowercaseNotOperator which defaults to false. This would be the safe choice that Yonik recommends, but allow you to override that decision as you see fit for your application. In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled Key: SOLR-3580 URL: https://issues.apache.org/jira/browse/SOLR-3580 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0 Reporter: Michael Dodsworth Priority: Minor Fix For: 4.0 Attachments: SOLR-3580.patch When lowercase operator support is enabled (for edismax), the lowercase 'not' operator is being wrongly treated as a literal term (and not as an operator). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation
[ https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402827#comment-13402827 ] David Smiley commented on LUCENE-4167: -- I agree that something could/should be done to improve the awareness of exactly which operations a Strategy supports. This is of course just one aspect of a Strategy's limitations; consider whether or not the Strategy supports multi-value data or whether it supports indexing non-point shapes. Surely *that* is quite relevant to a potential client. It seems very doubtful to me that the compile-time type checks could be added for everything. And even with spatial operations -- there are a lot of them to support, and wouldn't it be twice as many for both makeXXXQuery and makeXXXFilter? I don't know where you would draw the line. At least the current interface is fairly simple, and there are always Javadocs. That said, I look forward to seeing any patches you may have demonstrating what you have in mind. Maybe I just won't get it until I see it. bq. How we map this into any Client / Server interaction or a query language should be the responsibility of those classes, not the Strategies. True. Remove the use of SpatialOperation -- Key: LUCENE-4167 URL: https://issues.apache.org/jira/browse/LUCENE-4167 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Attachments: LUCENE-4167.patch Looking at the code in TwoDoublesStrategy I noticed SpatialOperations.BBoxWithin vs isWithin which confused me. Looking over the other Strategys I see that really only isWithin and Intersects are supported. Only TwoDoublesStrategy supports IsDisjointTo. The remainder of SpatialOperations are not supported. I don't think we should use SpatialOperation at this stage since it is not clear what Operations are supported by what Strategys, many Operations are not supported, and the code for handling the Operations is usually the same. 
We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a different Strategy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4167) Remove the use of SpatialOperation
[ https://issues.apache.org/jira/browse/LUCENE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402830#comment-13402830 ] Chris Male commented on LUCENE-4167: {quote} This is of course just one aspect of a Strategy's limitations; consider whether or not the Strategy supports multi-value data or whether it supports indexing non-point shapes. Surely that is quite relevant to a potential client. It seems very doubtful to me that the compile-time type checks could be added for everything {quote} Quite right, and we can tackle these issues on a case by case basis. Having a check like supportsMultiValued() on Strategys seems like a good idea. That way the user can consult this method before indexing. {quote} And even with spatial operations – there are a lot of them to support, and wouldn't it be twice as many for both makeXXXQuery and makeXXXFilter? I don't know where you would draw the line. At least the current interface is fairly simple, and there are always Javadocs. {quote} We don't have any useful Javadocs on this issue so I'm not going to rely on that. I don't see any issue with having a makeXXXQuery/Filter for each operation. Strategys are essentially factories, so I think the ability to see at compile time what the factory can create is vitally important. If we get to 20 operations I'll start to worry. Remove the use of SpatialOperation -- Key: LUCENE-4167 URL: https://issues.apache.org/jira/browse/LUCENE-4167 Project: Lucene - Java Issue Type: Bug Components: modules/spatial Reporter: Chris Male Attachments: LUCENE-4167.patch Looking at the code in TwoDoublesStrategy I noticed SpatialOperations.BBoxWithin vs isWithin which confused me. Looking over the other Strategys I see that really only isWithin and Intersects are supported. Only TwoDoublesStrategy supports IsDisjointTo. The remainder of SpatialOperations are not supported. 
I don't think we should use SpatialOperation at this stage since it is not clear what Operations are supported by what Strategys, many Operations are not supported, and the code for handling the Operations is usually the same. We can spin off the code for TwoDoublesStrategy's IsDisjointTo support into a different Strategy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
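The supportsMultiValued() capability check floated in the comments above might look like this. Purely illustrative: no such method exists in the spatial module yet, and the subclass name here is made up.

```java
// Hypothetical capability flag on a Strategy, consulted by callers before
// indexing multi-valued fields rather than discovering the limitation
// mid-index.
abstract class SpatialStrategy {
    // conservative default: impls must opt in explicitly
    public boolean supportsMultiValued() { return false; }
}

// Made-up example impl that opts in:
class PointPrefixStrategy extends SpatialStrategy {
    @Override public boolean supportsMultiValued() { return true; }
}
```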
[jira] [Commented] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.
[ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402851#comment-13402851 ] Trym Møller commented on SOLR-3582: --- Debugging the provided test shows this behaviour as well, that is, the Watch is kept even though it's notified about disConnection and syncConnection, and the Watch will only stop after a node change occurs. As Mark writes on the mailing list, there might be other ZooKeeper Watchers in Solr which might add new watchers on reconnect. If we agree about the ZooKeeper watcher behaviour, then I think that the provided bug fix solves the problem in the LeaderElector, and it can be committed to svn independently of problems with other watchers. Best regards Trym Leader election zookeeper watcher is responding to con/discon notifications incorrectly. Key: SOLR-3582 URL: https://issues.apache.org/jira/browse/SOLR-3582 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 4.0, 5.0 As brought up by Trym R. Møller on the mailing list, we are responding to watcher events about connection/disconnection as if they were notifications about node changes. http://www.lucidimagination.com/search/document/e13ef390b882 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.
[ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402851#comment-13402851 ] Trym Møller edited comment on SOLR-3582 at 6/28/12 5:10 AM: Debugging the provided test shows this behaviour as well, that is, the Watch is kept even though it's notified about disConnection and syncConnection, and the Watch will only stop after it has been notified about a node change. As Mark writes on the mailing list, there might be other ZooKeeper Watchers in Solr which might add new watchers on reconnect. If we agree about the ZooKeeper watcher behaviour, then I think that the provided bug fix solves the problem in the LeaderElector, and it can be committed to svn independently of problems with other watchers. Best regards Trym was (Author: trym): Debugging the provided test shows this behaviour as well, that is, the Watch is kept even though, its notified about disConnection and syncConnection and the Watch will first stop after a node change occurs. As Mark writes on the mailing list, there might be other ZooKeeper Watchers in Solr which might add new watchers on reconnect. If we agree about the ZooKeeper watcher behaviour, then I think that the provided bug fix solves the problem in the LeaderElector and it can be committed to svn independently of problems with other watchers. Best regards Trym Leader election zookeeper watcher is responding to con/discon notifications incorrectly. Key: SOLR-3582 URL: https://issues.apache.org/jira/browse/SOLR-3582 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 4.0, 5.0 As brought up by Trym R. Møller on the mailing list, we are responding to watcher events about connection/disconnection as if they were notifications about node changes. http://www.lucidimagination.com/search/document/e13ef390b882 -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org