[jira] [Commented] (SOLR-3423) HttpShardHandlerFactory does not shutdown its threadpool
[ https://issues.apache.org/jira/browse/SOLR-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547654#comment-15547654 ]

Greg Bowyer commented on SOLR-3423:
-----------------------------------

Safe to close

> HttpShardHandlerFactory does not shutdown its threadpool
>
>                 Key: SOLR-3423
>                 URL: https://issues.apache.org/jira/browse/SOLR-3423
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 3.6
>            Reporter: Greg Bowyer
>            Assignee: Greg Bowyer
>              Labels: distributed, shard
>             Fix For: 3.6.3
>
>         Attachments: SOLR-3423-HttpShardHandlerFactory_ThreadPool_Shutdown_lucene_3x.diff, SOLR-3423-HttpShardHandlerFactory_ThreadPool_Shutdown_lucene_3x.diff
>
> The HttpShardHandlerFactory is not getting a chance to shut down its threadpool. This means that in situations like a core reload / core swap it is possible for the handler to leak threads.
> (This may also be the case if the webapp is loaded / unloaded in the container)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
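The leak described above is the classic missing-shutdown pattern: a pool owned by a factory that is discarded on core reload without being drained. A minimal sketch of the usual fix, assuming a close hook is wired into the reload path (the class and method names here are hypothetical, not the actual Solr code in the attached diff):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShardHandlerShutdown {
    // Stand-in for the pool HttpShardHandlerFactory uses for shard requests.
    private final ExecutorService pool = Executors.newCachedThreadPool();

    // Hook to be invoked on core reload / swap / webapp unload.
    public void close() {
        pool.shutdown(); // stop accepting new shard requests
        try {
            // Give in-flight requests a grace period, then force-interrupt.
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public boolean isShutdown() {
        return pool.isShutdown();
    }

    public static void main(String[] args) {
        ShardHandlerShutdown handler = new ShardHandlerShutdown();
        handler.close();
        System.out.println("pool shut down: " + handler.isShutdown());
    }
}
```

Without such a hook, each reload leaves an orphaned pool whose worker threads keep the old classloader alive, which is exactly the container-unload symptom the report mentions.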
[jira] [Created] (LUCENE-6536) Migrate HDFSDirectory from solr to lucene-hadoop
Greg Bowyer created LUCENE-6536:
-----------------------------------

             Summary: Migrate HDFSDirectory from solr to lucene-hadoop
                 Key: LUCENE-6536
                 URL: https://issues.apache.org/jira/browse/LUCENE-6536
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Greg Bowyer

I am currently working on a search engine that is throughput oriented and works entirely in apache-spark. As part of this, I need a directory implementation that can operate on HDFS directly.

This got me thinking: can I take the one that was worked on so hard for solr hadoop? As such I migrated the HDFS and blockcache directories out to a lucene-hadoop module.

Having done this work, I am not sure it is actually a good change; it feels a bit messy, and I don't like how the Metrics class gets extended and abused. Thoughts, anyone?
[jira] [Updated] (LUCENE-6536) Migrate HDFSDirectory from solr to lucene-hadoop
[ https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Bowyer updated LUCENE-6536:
--------------------------------
    Attachment: LUCENE-6536.patch
[jira] [Commented] (LUCENE-6536) Migrate HDFSDirectory from solr to lucene-hadoop
[ https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579210#comment-14579210 ]

Greg Bowyer commented on LUCENE-6536:
-------------------------------------

bq. Questions:
bq. What will be done to deal with the bugginess of this thing? I see many reports of user corruption issues. By committing it, we take responsibility for this and it becomes our problem. I don't want to see the code committed to lucene just for this reason.

Fix its bugs ;). Joking aside, is it the directory or the blockcache that is the source of most of the corruptions?

bq. What will be done about the performance? I am not really sure the entire technique is viable.

My usecase is a bit odd: I have many small (2 * HDFS block) indexes that get run over map jobs in hadoop. The performance I got last time I did this (with a dirty hack Directory that copied the files in and out of HDFS :S) was pretty good. It is a throughput-oriented usage; I think if you tried to use this to back an online searcher you would have poor performance.

bq. Personally, I think if someone wants to do this, a better integration point is to make it a java 7 filesystem provider. That is really how such a filesystem should work anyway.

That is awesome, I didn't know such an SPI existed in java. I have found a few people that are trying to make a provider for hadoop. I also don't have the greatest love for this path; the more test manipulations I did, the less it felt like a simple feature that should be in lucene. I might try to either strip out the block-cache from this patch, or use a HDFS filesystem SPI in java7.
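The "java 7 filesystem provider" mentioned in the comment above is the `java.nio.file.spi.FileSystemProvider` SPI: a third-party provider (such as an HDFS one) mounts under its own URI scheme and ordinary `Files` calls then work against it. A small sketch of the SPI in action, using the JDK's built-in zip provider since an `hdfs://` provider would need an external dependency:

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class FsProviderDemo {
    public static void main(String[] args) throws IOException {
        // An HDFS provider (e.g. the jsr203-hadoop project linked below)
        // would plug into this same lookup with an "hdfs://..." URI.
        Path zip = Files.createTempFile("demo", ".zip");
        Files.delete(zip); // newFileSystem with create=true wants it absent

        Map<String, String> env = new HashMap<>();
        env.put("create", "true");
        URI uri = URI.create("jar:" + zip.toUri());

        try (FileSystem fs = FileSystems.newFileSystem(uri, env)) {
            // From here on, plain java.nio.file calls hit the provider.
            Path inside = fs.getPath("/hello.txt");
            Files.write(inside, "hello".getBytes());
            System.out.println(new String(Files.readAllBytes(inside)));
        }
        Files.deleteIfExists(zip);
    }
}
```

The appeal for a Lucene Directory is that the store layer would not need to know about HDFS at all; it would simply be handed a `Path` from whichever provider is mounted.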
[jira] [Commented] (LUCENE-6536) Migrate HDFSDirectory from solr to lucene-hadoop
[ https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579388#comment-14579388 ]

Greg Bowyer commented on LUCENE-6536:
-------------------------------------

bq. That leads you to a test impl here: https://github.com/damiencarol/jsr203-hadoop

These are the people that I am talking about.
[jira] [Commented] (LUCENE-6536) Migrate HDFSDirectory from solr to lucene-hadoop
[ https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579400#comment-14579400 ]

Greg Bowyer commented on LUCENE-6536:
-------------------------------------

Oh wow, the blur store might be exactly what I am looking for.
[jira] [Commented] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947339#comment-13947339 ]

Greg Bowyer commented on LUCENE-3178:
-------------------------------------

Shall we call these experiments done? I think it concludes that where we do get wins they are minor, and the baggage that comes with them is a bit too much.

> Native MMapDir
>
>                 Key: LUCENE-3178
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3178
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/store
>            Reporter: Michael McCandless
>              Labels: gsoc2014
>         Attachments: LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178.patch
>
> Spinoff from LUCENE-2793.
> Just like we will create native Dir impl (UnixDirectory) to pass the right OS level IO flags depending on the IOContext, we could in theory do something similar with MMapDir. The problem is MMap is apparently quite hairy... and to pass the flags the native code would need to invoke mmap (I think?), unlike UnixDir where the code only has to open the file handle.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (LUCENE-3917) Port pruning module to trunk apis
[ https://issues.apache.org/jira/browse/LUCENE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Bowyer reassigned LUCENE-3917:
-----------------------------------
    Assignee: Greg Bowyer

> Port pruning module to trunk apis
>
>                 Key: LUCENE-3917
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3917
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/other
>    Affects Versions: 4.0-ALPHA
>            Reporter: Robert Muir
>            Assignee: Greg Bowyer
>             Fix For: 4.8
>         Attachments: LUCENE-3917-Initial-port-of-index-pruning.patch
>
> Pruning module was added in LUCENE-1812, but we need to port this to trunk (4.0)
[jira] [Commented] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945878#comment-13945878 ]

Greg Bowyer commented on LUCENE-3178:
-------------------------------------

Interesting. It is also somewhat interesting to see that AndHighLow gets a big jump in performance; any ideas on why that might be?
[jira] [Comment Edited] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564880#comment-13564880 ]

Greg Bowyer edited comment on LUCENE-3178 at 3/7/14 6:48 AM:
-------------------------------------------------------------

So I was going to shut this down today, and just to make sure I ran the benchmark on the simplest code possible ... and suddenly I got good results, this is idiopathic :S

{code}
Report after iter 19:
                Task    QPS baseline      StdDev    QPS mmap_tests      StdDev                Pct diff
          OrHighHigh        1.68      (11.2%)          1.73      (10.3%)    3.0% ( -16% -   27%)
            PKLookup      129.89       (5.8%)        135.03       (6.0%)    4.0% (  -7% -   16%)
            HighTerm        8.09      (13.6%)          8.43      (12.8%)    4.2% ( -19% -   35%)
           OrHighMed        4.46      (10.4%)          4.67       (9.5%)    4.7% ( -13% -   27%)
           OrHighLow        4.82      (10.6%)          5.09      (10.3%)    5.6% ( -13% -   29%)
        HighSpanNear        0.92       (8.1%)          0.97       (7.3%)    5.9% (  -8% -   23%)
              IntNRQ        2.51      (10.2%)          2.67       (9.9%)    6.6% ( -12% -   29%)
          HighPhrase        0.30      (11.7%)          0.32      (12.8%)    6.7% ( -16% -   35%)
           MedPhrase        2.93       (6.8%)          3.12       (8.2%)    6.7% (  -7% -   23%)
         AndHighHigh        5.46       (6.6%)          5.86       (7.0%)    7.2% (  -5% -   22%)
             Respell       19.68       (5.9%)         21.15       (6.6%)    7.5% (  -4% -   21%)
           LowPhrase        0.46       (9.5%)          0.50      (10.2%)    7.6% ( -11% -   30%)
             Prefix3        5.25       (8.2%)          5.66       (7.7%)    7.9% (  -7% -   25%)
    HighSloppyPhrase        1.54       (8.0%)          1.67      (13.1%)    8.5% ( -11% -   32%)
         MedSpanNear        5.25       (7.0%)          5.72       (8.2%)    9.0% (  -5% -   25%)
            Wildcard       12.44       (5.7%)         13.59       (6.5%)    9.2% (  -2% -   22%)
     MedSloppyPhrase        2.27       (7.2%)          2.49       (8.5%)    9.5% (  -5% -   27%)
             MedTerm       28.16      (10.3%)         30.89       (9.9%)    9.7% (  -9% -   33%)
              Fuzzy1       18.91       (6.0%)         20.82       (6.7%)   10.1% (  -2% -   24%)
              Fuzzy2       19.69       (6.6%)         21.68       (7.5%)   10.1% (  -3% -   25%)
          AndHighMed        7.79       (7.5%)          8.58       (6.1%)   10.1% (  -3% -   25%)
         LowSpanNear        1.45       (5.7%)          1.60       (9.3%)   10.5% (  -4% -   27%)
     LowSloppyPhrase       22.84       (7.7%)         25.45       (9.7%)   11.4% (  -5% -   31%)
             LowTerm       46.46       (6.8%)         52.90       (7.6%)   13.9% (   0% -   30%)
          AndHighLow       35.92       (5.3%)         42.38       (7.1%)   18.0% (   5% -   32%)
{code}

was (Author: gbow...@fastmail.co.uk):
So I was going to shut this down today, and just to make sure I ran the benchmark on the simplest code possible ... and suddenly I got good results, this is idiopathic :S

https://gist.github.com/0f017853861d050c0b66
[jira] [Closed] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Bowyer closed LUCENE-4332.
-------------------------------
    Resolution: Fixed

This was done a while back; I should set up something to actually publish the results out.

> Integrate PiTest mutation coverage tool into build
>
>                 Key: LUCENE-4332
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4332
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: 4.1, 5.0
>            Reporter: Greg Bowyer
>            Assignee: Greg Bowyer
>              Labels: build
>         Attachments: LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch
>
> As discussed briefly on the mailing list, this patch is an attempt to integrate the PiTest mutation coverage tool into the lucene build
[jira] [Commented] (SOLR-4465) Configurable Collectors
[ https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697035#comment-13697035 ]

Greg Bowyer commented on SOLR-4465:
-----------------------------------

{quote}
On task number two I'm wondering if we could use the standard post filter approach to inject new collectors. Then all we would need is a search component that handles the merge from the shards. This approach could be done with plugins so we wouldn't have to alter the core. The main work then would be a search component that would allow for pluggable merging algorithms. This could be useful in many contexts. We'd need to see how this component would fit in the distributed flow.
{quote}

Sounds reasonable, although I am not quite sure what you mean by plugins in this context.

> Configurable Collectors
>
>                 Key: SOLR-4465
>                 URL: https://issues.apache.org/jira/browse/SOLR-4465
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 4.1
>            Reporter: Joel Bernstein
>             Fix For: 4.4
>         Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch
>
> This ticket provides a patch to add pluggable collectors to Solr. This patch was generated and tested with Solr 4.1.
> This is how the patch functions: Collectors are plugged into Solr in the solrconfig.xml using the new collectorFactory element. For example:
> <collectorFactory name="default" class="solr.CollectorFactory"/>
> <collectorFactory name="sum" class="solr.SumCollectorFactory"/>
> The elements above define two collector factories. The first one is the default collectorFactory. The class attribute points to org.apache.solr.handler.component.CollectorFactory, which implements logic that returns the default TopScoreDocCollector and TopFieldCollector.
> To create your own collectorFactory you must subclass the default CollectorFactory and at a minimum override the getCollector method to return your new collector.
> The parameter cl turns on pluggable collectors: cl=true
> If cl is not in the parameters, Solr will automatically use the default collectorFactory.
> *Pluggable Doclist Sorting With the Docs Collector*
> You can specify two types of pluggable collectors. The first type is the docs collector. For example: cl.docs=name
> The above param points to a named collectorFactory in the solrconfig.xml to construct the collector. The docs collectorFactorys must return a collector that extends the TopDocsCollector base class. Docs collectors are responsible for collecting the doclist. You can specify only one docs collector per query.
> You can pass parameters to the docs collector using local params syntax. For example: cl.docs=\{! sort=mycustomesort\}mycollector
> If cl=true and a docs collector is not specified, Solr will use the default collectorFactory to create the docs collector.
> *Pluggable Custom Analytics With Delegating Collectors*
> You can also specify any number of custom analytic collectors with the cl.analytic parameter. Analytic collectors are designed to collect something else besides the doclist. Typically this would be some type of custom analytic. For example: cl.analytic=sum
> The parameter above specifies an analytic collector named sum. Like the docs collectors, sum points to a named collectorFactory in the solrconfig.xml. You can specify any number of analytic collectors by adding additional cl.analytic parameters.
> Analytic collector factories must return Collector instances that extend DelegatingCollector. A sample analytic collector is provided in the patch through the org.apache.solr.handler.component.SumCollectorFactory. This collectorFactory provides a very simple DelegatingCollector that groups by a field and sums a column of floats. The sum collector is not designed to be a fully functional sum function but to be a proof of concept for pluggable analytics through delegating collectors.
> You can send parameters to analytic collectors with solr local param syntax. For example: cl.analytic=\{! id=1 groupby=field1 column=field2\}sum
> The id parameter is mandatory for analytic collectors and is used to identify the output from the collector. In this example the groupby and column params tell the sum collector which field to group by and sum.
> Analytic collectors are passed a reference to the ResponseBuilder and can place maps with analytic output directly into the SolrQueryResponse with the add() method. Maps that are placed in the SolrQueryResponse are automatically added to the outgoing response. The response will include a list named cl.analytic.id, where id is specified in the local param.
> *Distributed Search*
> The CollectorFactory also has a method called merge(). This method aggregates the results from each of the shards during distributed search. The default CollectoryFactory implements the default merge logic for merging documents from each shard. If you define a different docs collector you can override the default merge method to merge documents in accordance with how
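The delegating-collector idea the ticket describes (forward every hit to the normal doclist collector while accumulating an aggregate on the side) can be sketched in plain Java. This is a hypothetical illustration of the pattern only; the real Solr base classes (DelegatingCollector, CollectorFactory) and the per-document field reads differ:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a "sum" delegating collector in the spirit of the patch's
// SumCollectorFactory: group a float value by a key, then pass the hit on.
public class SumDelegatingCollector {
    interface Collector { void collect(int doc); }

    private final Collector delegate;           // e.g. the doclist collector
    private final Map<String, Float> sums = new HashMap<>();

    SumDelegatingCollector(Collector delegate) { this.delegate = delegate; }

    // groupValue/fieldValue stand in for the per-doc field lookups the real
    // collector would perform against the index.
    void collect(int doc, String groupValue, float fieldValue) {
        sums.merge(groupValue, fieldValue, Float::sum); // analytic side-channel
        delegate.collect(doc);                          // doclist unaffected
    }

    Map<String, Float> sums() { return sums; }

    public static void main(String[] args) {
        final int[] delegated = {0};
        SumDelegatingCollector c = new SumDelegatingCollector(doc -> delegated[0]++);
        c.collect(0, "field1valueA", 1.5f);
        c.collect(1, "field1valueA", 2.5f);
        c.collect(2, "field1valueB", 3.0f);
        System.out.println(c.sums() + " delegated=" + delegated[0]);
    }
}
```

The point of the design is that the aggregate never interferes with ranking: the delegate still sees every document, so the doclist comes back unchanged while the sums ride along in the response.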
[jira] [Created] (SOLR-4961) Add ability to provide custom ranking configurations
Greg Bowyer created SOLR-4961:
---------------------------------

             Summary: Add ability to provide custom ranking configurations
                 Key: SOLR-4961
                 URL: https://issues.apache.org/jira/browse/SOLR-4961
             Project: Solr
          Issue Type: New Feature
            Reporter: Greg Bowyer

This is a split from SOLR-4465, wherein the ability was added to make collectors configurable from within solr.

The aim is to hide the details of the custom collector work behind an API surface that is high-level to the point of allowing end users to not see lucene-specific details on a per-request basis, while still providing the flexibility to configure things like collectors through configurations defined in the solrconfig.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (SOLR-4961) Add ability to provide custom ranking configurations
[ https://issues.apache.org/jira/browse/SOLR-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Bowyer updated SOLR-4961:
------------------------------
    Issue Type: Sub-task  (was: New Feature)
        Parent: SOLR-4465
[jira] [Created] (SOLR-4962) Allow for analytic functions to be performed through altered collectors
Greg Bowyer created SOLR-4962:
---------------------------------

             Summary: Allow for analytic functions to be performed through altered collectors
                 Key: SOLR-4962
                 URL: https://issues.apache.org/jira/browse/SOLR-4962
             Project: Solr
          Issue Type: Sub-task
            Reporter: Greg Bowyer

This is a split from SOLR-4465; in that issue the ability to create customised collectors that allow for aggregate functions was born, but it suffers from being unable to work well with queryResultCaching and grouping.

Migrating this functionality out into a collector component within solr, and perhaps pushing down some of the logic towards lucene, seems to be the way to go.
[jira] [Commented] (SOLR-4465) Configurable Collectors
[ https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693251#comment-13693251 ]

Greg Bowyer commented on SOLR-4465:
-----------------------------------

Life keeps getting in the way. I have crafted two sub-tasks on this; I would be interested in working through this with you.
[jira] [Commented] (SOLR-4465) Configurable Collectors
[ https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665319#comment-13665319 ] Greg Bowyer commented on SOLR-4465: --- I agree with both of Yonik's points, I would be tempted to split this patch into two, and cover the first part here, with analytic / aggregation functions in a follow up patch. Do you want any help with this patch? cutting it up, testing etc ? Configurable Collectors --- Key: SOLR-4465 URL: https://issues.apache.org/jira/browse/SOLR-4465 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.1 Reporter: Joel Bernstein Fix For: 4.4 Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch This ticket provides a patch to add pluggable collectors to Solr. This patch was generated and tested with Solr 4.1. This is how the patch functions: Collectors are plugged into Solr in the solconfig.xml using the new collectorFactory element. For example: collectorFactory name=default class=solr.CollectorFactory/ collectorFactory name=sum class=solr.SumCollectorFactory/ The elements above define two collector factories. The first one is the default collectorFactory. The class attribute points to org.apache.solr.handler.component.CollectorFactory, which implements logic that returns the default TopScoreDocCollector and TopFieldCollector. To create your own collectorFactory you must subclass the default CollectorFactory and at a minimum override the getCollector method to return your new collector. The parameter cl turns on pluggable collectors: cl=true If cl is not in the parameters, Solr will automatically use the default collectorFactory. 
*Pluggable Doclist Sorting With the Docs Collector* You can specify two types of pluggable collectors. The first type is the docs collector. For example: cl.docs=name The above param points to a named collectorFactory in the solrconfig.xml to construct the collector. The docs collectorFactorys must return a collector that extends the TopDocsCollector base class. Docs collectors are responsible for collecting the doclist. You can specify only one docs collector per query. You can pass parameters to the docs collector using local params syntax. For example: cl.docs=\{! sort=mycustomesort\}mycollector If cl=true and a docs collector is not specified, Solr will use the default collectorFactory to create the docs collector. *Pluggable Custom Analytics With Delegating Collectors* You can also specify any number of custom analytic collectors with the cl.analytic parameter. Analytic collectors are designed to collect something else besides the doclist. Typically this would be some type of custom analytic. For example: cl.analytic=sum The parameter above specifies a analytic collector named sum. Like the docs collectors, sum points to a named collectorFactory in the solrconfig.xml. You can specificy any number of analytic collectors by adding additional cl.analytic parameters. Analytic collector factories must return Collector instances that extend DelegatingCollector. A sample analytic collector is provided in the patch through the org.apache.solr.handler.component.SumCollectorFactory. This collectorFactory provides a very simple DelegatingCollector that groups by a field and sums a column of floats. The sum collector is not designed to be a fully functional sum function but to be a proof of concept for pluggable analytics through delegating collectors. You can send parameters to analytic collectors with solr local param syntax. For example: cl.analytic=\{! 
id=1 groupby=field1 column=field2\}sum The id parameter is mandatory for analytic collectors and is used to identify the output from the collector. In this example the groupby and column params tell the sum collector which field to group by and sum. Analytic collectors are passed a reference to the ResponseBuilder and can place maps with analytic output directly into the SolrQueryResponse with the add() method. Maps that are placed in the SolrQueryResponse are automatically added to the outgoing response. The response will include a list named cl.analytic.id, where id is specified in the local param. *Distributed Search* The CollectorFactory also has a method called merge(). This method aggregates the results from each of the shards during distributed search. The default CollectorFactory implements the default merge logic for merging documents from each shard. If you define
[jira] [Commented] (SOLR-4465) Configurable Collectors
[ https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13664722#comment-13664722 ] Greg Bowyer commented on SOLR-4465: --- Is there anything remaining for this? Configurable Collectors --- Key: SOLR-4465 URL: https://issues.apache.org/jira/browse/SOLR-4465 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.1 Reporter: Joel Bernstein Fix For: 4.4 Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch This ticket provides a patch to add pluggable collectors to Solr. This patch was generated and tested with Solr 4.1. This is how the patch functions: Collectors are plugged into Solr in the solrconfig.xml using the new collectorFactory element. For example: <collectorFactory name="default" class="solr.CollectorFactory"/> <collectorFactory name="sum" class="solr.SumCollectorFactory"/> The elements above define two collector factories. The first one is the default collectorFactory. The class attribute points to org.apache.solr.handler.component.CollectorFactory, which implements logic that returns the default TopScoreDocCollector and TopFieldCollector. To create your own collectorFactory you must subclass the default CollectorFactory and at a minimum override the getCollector method to return your new collector. The parameter cl turns on pluggable collectors: cl=true If cl is not in the parameters, Solr will automatically use the default collectorFactory. *Pluggable Doclist Sorting With the Docs Collector* You can specify two types of pluggable collectors. The first type is the docs collector. For example: cl.docs=name The above param points to a named collectorFactory in the solrconfig.xml to construct the collector. 
The docs collectorFactories must return a collector that extends the TopDocsCollector base class. Docs collectors are responsible for collecting the doclist. You can specify only one docs collector per query. You can pass parameters to the docs collector using local params syntax. For example: cl.docs=\{! sort=mycustomesort\}mycollector If cl=true and a docs collector is not specified, Solr will use the default collectorFactory to create the docs collector. *Pluggable Custom Analytics With Delegating Collectors* You can also specify any number of custom analytic collectors with the cl.analytic parameter. Analytic collectors are designed to collect something else besides the doclist. Typically this would be some type of custom analytic. For example: cl.analytic=sum The parameter above specifies an analytic collector named sum. Like the docs collectors, sum points to a named collectorFactory in the solrconfig.xml. You can specify any number of analytic collectors by adding additional cl.analytic parameters. Analytic collector factories must return Collector instances that extend DelegatingCollector. A sample analytic collector is provided in the patch through the org.apache.solr.handler.component.SumCollectorFactory. This collectorFactory provides a very simple DelegatingCollector that groups by a field and sums a column of floats. The sum collector is not designed to be a fully functional sum function but to be a proof of concept for pluggable analytics through delegating collectors. You can send parameters to analytic collectors with solr local param syntax. For example: cl.analytic=\{! id=1 groupby=field1 column=field2\}sum The id parameter is mandatory for analytic collectors and is used to identify the output from the collector. In this example the groupby and column params tell the sum collector which field to group by and sum. 
Analytic collectors are passed a reference to the ResponseBuilder and can place maps with analytic output directly into the SolrQueryResponse with the add() method. Maps that are placed in the SolrQueryResponse are automatically added to the outgoing response. The response will include a list named cl.analytic.id, where id is specified in the local param. *Distributed Search* The CollectorFactory also has a method called merge(). This method aggregates the results from each of the shards during distributed search. The default CollectorFactory implements the default merge logic for merging documents from each shard. If you define a different docs collector you can override the default merge method to merge documents in accordance with how they are collected at the shard level. With analytic collectors, you'll need to override
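The grouping-and-summing behaviour of a delegating collector can be sketched outside of Solr's APIs. The following self-contained Java sketch models the idea behind the patch's SumCollectorFactory; the Collector interface and SumByFieldCollector class here are simplified stand-ins invented for illustration, not the patch's actual classes:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-in for a Lucene-style collector: receives matching doc ids.
interface Collector {
    void collect(int doc);
}

// A delegating collector in the spirit of the patch's DelegatingCollector:
// it observes each hit to compute an analytic (a per-group sum) and then
// forwards the hit to the wrapped collector, so doclist collection is
// unaffected by the analytic being computed alongside it.
class SumByFieldCollector implements Collector {
    private final Collector delegate;
    private final String[] groupField; // value of the group-by field per doc
    private final float[] column;      // value of the summed column per doc
    final Map<String, Float> sums = new LinkedHashMap<>();

    SumByFieldCollector(Collector delegate, String[] groupField, float[] column) {
        this.delegate = delegate;
        this.groupField = groupField;
        this.column = column;
    }

    @Override
    public void collect(int doc) {
        sums.merge(groupField[doc], column[doc], Float::sum);
        delegate.collect(doc); // keep normal doclist collection working
    }
}

public class DelegatingCollectorDemo {
    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        String[] groups = {"a", "b", "a", "b"};
        float[] values = {1.0f, 2.0f, 3.0f, 4.0f};
        SumByFieldCollector c = new SumByFieldCollector(hits::add, groups, values);
        for (int doc = 0; doc < groups.length; doc++) {
            c.collect(doc);
        }
        System.out.println(c.sums);      // per-group sums: {a=4.0, b=6.0}
        System.out.println(hits.size()); // all 4 hits still reached the delegate
    }
}
```

The key property the patch relies on is visible here: the analytic is a pure observer, so any number of delegating collectors can be chained in front of the docs collector without changing the doclist.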
[jira] [Closed] (LUCENE-5010) Getting a Poterstemmer error in Solr
[ https://issues.apache.org/jira/browse/LUCENE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer closed LUCENE-5010. --- Resolution: Won't Fix You are right it is the same bug that was reported in the past, and your specific JVM version is one previous to Java 7 update 4, which is the version where all of the jit bugs were fixed I would use 1.7.0u4 or later. A 1.7.0u4 should give you a version string as follows java version 1.7.0_04 Java(TM) SE Runtime Environment (build 1.7.0_04-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) Getting a Poterstemmer error in Solr Key: LUCENE-5010 URL: https://issues.apache.org/jira/browse/LUCENE-5010 Project: Lucene - Core Issue Type: Bug Components: modules/analysis, modules/spellchecker Affects Versions: 3.6.1 Environment: Windows 7 64bit Reporter: Mark Streitman Java version 1.7.0 Java(TM) SE Runtime Environment (build 1.7.0-b147) Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode) is dying from an error in porterstemmer. This is just like the error listed from 2011 https://issues.apache.org/jira/browse/LUCENE-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070153#comment-13070153 Below is the log file that is generated # # A fatal error has been detected by the Java Runtime Environment: # # EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x02b08ce1, pid=3208, tid=4688 # # JRE version: 7.0-b147 # Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode windows-amd64 compressed oops) # Problematic frame: # J org.apache.lucene.analysis.PorterStemmer.stem(I)Z # # Failed to write core dump. 
Minidumps are not enabled by default on client versions of Windows # # If you would like to submit a bug report, please visit: # http://bugreport.sun.com/bugreport/crash.jsp # --- T H R E A D --- Current thread (0x0f57f000): JavaThread http-localhost-127.0.0.1-8080-5 daemon [_thread_in_Java, id=4688, stack(0x0b25,0x0b35)] siginfo: ExceptionCode=0xc005, reading address 0x0002fd8be046 Registers: RAX=0x0001, RBX=0x, RCX=0xfd8be038, RDX=0x0065 RSP=0x0b34e950, RBP=0x, RSI=0x, RDI=0x0003 R8 =0xfd8be010, R9 =0xfffe, R10=0x0065, R11=0x0032 R12=0x, R13=0x02b08c88, R14=0x0003, R15=0x0f57f000 RIP=0x02b08ce1, EFLAGS=0x00010286 Top of Stack: (sp=0x0b34e950) 0x0b34e950: 00650072 0x0b34e960: fd8be010 0001e053f560 0x0b34e970: fd8bdbb0 0x0b34e980: 0b34ea08 026660d8 0x0b34e990: fd8bdfe0 026660d8 0x0b34e9a0: 0b34ea08 026663d0 0x0b34e9b0: 026663d0 0x0b34e9c0: fd8be010 0b34e9c8 0x0b34e9d0: d2e598d2 0b34ea30 0x0b34e9e0: d2e5a290 d3fe7bc8 0x0b34e9f0: d2e59918 0b34e9b8 0x0b34ea00: 0b34ea40 fd8be010 0x0b34ea10: 02a502c4 0004 0x0b34ea20: fd8bdbb0 0x0b34ea30: fd8be010 02a502c4 0x0b34ea40: f5d70090 fd8bdf30 Instructions: (pc=0x02b08ce1) 0x02b08cc1: 41 83 c1 fb 4c 63 f7 42 0f b7 5c 71 10 89 5c 24 0x02b08cd1: 04 8b c7 83 c0 fe 0f b7 54 41 10 8b df 83 c3 fc 0x02b08ce1: 40 0f b7 6c 59 10 83 c7 fd 44 0f b7 6c 79 10 41 0x02b08cf1: 83 fa 69 0f 84 07 03 00 00 49 b8 d8 31 4e e0 00 Register to memory mapping: RAX=0x0001 is an unknown value RBX=0x is an unallocated location in the heap RCX=0xfd8be038 is an oop [C - klass: {type array char} - length: 50 RDX=0x0065 is an unknown value RSP=0x0b34e950 is pointing into the stack for thread: 0x0f57f000 RBP=0x is an unknown value RSI=0x is an unknown value RDI=0x0003 is an unknown value R8 =0xfd8be010 is an oop org.apache.lucene.analysis.PorterStemmer - klass: 'org/apache/lucene/analysis/PorterStemmer' R9 =0xfffe is an unallocated location in the heap R10=0x0065 is an unknown value R11=0x0032 is an unknown value R12=0x is an unknown value R13=0x02b07c10 
[CodeBlob (0x02b07c10)] Framesize: 12 R14=0x0003
[jira] [Updated] (SOLR-4833) All(most all) Logger instances should be made static
[ https://issues.apache.org/jira/browse/SOLR-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated SOLR-4833: -- Attachment: SOLR-4833-Remove-None-static-loggers.patch I took a first stab at this; two things strike me. * Should lucene use SLF4J as a logging framework instead of j.u.logging? * Should loggers that are defined on parent classes be removed? I feel that *if* we want subclasses to go through the same logger, they can all access that logger by its moniker themselves. The only thing that stopped me in my tracks is that some of the exposed loggers form part of the user-extensible API surface. All(most all) Logger instances should be made static Key: SOLR-4833 URL: https://issues.apache.org/jira/browse/SOLR-4833 Project: Solr Issue Type: Improvement Reporter: Hoss Man Attachments: SOLR-4833-Remove-None-static-loggers.patch The majority of Logger usage in Solr is via static variables, but there are a few places where this pattern does not hold true - I think we should fix that and be completely consistent. If there are any specific cases where a non-static variable really makes a lot of sense, then it should be heavily commented as to why. The SLF4J FAQ has a list of pros and cons for why Logger variables should/shouldn't be static... http://slf4j.org/faq.html#declared_static ...the majority of the pros for non-static usage don't really apply to Solr, while the pros for static usage do. Another lucene/solr specific pro in favor of static variables for loggers is the way our test framework looks for memory leaks in tests. Having a simple test that does not null out a static reference to what seems like a small object is typically fine -- but if that small object has an explicit (non-static) reference to a Logger, all of the state in that Logger is counted as part of the size of that small object leading to confusion. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
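For readers unfamiliar with the pattern under discussion, here is a minimal sketch of the static versus non-static logger declarations the issue contrasts. It uses java.util.logging purely to stay self-contained (Solr itself logs through SLF4J); the class names are invented for illustration:

```java
import java.util.logging.Logger;

// The pattern SOLR-4833 advocates: one static logger per class. The logger
// is shared by all instances, so a retained test object never drags the
// logger's state into the test framework's memory-leak accounting.
class StaticLoggerStyle {
    private static final Logger log = Logger.getLogger(StaticLoggerStyle.class.getName());

    void doWork() {
        log.fine("doing work");
    }
}

// The pattern being removed: an instance field means every object carries
// its own reference to the logger, so sizing any retained instance also
// counts the logger's state - the confusion the issue describes.
class InstanceLoggerStyle {
    private final Logger log = Logger.getLogger(getClass().getName());

    void doWork() {
        log.fine("doing work");
    }
}

public class LoggerStyleDemo {
    public static void main(String[] args) {
        new StaticLoggerStyle().doWork();
        new InstanceLoggerStyle().doWork();
    }
}
```

Note that j.u.l (and SLF4J) logger factories return the same named logger either way; only where the reference lives differs, which is exactly what the leak checker cares about.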
[jira] [Created] (SOLR-4819) Pimp QueryEqualityTest to use random testing
Greg Bowyer created SOLR-4819: - Summary: Pimp QueryEqualityTest to use random testing Key: SOLR-4819 URL: https://issues.apache.org/jira/browse/SOLR-4819 Project: Solr Issue Type: Improvement Reporter: Greg Bowyer Priority: Minor The current QueryEqualityTest does some (important but) basic tests of query parsing to ensure that queries that are produced are equivalent to each other. Since we do random testing, it might be a good idea to generate random queries rather than pre-canned ones -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4819) Pimp QueryEqualityTest to use random testing
[ https://issues.apache.org/jira/browse/SOLR-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656481#comment-13656481 ] Greg Bowyer commented on SOLR-4819: --- bq. I think having specific test classes for specific types of queries (or for specific qparsers) would be the best place for more randomized testing ... this class is really just the front line defense against something really terrible. That makes sense, I guess I didn't quite understand the purpose of this class Pimp QueryEqualityTest to use random testing Key: SOLR-4819 URL: https://issues.apache.org/jira/browse/SOLR-4819 Project: Solr Issue Type: Improvement Reporter: Greg Bowyer Priority: Minor The current QueryEqualityTest does some (important but) basic tests of query parsing to ensure that queries that are produced are equivalent to each other. Since we do random testing, it might be a good idea to generate random queries rather than pre-canned ones -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly
[ https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated SOLR-3763: -- Attachment: SOLR-3763-Make-solr-use-lucene-filters-directly.patch Version that has a basic (but hopefully working) cache implementation. PostFilters are still a bit of an unknown; since these are needed for spatial search, I will look at how they can be supported. Make solr use lucene filters directly - Key: SOLR-3763 URL: https://issues.apache.org/jira/browse/SOLR-3763 Project: Solr Issue Type: Improvement Affects Versions: 4.0, 4.1, 5.0 Reporter: Greg Bowyer Assignee: Greg Bowyer Attachments: SOLR-3763-Make-solr-use-lucene-filters-directly.patch, SOLR-3763-Make-solr-use-lucene-filters-directly.patch, SOLR-3763-Make-solr-use-lucene-filters-directly.patch Presently solr uses bitsets, queries and collectors to implement the concept of filters. This has proven to be very powerful, but does come at the cost of introducing a large body of code into solr, making it harder to optimise and maintain. Another issue here is that filters currently cache sub-optimally given the changes in lucene towards atomic readers. Rather than patch these issues, this is an attempt to rework the filters in solr to leverage the Filter subsystem from lucene as much as possible. 
In good time the aim is to get this to do the following: ∘ Handle setting up filter implementations that are able to correctly cache with reference to the AtomicReader that they are caching for rather than for the entire index at large ∘ Get the post filters working, I am thinking that this can be done via Lucene's ChainedFilter, with the "expensive" filters being put towards the end of the chain - this has different semantics internally to the original implementation but IMHO should have the same result for end users ∘ Learn how to create filters that are potentially more efficient; at present solr basically runs a simple query that gathers a DocSet that relates to the documents that we want filtered; it would be interesting to make use of filter implementations that are in theory faster than query filters (for instance there are filters that are able to query the FieldCache) ∘ Learn how to decompose filters so that a complex filter query can be cached (potentially) as its constituent parts; for example the filter below currently needs love, care and feeding to ensure that the filter cache is not unduly stressed {code} 'category:(100) OR category:(200) OR category:(300)' {code} Really there is no reason not to express this in a cached form as {code} BooleanFilter( FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD), FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD), FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD) ) {code} This would yield better cache usage I think, as we can reuse docsets across multiple queries, as well as avoid issues when filters are presented in differing orders ∘ Instead of end users providing costing we might (and this is a big might FWIW) be able to create a sort of execution plan of filters, leveraging a combination of what the index is able to tell us as well as sampling and "educated guesswork"; in essence this is what some DBMS software, for example postgresql, does - it has a 
genetic algorithm that attempts to solve the travelling salesman problem - to great effect ∘ I am sure I will probably come up with other ambitious ideas to plug in here. :S Patches obviously forthcoming but the bulk of the work can be followed here https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
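The decomposed-caching idea behind the BooleanFilter example can be illustrated with a self-contained sketch. TermFilterCache below is a hypothetical, simplified stand-in (plain java.util.BitSet over an in-memory field, not the patch's Lucene code), but it shows the property being argued for: each term's bitset is cached once and reused by any boolean combination that mentions it, regardless of clause order:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of decomposing a boolean filter into cached parts:
// each term filter is cached as its own bitset, and an OR filter is built
// by union-ing the cached parts, so 'category:(100) OR category:(300)'
// shares per-term bitsets with every other query mentioning those terms.
class TermFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final List<String> categoryByDoc; // doc id -> category value

    TermFilterCache(List<String> categoryByDoc) {
        this.categoryByDoc = categoryByDoc;
    }

    // Per-term filter, computed once and then served from the cache.
    BitSet termFilter(String value) {
        return cache.computeIfAbsent(value, v -> {
            BitSet bits = new BitSet(categoryByDoc.size());
            for (int doc = 0; doc < categoryByDoc.size(); doc++) {
                if (categoryByDoc.get(doc).equals(v)) bits.set(doc);
            }
            return bits;
        });
    }

    // OR of several SHOULD clauses, each drawn from the cache.
    BitSet orFilter(String... values) {
        BitSet result = new BitSet();
        for (String v : values) result.or(termFilter(v));
        return result;
    }
}

public class FilterDecompositionDemo {
    public static void main(String[] args) {
        TermFilterCache cache = new TermFilterCache(List.of("100", "200", "300", "100"));
        // Two differently-ordered queries hit the same cached term bitsets.
        System.out.println(cache.orFilter("100", "300")); // docs 0, 2, 3
        System.out.println(cache.orFilter("300", "100")); // same docs, no recompute
    }
}
```

Caching whole composite filters, by contrast, would store `("100" OR "300")` and `("300" OR "100")` as distinct cache entries - the "differing orders" problem the issue mentions.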
[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly
[ https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated SOLR-3763: -- Fix Version/s: 5.0 Make solr use lucene filters directly - Key: SOLR-3763 URL: https://issues.apache.org/jira/browse/SOLR-3763 Project: Solr Issue Type: Improvement Affects Versions: 4.0, 4.1, 5.0 Reporter: Greg Bowyer Assignee: Greg Bowyer Fix For: 5.0 Attachments: SOLR-3763-Make-solr-use-lucene-filters-directly.patch, SOLR-3763-Make-solr-use-lucene-filters-directly.patch, SOLR-3763-Make-solr-use-lucene-filters-directly.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4785) New MaxScoreQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated SOLR-4785: -- Attachment: SOLR-4785-Add-tests-for-maxscore-to-QueryEqualityTest.patch This bit me while I was updating my filter patch (SOLR-3763). I had a stab at putting some basic equality tests in place, but looking at the test case itself I wonder if QueryEqualityTest should be re-worked with the full fury of randomised testing, as it seems, at best, to test only the happy cases. New MaxScoreQParserPlugin - Key: SOLR-4785 URL: https://issues.apache.org/jira/browse/SOLR-4785 Project: Solr Issue Type: New Feature Components: query parsers Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Fix For: 5.0, 4.4 Attachments: SOLR-4785-Add-tests-for-maxscore-to-QueryEqualityTest.patch, SOLR-4785.patch, SOLR-4785.patch A customer wants to contribute back this component. It is a QParser which behaves exactly like the lucene parser (extends it), but returns the Max score from the clauses, i.e. max(c1,c2,c3..) instead of the default which is sum(c1,c2,c3...). It does this by wrapping all SHOULD clauses in a DisjunctionMaxQuery with tie=1.0. Any MUST or PROHIBITED clauses are passed through as-is. Non-boolean queries, e.g. NumericRange, fall through to the lucene parser. To use, add to solrconfig.xml: {code:xml} <queryParser name="maxscore" class="solr.MaxScoreQParserPlugin"/> {code} Then use it in a query: {noformat} q=A AND B AND {!maxscore v=$max}&max=C OR (D AND E) {noformat} This will return the score of A+B+max(C,sum(D+E)) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
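The scoring formula the description gives, A+B+max(C,sum(D+E)), can be checked with a small stand-alone sketch; the score() helper below is invented for illustration and is not part of the plugin:

```java
public class MaxScoreDemo {
    // Illustration of the scoring described in SOLR-4785: MUST clauses
    // (A AND B) contribute their sum, while the SHOULD clauses grouped
    // under {!maxscore} contribute only the best-scoring alternative.
    static float score(float a, float b, float c, float d, float e) {
        float mustPart = a + b;             // A AND B
        float maxPart = Math.max(c, d + e); // max(C, sum(D, E))
        return mustPart + maxPart;
    }

    public static void main(String[] args) {
        // C wins: 1 + 2 + max(5, 1+3) = 8.0
        System.out.println(score(1f, 2f, 5f, 1f, 3f));
        // (D AND E) wins: 1 + 2 + max(2, 1+3) = 7.0
        System.out.println(score(1f, 2f, 2f, 1f, 3f));
    }
}
```

Under the default lucene parser both cases would instead score sum(A,B,C,D,E); the max formulation stops a document from being rewarded for matching several overlapping SHOULD alternatives.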
[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly
[ https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated SOLR-3763: -- Attachment: SOLR-3763-Make-solr-use-lucene-filters-directly.patch Trunk moves really quickly these days (or I move slowly). Updated patch to cope with recent trunk changes. Make solr use lucene filters directly - Key: SOLR-3763 URL: https://issues.apache.org/jira/browse/SOLR-3763 Project: Solr Issue Type: Improvement Affects Versions: 4.0, 4.1, 5.0 Reporter: Greg Bowyer Assignee: Greg Bowyer Fix For: 5.0 Attachments: SOLR-3763-Make-solr-use-lucene-filters-directly.patch, SOLR-3763-Make-solr-use-lucene-filters-directly.patch, SOLR-3763-Make-solr-use-lucene-filters-directly.patch, SOLR-3763-Make-solr-use-lucene-filters-directly.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-1560) maxDocBytesToAnalyze should be required arg up front
[ https://issues.apache.org/jira/browse/LUCENE-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-1560: Labels: dead (was: ) maxDocBytesToAnalyze should be required arg up front Key: LUCENE-1560 URL: https://issues.apache.org/jira/browse/LUCENE-1560 Project: Lucene - Core Issue Type: Improvement Components: modules/highlighter Affects Versions: 2.4.1 Reporter: Michael McCandless Labels: dead Fix For: 4.4 We recently changed IndexWriter to require you to specify up-front MaxFieldLength, on creation, so that you are aware of this dangerous "loses stuff" setting. Too many developers had fallen into the trap of "how come my search can't find this document?" I think we should do the same with maxDocBytesToAnalyze in the highlighter? Spinoff from this thread: http://www.nabble.com/Lucene-Highlighting-and-Dynamic-Summaries-p22385887.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-1743) MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS
[ https://issues.apache.org/jira/browse/LUCENE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-1743: Labels: dead (was: ) MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS - Key: LUCENE-1743 URL: https://issues.apache.org/jira/browse/LUCENE-1743 Project: Lucene - Core Issue Type: Improvement Components: core/store Affects Versions: 2.9 Reporter: Uwe Schindler Assignee: Uwe Schindler Labels: dead Fix For: 4.4 This is a followup to LUCENE-1741: Javadocs state (in FileChannel#map): "For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory." MMapDirectory should get a user-configurable size parameter that is a lower limit for mmapping files. All files with a size below the limit should be opened using a conventional IndexInput from SimpleFS or NIO (another configuration option for the fallback?). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
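The proposed policy is simple enough to sketch. The MMapPolicy class and its threshold below are invented for illustration (the issue leaves the default and the fallback implementation as open configuration questions):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the decision LUCENE-1743 asks for: mmap only files at or above
// a user-configurable threshold, and fall back to a conventional stream
// read (SimpleFS/NIOFS-style) for small files, where FileChannel#map's
// setup cost outweighs the benefit.
class MMapPolicy {
    private final long mmapThreshold; // bytes; value is caller-chosen

    MMapPolicy(long mmapThreshold) {
        this.mmapThreshold = mmapThreshold;
    }

    boolean shouldMMap(long fileSizeBytes) {
        return fileSizeBytes >= mmapThreshold;
    }

    boolean shouldMMap(Path file) throws IOException {
        return shouldMMap(Files.size(file));
    }
}

public class MMapPolicyDemo {
    public static void main(String[] args) {
        MMapPolicy policy = new MMapPolicy(1L << 20); // illustrative 1 MiB cutoff
        System.out.println(policy.shouldMMap(16L << 20)); // large segment file: mmap
        System.out.println(policy.shouldMMap(4096));      // tiny file: stream read
    }
}
```

A directory implementation would consult such a policy per file when opening an IndexInput, which is why the threshold wants to be a constructor parameter rather than a global.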
[jira] [Resolved] (LUCENE-2713) TestPhraseQuery.testRandomPhrases takes minutes to run with SimpleText
[ https://issues.apache.org/jira/browse/LUCENE-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer resolved LUCENE-2713. - Resolution: Fixed Fix Version/s: (was: 4.4) 5.0 I beat on this test case a few times choosing all the codecs, and I could not repeat the slowdown; I am thinking that both the thread-leak and performance issues have long been fixed. I am removing the fixed seed and closing this bug down, hopefully to never see it again. TestPhraseQuery.testRandomPhrases takes minutes to run with SimpleText -- Key: LUCENE-2713 URL: https://issues.apache.org/jira/browse/LUCENE-2713 Project: Lucene - Core Issue Type: Bug Components: general/test Affects Versions: 4.0-ALPHA Reporter: Robert Muir Labels: dead Fix For: 5.0 This test takes a few minutes to run if it gets the SimpleText codec. On hudson, it took 15 minutes! I added an assumeFalse(simpleText) as a temporary workaround, but we should see if there is something we can improve so we can remove this hack. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (LUCENE-1890) auto-warming from Apache Solr causes NULL Pointer
[ https://issues.apache.org/jira/browse/LUCENE-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer closed LUCENE-1890. --- Resolution: Cannot Reproduce Assignee: Greg Bowyer I am going to be bold and make the assumption that, since spatial has been re-worked and Lucene has gone from 2.x - 4.x this issue is no longer present. auto-warming from Apache Solr causes NULL Pointer - Key: LUCENE-1890 URL: https://issues.apache.org/jira/browse/LUCENE-1890 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Affects Versions: 2.4.1 Environment: Linux Reporter: Bill Bell Assignee: Greg Bowyer Labels: dead Fix For: 4.4 Attachments: localsolr.jar, lucene-spatial-2.9-dev.jar Sep 6, 2009 12:48:07 PM org.apache.solr.common.SolrException log SEVERE: Error during auto-warming of key:org.apache.solr.search.QueryResultKey@b00371eb:java.lang.NullPointerException at org.apache.lucene.spatial.tier.DistanceFieldComparatorSource$DistanceScoreDocLookupComparator.copy(DistanceFieldComparatorSource.java:101) at org.apache.lucene.search.TopFieldCollector$MultiComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:554) at org.apache.solr.search.DocSetDelegateCollector.collect(DocSetHitCollector.java:98) at org.apache.lucene.search.IndexSearcher.doSearch(IndexSearcher.java:281) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:253) at org.apache.lucene.search.Searcher.search(Searcher.java:171) at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1088) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:876) at org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:53) at org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:328) at org.apache.solr.search.LRUCache.warm(LRUCache.java:194) at org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1468) at 
org.apache.solr.core.SolrCore$3.call(SolrCore.java:1142) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2540) Document. add get(i) and addAll to make interacting with fieldables and documents easier/faster and more readable
[ https://issues.apache.org/jira/browse/LUCENE-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655743#comment-13655743 ] Greg Bowyer commented on LUCENE-2540: - Outside of batch adding fields it looks like this issue is somewhat dead, since we can now address the field(s) by name, and have sensible iterators on them? Anyone opposed to closing this? Document. add get(i) and addAll to make interacting with fieldables and documents easier/faster and more readable - Key: LUCENE-2540 URL: https://issues.apache.org/jira/browse/LUCENE-2540 Project: Lucene - Core Issue Type: Improvement Components: core/other Affects Versions: 3.0.2 Reporter: Woody Anderson Labels: dead Fix For: 4.4 Attachments: LUCENE-2540.patch Working with Document Fieldables is often a pain. getting the ith involves chained method calls and is not very readable: {code} // nice doc.getFieldable(i); // not nice doc.getFields().get(i); {code} also, when combining documents, or otherwise aggregating multiple fields into a single document, {code} // nice doc.addAll(fieldables); // not nice: less readable and more error prone List<Fieldable> fields = ...; for (Fieldable field : fields) { result.add(field); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3917) Port pruning module to trunk apis
[ https://issues.apache.org/jira/browse/LUCENE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-3917: Attachment: LUCENE-3917-Initial-port-of-index-pruning.patch Recently at $DAYJOB the horror that is high frequency terms in OR search came to bite us; as a result I have an interest in pruning again. As such I made an attempt to forward port the existing pruning package directly to Lucene 4.0. This is largely a mechanical port, I have not put any real thought into it, so it's probably terrible. This does not pass its unit test, and is a mess internally in the code; I am going to try to get the unit test working and then loop back on making the code more lucene 4.x friendly. One question that occurs from this is how AtomicReaders are handled: do we want to prune per segment with global stats, prune based on segment stats, or just do the terrible thing and work with a SlowCompositeReader? I also think, given the work that went on with LUCENE-4752, it might be possible to do the pruning in a similar fashion to the sorting merge, such that we do a pruning merge. Port pruning module to trunk apis - Key: LUCENE-3917 URL: https://issues.apache.org/jira/browse/LUCENE-3917 Project: Lucene - Core Issue Type: Task Components: modules/other Affects Versions: 4.0-ALPHA Reporter: Robert Muir Fix For: 4.3 Attachments: LUCENE-3917-Initial-port-of-index-pruning.patch Pruning module was added in LUCENE-1812, but we need to port this to trunk (4.0) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly
[ https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated SOLR-3763: -- Description: Presently solr uses bitsets, queries and collectors to implement the concept of filters. This has proven to be very powerful, but does come at the cost of introducing a large body of code into solr making it harder to optimise and maintain. Another issue here is that filters currently cache sub-optimally given the changes in lucene towards atomic readers. Rather than patch these issues, this is an attempt to rework the filters in solr to leverage the Filter subsystem from lucene as much as possible. In good time the aim is to get this to do the following: ∘ Handle setting up filter implementations that are able to correctly cache with reference to the AtomicReader that they are caching for rather than for the entire index at large ∘ Get the post filters working, I am thinking that this can be done via lucene's chained filter, with the ‟expensive” filters being put towards the end of the chain - this has different semantics internally to the original implementation but IMHO should have the same result for end users ∘ Learn how to create filters that are potentially more efficient, at present solr basically runs a simple query that gathers a DocSet that relates to the documents that we want filtered; it would be interesting to make use of filter implementations that are in theory faster than query filters (for instance there are filters that are able to query the FieldCache) ∘ Learn how to decompose filters so that a complex filter query can be cached (potentially) as its constituent parts; for example the filter below currently needs love, care and feeding to ensure that the filter cache is not unduly stressed {code} 'category:(100) OR category:(200) OR category:(300)' {code} Really there is no reason not to express this in a cached form as {code} BooleanFilter( 
FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD), FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD), FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD) ) {code} This would yield better cache usage I think as we can reuse docsets across multiple queries, as well as avoid issues when filters are presented in differing orders ∘ Instead of end users providing costing we might (and this is a big might FWIW), be able to create a sort of execution plan of filters, leveraging a combination of what the index is able to tell us as well as sampling and ‟educated guesswork”; in essence this is what some DBMS software, for example postgresql does - it has a genetic algo that attempts to solve the travelling salesman - to great effect ∘ I am sure I will probably come up with other ambitious ideas to plug in here. :S Patches obviously forthcoming but the bulk of the work can be followed here https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters
[jira] [Updated] (SOLR-4616) HitRatio in mbean is of type String instead should be float/double.
[ https://issues.apache.org/jira/browse/SOLR-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated SOLR-4616: -- Attachment: SOLR-4616-Make-HitRatio-in-cache-mbeans-a-float.patch This should fix it, I am going to test it out shortly (unit tests and such pass, just need to fire up a solr instance) HitRatio in mbean is of type String instead should be float/double. --- Key: SOLR-4616 URL: https://issues.apache.org/jira/browse/SOLR-4616 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.2 Environment: Solr 4.2 on JBoss7.1.1 Reporter: Aditya Assignee: Greg Bowyer Priority: Minor Attachments: SOLR-4616-Make-HitRatio-in-cache-mbeans-a-float.patch While using our existing System Monitoring tool with solr using JMX we noticed that the stats values for Cache is not consistence w.r.t data type. decimal values are returned as string instead should be of type float/double. e.g hitratio -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-4156) JMX numDocs and maxDoc are of type string
[ https://issues.apache.org/jira/browse/SOLR-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer reassigned SOLR-4156: - Assignee: Greg Bowyer JMX numDocs and maxDoc are of type string - Key: SOLR-4156 URL: https://issues.apache.org/jira/browse/SOLR-4156 Project: Solr Issue Type: Improvement Affects Versions: 4.0 Reporter: Greg Harris Assignee: Greg Bowyer Priority: Minor Fix For: 4.3 Some monitoring tools are sensitive to the object types that the JMX link provides if not all. numDocs and maxDoc of the searcher MBean: solr/collection1:type=searcher,id=org.apache.solr.search.SolrIndexSearcher are provided as String's not ints. Int would allow monitoring tools to monitor them correctly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4156) JMX numDocs and maxDoc are of type string
[ https://issues.apache.org/jira/browse/SOLR-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647671#comment-13647671 ] Greg Bowyer commented on SOLR-4156: --- I don't think this is the case, I have just looked at this and it seems that the types are integer. numdocs claims this type: javax.management.openmbean.SimpleType(name=java.lang.Integer) maxdocs claims this type: javax.management.openmbean.SimpleType(name=java.lang.Integer) JMX numDocs and maxDoc are of type string - Key: SOLR-4156 URL: https://issues.apache.org/jira/browse/SOLR-4156 Project: Solr Issue Type: Improvement Affects Versions: 4.0 Reporter: Greg Harris Assignee: Greg Bowyer Priority: Minor Fix For: 4.3 Some monitoring tools are sensitive to the object types that the JMX link provides if not all. numDocs and maxDoc of the searcher MBean: solr/collection1:type=searcher,id=org.apache.solr.search.SolrIndexSearcher are provided as String's not ints. Int would allow monitoring tools to monitor them correctly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
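For anyone wanting to repeat the check above, attribute types can be read off any MBean server via MBeanInfo. The sketch below inspects a platform MBean purely as a stand-in target; against a live Solr instance the same code would be pointed at solr/collection1:type=searcher,id=org.apache.solr.search.SolrIndexSearcher instead.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanAttributeTypes {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Stand-in target; for Solr this would be e.g.
        // solr/collection1:type=searcher,id=org.apache.solr.search.SolrIndexSearcher
        ObjectName target = new ObjectName("java.lang:type=Threading");
        MBeanInfo info = server.getMBeanInfo(target);
        for (MBeanAttributeInfo attr : info.getAttributes()) {
            if ("ThreadCount".equals(attr.getName())) {
                // A numeric attribute should report a numeric type here; a monitoring
                // tool seeing java.lang.String instead is exactly the bug being discussed.
                System.out.println(attr.getName() + " -> " + attr.getType());
            }
        }
    }
}
```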
[jira] [Resolved] (SOLR-4616) HitRatio in mbean is of type String instead should be float/double.
[ https://issues.apache.org/jira/browse/SOLR-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer resolved SOLR-4616. --- Resolution: Fixed HitRatio in mbean is of type String instead should be float/double. --- Key: SOLR-4616 URL: https://issues.apache.org/jira/browse/SOLR-4616 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.2 Environment: Solr 4.2 on JBoss7.1.1 Reporter: Aditya Assignee: Greg Bowyer Priority: Minor Attachments: SOLR-4616-Make-HitRatio-in-cache-mbeans-a-float.patch While using our existing System Monitoring tool with solr using JMX we noticed that the stats values for Cache is not consistence w.r.t data type. decimal values are returned as string instead should be of type float/double. e.g hitratio -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly
[ https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated SOLR-3763: -- Attachment: SOLR-3763-Make-solr-use-lucene-filters-directly.patch Updated to latest trunk; the cache unit test still fails, as do the spatial lat/lon tests. Make solr use lucene filters directly - Key: SOLR-3763 URL: https://issues.apache.org/jira/browse/SOLR-3763 Project: Solr Issue Type: Improvement Affects Versions: 4.0, 4.1, 5.0 Reporter: Greg Bowyer Assignee: Greg Bowyer Attachments: SOLR-3763-Make-solr-use-lucene-filters-directly.patch, SOLR-3763-Make-solr-use-lucene-filters-directly.patch Presently solr uses bitsets, queries and collectors to implement the concept of filters. This has proven to be very powerful, but does come at the cost of introducing a large body of code into solr making it harder to optimise and maintain. Another issue here is that filters currently cache sub-optimally given the changes in lucene towards atomic readers. Rather than patch these issues, this is an attempt to rework the filters in solr to leverage the Filter subsystem from lucene as much as possible. 
In good time the aim is to get this to do the following: ∘ Handle setting up filter implementations that are able to correctly cache with reference to the AtomicReader that they are caching for rather than for the entire index at large ∘ Get the post filters working, I am thinking that this can be done via lucene's chained filter, with the ‟expensive” filters being put towards the end of the chain - this has different semantics internally to the original implementation but IMHO should have the same result for end users ∘ Learn how to create filters that are potentially more efficient, at present solr basically runs a simple query that gathers a DocSet that relates to the documents that we want filtered; it would be interesting to make use of filter implementations that are in theory faster than query filters (for instance there are filters that are able to query the FieldCache) ∘ Learn how to decompose filters so that a complex filter query can be cached (potentially) as its constituent parts; for example the filter below currently needs love, care and feeding to ensure that the filter cache is not unduly stressed {code} 'category:(100) OR category:(200) OR category:(300)' {code} Really there is no reason not to express this in a cached form as {code} BooleanFilter( FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD), FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD), FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD) ) {code} This would yield better cache usage I think as we can reuse docsets across multiple queries as well as avoid issues when filters are presented in differing orders ∘ Instead of end users providing costing we might (and this is a big might FWIW), be able to create a sort of execution plan of filters, leveraging a combination of what the index is able to tell us as well as sampling and ‟educated guesswork”; in essence this is what some DBMS software, for example postgresql does - it has a 
genetic algo that attempts to solve the travelling salesman - to great effect ∘ I am sure I will probably come up with other ambitious ideas to plug in here . :S Patches obviously forthcoming but the bulk of the work can be followed here https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
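The decomposition argument above can be illustrated without any Lucene or Solr types: cache a bitset per constituent term rather than per whole filter string, so that differently ordered variants of `category:(100) OR category:(200) OR category:(300)` share the same cached parts. Everything below, class and method names included, is an invented sketch of the idea, not Solr's actual filter cache.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TermFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> matcher; // computes the docs matching one term

    TermFilterCache(Function<String, BitSet> matcher) {
        this.matcher = matcher;
    }

    // One cache entry per constituent term, not per whole filter string,
    // so "100 OR 200" and "200 OR 100" reuse the same cached docsets.
    BitSet termFilter(String term) {
        return cache.computeIfAbsent(term, t -> (BitSet) matcher.apply(t).clone());
    }

    BitSet or(String... terms) {
        BitSet result = new BitSet();
        for (String term : terms) {
            result.or(termFilter(term));
        }
        return result;
    }

    int cachedEntries() {
        return cache.size();
    }

    public static void main(String[] args) {
        // Toy postings: which docs carry each category value.
        Map<String, BitSet> postings = new HashMap<>();
        postings.put("100", bits(0, 2));
        postings.put("200", bits(1));
        postings.put("300", bits(4));
        TermFilterCache filters =
            new TermFilterCache(t -> postings.getOrDefault(t, new BitSet()));

        BitSet a = filters.or("100", "200", "300");
        BitSet b = filters.or("300", "200", "100"); // same filter, different clause order
        System.out.println(a.equals(b) + " " + filters.cachedEntries());
    }

    private static BitSet bits(int... docs) {
        BitSet set = new BitSet();
        for (int doc : docs) set.set(doc);
        return set;
    }
}
```

The second `or` call hits only the three cached per-term entries, which is the reuse-across-orderings property the description is after.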
[jira] [Commented] (SOLR-4465) Configurable Collectors
[ https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600324#comment-13600324 ] Greg Bowyer commented on SOLR-4465: --- Does the CollectorSpec serve the same purpose as say the GroupingSpecification, that is to provide underlying collectors (and the search in general) with the right requirements information? I ask because maybe it would be easier to make the CollectorSpec support a map of String -> Object or String -> CollectorProperty. I am trying to think how we can do grouping with this, but I might have misinterpreted what it's for. Configurable Collectors --- Key: SOLR-4465 URL: https://issues.apache.org/jira/browse/SOLR-4465 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.1 Reporter: Joel Bernstein Fix For: 4.3 Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch This issue is to add configurable custom collectors to Solr. This expands the design and work done in issue SOLR-1680 to include: 1) CollectorFactory configuration in solconfig.xml 2) Http parameters to allow clients to dynamically select a CollectorFactory and construct a custom Collector. 3) Make aspects of QueryComponent pluggable so that the output from distributed search can conform with custom collectors at the shard level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4465) Configurable Collectors
[ https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587815#comment-13587815 ] Greg Bowyer commented on SOLR-4465: --- The tests are giving me null pointer exceptions inside SolrIndexSearcher for this, as for the stats part I was looking at this patch and giving some thought about providing that on the QueryCommand object, but I feel that it is not the correct place for this information. Configurable Collectors --- Key: SOLR-4465 URL: https://issues.apache.org/jira/browse/SOLR-4465 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.1 Reporter: Joel Bernstein Fix For: 4.2, 5.0 Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch This issue is to add configurable custom collectors to Solr. This expands the design and work done in issue SOLR-1680 to include: 1) CollectorFactory configuration in solconfig.xml 2) Http parameters to allow clients to dynamically select a CollectorFactory and construct a custom Collector. 3) Make aspects of QueryComponent pluggable so that the output from distributed search can conform with custom collectors at the shard level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4464) DIH - Processed documents counter resets to zero after first database request
[ https://issues.apache.org/jira/browse/SOLR-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13579422#comment-13579422 ] Greg Bowyer commented on SOLR-4464: --- There is a good chance that a 250GB heap is the root cause of your problems; can you lower it to 16 or 32GB as a start and then see if this problem persists? DIH - Processed documents counter resets to zero after first database request - Key: SOLR-4464 URL: https://issues.apache.org/jira/browse/SOLR-4464 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.1 Environment: CentOS 6.3 x64 / apache-tomcat-7.0.35 / mysql-connector-java-5.1.23 - Large machine 5TB of drives and 280GB RAM - Java Heap set to 250Gb - resources are not an issue. Reporter: Dave Cook Priority: Minor Labels: patch [11:20] quasimotoca Solr 4.1 - Processed documents resets to 0 after processing my first entity - all database schemas are identical [11:21] quasimotoca However, all the documents get fetched and I can query the results no problem. Here's a link to a screenshot - http://findocs/gridworkz.com/solr Everything works perfect except the screen doesn't increment the Processed counter on subsequent database Requests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4464) DIH - Processed documents counter resets to zero after first database request
[ https://issues.apache.org/jira/browse/SOLR-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13579426#comment-13579426 ] Greg Bowyer commented on SOLR-4464: --- . I should read bug reports more carefully, everything else is working fine so maybe the heap size is not the issue (I would still lower it however) DIH - Processed documents counter resets to zero after first database request - Key: SOLR-4464 URL: https://issues.apache.org/jira/browse/SOLR-4464 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.1 Environment: CentOS 6.3 x64 / apache-tomcat-7.0.35 / mysql-connector-java-5.1.23 - Large machine 5TB of drives and 280GB RAM - Java Heap set to 250Gb - resources are not an issue. Reporter: Dave Cook Priority: Minor Labels: patch [11:20] quasimotoca Solr 4.1 - Processed documents resets to 0 after processing my first entity - all database schemas are identical [11:21] quasimotoca However, all the documents get fetched and I can query the results no problem. Here's a link to a screenshot - http://findocs/gridworkz.com/solr Everything works perfect except the screen doesn't increment the Processed counter on subsequent database Requests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564880#comment-13564880 ] Greg Bowyer commented on LUCENE-3178: - So I was going to shut this down today, and just to make sure I ran the benchmark on the simplest code possible ... and suddenly I got good results, this is idiopathic :S https://gist.github.com/0f017853861d050c0b66 {code}
Report after iter 19:
            Task     QPS baseline      StdDev     QPS mmap_tests      StdDev     Pct diff
      OrHighHigh      1.68 (11.2%)      1.73 (10.3%)      3.0% ( -16% -  27%)
        PKLookup    129.89  (5.8%)    135.03  (6.0%)      4.0% (  -7% -  16%)
        HighTerm      8.09 (13.6%)      8.43 (12.8%)      4.2% ( -19% -  35%)
       OrHighMed      4.46 (10.4%)      4.67  (9.5%)      4.7% ( -13% -  27%)
       OrHighLow      4.82 (10.6%)      5.09 (10.3%)      5.6% ( -13% -  29%)
    HighSpanNear      0.92  (8.1%)      0.97  (7.3%)      5.9% (  -8% -  23%)
          IntNRQ      2.51 (10.2%)      2.67  (9.9%)      6.6% ( -12% -  29%)
      HighPhrase      0.30 (11.7%)      0.32 (12.8%)      6.7% ( -16% -  35%)
       MedPhrase      2.93  (6.8%)      3.12  (8.2%)      6.7% (  -7% -  23%)
     AndHighHigh      5.46  (6.6%)      5.86  (7.0%)      7.2% (  -5% -  22%)
         Respell     19.68  (5.9%)     21.15  (6.6%)      7.5% (  -4% -  21%)
       LowPhrase      0.46  (9.5%)      0.50 (10.2%)      7.6% ( -11% -  30%)
         Prefix3      5.25  (8.2%)      5.66  (7.7%)      7.9% (  -7% -  25%)
HighSloppyPhrase      1.54  (8.0%)      1.67 (13.1%)      8.5% ( -11% -  32%)
     MedSpanNear      5.25  (7.0%)      5.72  (8.2%)      9.0% (  -5% -  25%)
        Wildcard     12.44  (5.7%)     13.59  (6.5%)      9.2% (  -2% -  22%)
 MedSloppyPhrase      2.27  (7.2%)      2.49  (8.5%)      9.5% (  -5% -  27%)
         MedTerm     28.16 (10.3%)     30.89  (9.9%)      9.7% (  -9% -  33%)
          Fuzzy1     18.91  (6.0%)     20.82  (6.7%)     10.1% (  -2% -  24%)
          Fuzzy2     19.69  (6.6%)     21.68  (7.5%)     10.1% (  -3% -  25%)
      AndHighMed      7.79  (7.5%)      8.58  (6.1%)     10.1% (  -3% -  25%)
     LowSpanNear      1.45  (5.7%)      1.60  (9.3%)     10.5% (  -4% -  27%)
 LowSloppyPhrase     22.84  (7.7%)     25.45  (9.7%)     11.4% (  -5% -  31%)
         LowTerm     46.46  (6.8%)     52.90  (7.6%)     13.9% (   0% -  30%)
      AndHighLow     35.92  (5.3%)     42.38  (7.1%)     18.0% (   5% -  32%)
{code} Native MMapDir -- Key: LUCENE-3178 URL: https://issues.apache.org/jira/browse/LUCENE-3178 Project: Lucene - Core Issue Type: Improvement Components: core/store 
Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Attachments: LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch Spinoff from LUCENE-2793. Just like we will create native Dir impl (UnixDirectory) to pass the right OS level IO flags depending on the IOContext, we could in theory do something similar with MMapDir. The problem is MMap is apparently quite hairy... and to pass the flags the native code would need to invoke mmap (I think?), unlike UnixDir where the code only has to open the file handle. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550436#comment-13550436 ] Greg Bowyer commented on LUCENE-3178: - {quote} I think this is largely related to Robert's comment: Might be interesting to revisit now that we use block compression that doesn't readByte(), readByte(), readByte() and hopefully avoids some of the bounds checks and so on that I think it helped with. {quote} Actually there still is quite a lot of that; I wrote locally a Directory implementation that dumps out all of the called operations, I can share the file if wanted (although it's *huge*) {quote} Since we moved to block codecs, the use of single-byte gets on the byte buffer is largely reduced. It now just reads blocks of data, so MappedByteBuffer can do that efficiently using a memcpy(). Some MTQs are still faster because they read much more blocks for a large number of terms. I would have expected no significant speed up at all for, e.g., NRQ. {quote} Better: the JVM doesn't do memcpy in all cases but often does CPU-aware operations that are faster. {quote} Additionally, when using the ByteBuffer methods to get bytes, I think newer java versions use intrinsics, that may no longer be used with your directory impl. {quote} This is what I am leaning towards; so far the only speedups I have seen are when I ape most of the behaviors of the JVM. The biggest win really is that the code becomes a lot simpler (partly because we don't have to worry about the cleaner, and partly because we are not bound to int32 sizes so no more slice nonsense); despite the simpler code I don't think there is a sizable win in performance to warrant this approach. I am still poking at this for a bit longer, but I am leaning towards calling this a bust. The other reason for this was to see if I get better behavior along the MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically provable there. 
(This is all assuming that I don't have some gross oversight in my implementation that makes it stupidly slow by accident.) {quote} I would not provide a custom MMapDir at all, it is too risky and does not really bring a large speed up anymore (Java 7 + block postings). {quote} I quite agree; even if this gave huge performance wins I would still put it in the bucket of: it's in misc, it's not the default, and you're on your own if it breaks. The fact it yields AFAICT no performance gains is both maddening for me and even more damning. Native MMapDir -- Key: LUCENE-3178 URL: https://issues.apache.org/jira/browse/LUCENE-3178 Project: Lucene - Core Issue Type: Improvement Components: core/store Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Attachments: LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch Spinoff from LUCENE-2793. Just like we will create native Dir impl (UnixDirectory) to pass the right OS level IO flags depending on the IOContext, we could in theory do something similar with MMapDir. The problem is MMap is apparently quite hairy... and to pass the flags the native code would need to invoke mmap (I think?), unlike UnixDir where the code only has to open the file handle. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
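The block-read point above can be sketched with plain java.nio (a minimal illustration of the access patterns being compared, not Lucene's actual IndexInput code): a loop of single-byte gets pays a bounds check per byte, while one bulk get() over the same region can be turned into a single memcpy-like intrinsic by the JVM.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapBlockRead {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("block", ".bin");
        byte[] data = new byte[1024];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Files.write(tmp, data);

        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());

            // Old pattern: readByte(), readByte(), readByte() -- one bounds
            // check per byte read from the mapping.
            byte[] perByte = new byte[data.length];
            for (int i = 0; i < perByte.length; i++) perByte[i] = map.get(i);

            // Block pattern: a single bulk get(), which the JVM can service
            // with one memcpy-like intrinsic over the whole region.
            byte[] bulk = new byte[data.length];
            map.position(0);
            map.get(bulk);

            System.out.println("reads match: " + java.util.Arrays.equals(perByte, bulk));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Both paths return the same bytes; the difference is purely in how many JVM-level accesses (and bounds checks) it takes to get them.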
[jira] [Commented] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13548885#comment-13548885 ] Greg Bowyer commented on LUCENE-3178: - Frustrating; it echoes what I have been seeing, so at least my benchmarking is not playing me up. I guess I will have to do some digging.
[jira] [Updated] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-3178: Attachment: LUCENE-3178-Native-MMap-implementation.patch
[jira] [Updated] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-3178: Attachment: LUCENE-3178-Native-MMap-implementation.patch
[jira] [Updated] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-3178: Attachment: LUCENE-3178-Native-MMap-implementation.patch Rough cut of a native mmap (does not do any madvise, probably insanely buggy etc etc)
[jira] [Updated] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-3178: Attachment: (was: LUCENE-3178-Native-MMap-implementation.patch)
[jira] [Updated] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-3178: Attachment: LUCENE-3178-Native-MMap-implementation.patch Temp skip unit test until fixed
Re: madvise and gregs hallucinations
On 12/11/2012 11:59 AM, Yonik Seeley wrote: On Tue, Dec 11, 2012 at 2:32 PM, Greg Bowyer gbow...@fastmail.co.uk wrote: Yes, the index can fit in RAM on the boxes I am testing with - it's the main rationale for sharding, to make sure that we can hold an index in RAM at all times. MADV_WILLNEED might be rather bad if the index is bigger than RAM (something to test maybe) Agree. And the largest part of the index is often stored fields, which have a random access pattern. MADV_RANDOM? Maybe; I would have to go digging to see if it's implemented. With this said, so far it's a hypothesis supported by weak experimentation, so I need to get it under the benchmarking suite to really be sure -Yonik http://lucidworks.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528681#comment-13528681 ] Greg Bowyer commented on LUCENE-3178: - Tangentially, I have been futzing a little with this based on some observations I noticed around madvise: http://people.apache.org/~gbowyer/madvise-perf/index.html
madvise and gregs hallucinations
Since it's too long (and has too much HTML and pictures and so forth) for the mailing list, I have a more detailed write up here: http://people.apache.org/~gbowyer/madvise-perf/index.html However, the short version. At $DAYJOB, as part of moving to lucene 4.0, I have been looking at what we can change in our modes of thinking; one of these relates (for us) to having a separate legacy system that serves the stored data for the search engine (which we imaginatively call docserve). Whilst playing with this I noticed that I lost a lot of performance rather quickly; after a lot of digging it looks like getting a call in mmapping to madvise (specifically a call of the form madvise(addr, MADV_WILLNEED)) might improve search performance. Anyone have any thoughts and or ideas here? I am going to try to get a whole heap more investigation done before I start messing with lucene's behavior on mmap (because whilst it improves /my/ performance a lot, it might be I only noticed it since I am poking an open wound, YMMV), but there is something of interest here. Then again this might be my own personal insanity .. :S -- Greg - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
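For reference, the closest thing to a MADV_WILLNEED hint reachable from pure Java is MappedByteBuffer.load(), which asks the OS to pre-fault the mapped pages (in OpenJDK this has been implemented via madvise(addr, len, MADV_WILLNEED) plus touching a byte per page). A minimal sketch of wiring that into a mapped file:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WillNeedSketch {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("madvise", ".bin");
        Files.write(tmp, new byte[1 << 20]); // 1 MiB placeholder index file

        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // load() is advisory, like madvise itself: it hints that the whole
            // mapping will be needed soon, so the kernel can read it in up front
            // instead of taking page faults on first access.
            map.load();
            System.out.println("mapped " + map.capacity() + " bytes, loaded=" + map.isLoaded());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that isLoaded() is itself only a best-effort probe; it may report false even after load(), which is one reason measuring this effect needs a real benchmarking suite rather than spot checks.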
[jira] [Commented] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528687#comment-13528687 ] Greg Bowyer commented on LUCENE-3178: - Robert, do you still have your old approach code kicking about anywhere?
[jira] [Commented] (SOLR-2701) Expose IndexWriter.commit(Map&lt;String,String&gt; commitUserData) to solr
[ https://issues.apache.org/jira/browse/SOLR-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506688#comment-13506688 ] Greg Bowyer commented on SOLR-2701: --- bq. I haven't had a chance to check out the rest of the patch/issue, but for this specifically, what about a convention? Anything under the persistent key in the commit data is carried over indefinitely. Or if persistent is the norm, then we could reverse it and have a transient map that is not carried over. The persistent/transient map sounds like a good idea; I will take a look at how that can be implemented Expose IndexWriter.commit(Map&lt;String,String&gt; commitUserData) to solr - Key: SOLR-2701 URL: https://issues.apache.org/jira/browse/SOLR-2701 Project: Solr Issue Type: New Feature Components: update Affects Versions: 4.0-ALPHA Reporter: Eks Dev Priority: Minor Labels: commit, update Attachments: SOLR-2701-Expose-userCommitData-throughout-solr.patch, SOLR-2701.patch Original Estimate: 8h Remaining Estimate: 8h At the moment, there is no feature that enables associating user information to the commit point. Lucene supports this possibility and it should be exposed to solr as well, probably via beforeCommit Listener (analogous to prepareCommit in Lucene). Most likely home for this Map to live is UpdateHandler. Example use case would be an atomic tracking of sequence numbers or timestamps for incremental updates. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2701) Expose IndexWriter.commit(Map&lt;String,String&gt; commitUserData) to solr
[ https://issues.apache.org/jira/browse/SOLR-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated SOLR-2701: -- Attachment: SOLR-2701-Expose-userCommitData-throughout-solr.patch I gave this another attempt today, and went full bore on trying to find all the locations where userCommitData would need to be exposed to clients of the SOLR API. There are a few questions in my mind about this: * The backwards compat for javabin is not obvious; do we want to change up the version on javabin? * What should be the exact behavior around soft and autocommits? * Should previous index commits carry forward in solr for ease of use?
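The persistent/transient convention floated above could be modelled with plain maps; this is a hypothetical helper (the prefix name and carry-forward rule are assumptions, not code from the patch) showing how entries under a reserved key would survive into the next commit's user data while everything else is dropped:

```java
import java.util.HashMap;
import java.util.Map;

public class CommitUserData {
    // Hypothetical reserved prefix; the real convention would be settled on the issue.
    static final String PERSISTENT_PREFIX = "persistent.";

    /** Builds the user-data map for the next commit: carry over persistent
     *  entries from the previous commit, then overlay the new entries. */
    static Map<String, String> nextCommitData(Map<String, String> previous,
                                              Map<String, String> updates) {
        Map<String, String> next = new HashMap<>();
        for (Map.Entry<String, String> e : previous.entrySet()) {
            if (e.getKey().startsWith(PERSISTENT_PREFIX)) {
                next.put(e.getKey(), e.getValue());
            }
        }
        next.putAll(updates); // new values win over carried-over ones
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> prev = new HashMap<>();
        prev.put("persistent.sequence", "41");
        prev.put("tmp.trace", "abc");

        Map<String, String> updates = new HashMap<>();
        updates.put("persistent.sequence", "42");

        Map<String, String> next = nextCommitData(prev, updates);
        System.out.println("transient kept: " + next.containsKey("tmp.trace"));
        System.out.println("sequence: " + next.get("persistent.sequence"));
    }
}
```

This matches the sequence-number use case from the issue description: an incremental updater bumps persistent.sequence on each commit and it is never silently lost, while scratch entries expire with the commit that wrote them.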
Solr ResponseBuilder and totalHitCount
Hi all, I have been off the radar for too long. I am working on a requirement at $DAYJOB where there is a desire to monitor the rate of low and zero result queries; as such I did the simplest thing I could think of and wrote a search component that looks through the response object in the later phases of a distributed request. As I was doing this it struck me how inconsistent it is to find the total number of hits for a given search query; even if the response is manipulated heavily after the fact, shouldn't we make it easier for people writing things like search components / transformers etc to find out how many matches they had? This is where I started to wonder if it makes sense to hide / elide the original usage of totalHitCount as part of grouping, and use this field for presenting some sensible number of matches for the query. I know that this might break backwards compat with people who look at this field, but then I figure it is very ambiguously named, so many naive users are likely to use this field not realizing that it is all about grouping. I am probably going to craft a patch to this end, unless someone has any intuition that I am missing here -- Greg - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3784) Schema-Browser hangs because of similarity
[ https://issues.apache.org/jira/browse/SOLR-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451442#comment-13451442 ] Greg Bowyer commented on SOLR-3784: --- Committed to trunk Sending solr/webapp/web/js/scripts/schema-browser.js Transmitting file data . Committed revision 1382385. and branch_4x Sending solr/webapp/web/js/scripts/schema-browser.js Transmitting file data . Committed revision 1382384. Schema-Browser hangs because of similarity -- Key: SOLR-3784 URL: https://issues.apache.org/jira/browse/SOLR-3784 Project: Solr Issue Type: Bug Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Greg Bowyer Attachments: SOLR-3784.patch, SOLR-3784.patch Opening the Schema-Browser with the Default Configuration, switching the selection to type=int throws an error: {code}Uncaught TypeError: Cannot call method 'esc' of undefined // schema-browser.js:893{code} On the first Look, this was introduced by SOLR-3572 .. right? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-3784) Schema-Browser hangs because of similarity
[ https://issues.apache.org/jira/browse/SOLR-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer resolved SOLR-3784. --- Resolution: Fixed
[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448541#comment-13448541 ] Greg Bowyer commented on LUCENE-4332: - {quote} permission java.security.SecurityPermission *, read,write; This makes no sense, as SecurityPermission has no action, so read,write should be ignored. I was restricting SecurityPermission with something in mind (see the last 2 lines that allowed only the BouncyCastle installed by TIKA - now everything is allowed). What fails if I remove that line? I have no time to run the whole pitest suite {quote} You are right on that, I will change it {quote} The idea was to find places (especially in TIKA) that do things they should not do (like enabling security providers), which makes the configuration of J2EE container hosting Solr hard. So we should limit all this, to see when somebody adds a new feature to Solr that needs additional permissions. I am already working on restricting RuntimePermission more, so only things like reflection and property access is allowed. 
{quote} Ok, the intention changed a fair bit; I was still under the impression that we were targeting keeping tests in a sandbox rather than helping Solr host inside complex J2EE arrangements. Integrate PiTest mutation coverage tool into build -- Key: LUCENE-4332 URL: https://issues.apache.org/jira/browse/LUCENE-4332 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.1, 5.0 Reporter: Greg Bowyer Assignee: Greg Bowyer Labels: build Attachments: LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch As discussed briefly on the mailing list, this patch is an attempt to integrate the PiTest mutation coverage tool into the lucene build
[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448556#comment-13448556 ] Greg Bowyer commented on LUCENE-4332: - Ok, the security permission stuff is tightened up to cover just the internal JVM DNS cache; basically it is as follows {code} // Needed for some things in DNS caching in the JVM permission java.security.SecurityPermission "getProperty.networkaddress.cache.ttl"; permission java.security.SecurityPermission "getProperty.networkaddress.cache.negative.ttl"; {code} branch_4x Sending lucene/tools/junit4/tests.policy Transmitting file data . Committed revision 1381046. trunk Sending lucene/tools/junit4/tests.policy Transmitting file data . Committed revision 1381047.
[jira] [Updated] (SOLR-3784) Schema-Browser hangs because of similarity
[ https://issues.apache.org/jira/browse/SOLR-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated SOLR-3784: -- Attachment: SOLR-3784.patch I modified your patch a little to be defensive around the classname as well. I see where my initial mistake was: I got my languages confused and thought that in JS {} would evaluate to false (as in Python). Hopefully this solves it; do you want to commit this, or shall I?
[jira] [Commented] (SOLR-3784) Schema-Browser hangs because of similarity
[ https://issues.apache.org/jira/browse/SOLR-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448111#comment-13448111 ] Greg Bowyer commented on SOLR-3784: --- Eep, that was bad of me; want me to fix this?
[jira] [Commented] (SOLR-3784) Schema-Browser hangs because of similarity
[ https://issues.apache.org/jira/browse/SOLR-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448116#comment-13448116 ] Greg Bowyer commented on SOLR-3784: --- Hmm, why does this only trigger for type int? That makes less sense.
[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448212#comment-13448212 ] Greg Bowyer commented on LUCENE-4332: - Committed to trunk Sending build.xml Sending lucene/build.xml Sending lucene/common-build.xml Sending lucene/tools/build.xml Sending lucene/tools/junit4/tests.policy Sending solr/build.xml Sending solr/example/build.xml Sending solr/example/example-DIH/build.xml Transmitting file data Committed revision 1380938. and branch_4x Sending build.xml Sending lucene/build.xml Sending lucene/common-build.xml Sending lucene/tools/build.xml Sending lucene/tools/junit4/tests.policy Sending solr/build.xml Sending solr/example/build.xml Sending solr/example/example-DIH/build.xml Transmitting file data Committed revision 1380937.
[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448214#comment-13448214 ] Greg Bowyer commented on LUCENE-4332: - I guess now we need to figure out how and when to get Jenkins to run this... any thoughts?
[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448235#comment-13448235 ] Greg Bowyer commented on LUCENE-4332: - Pitest forks surrogate runner JVMs; when these things boot up they do some socket-related nonsense for communication (SecurityPermission), and turn on thread time monitoring (ManagementPermission). SecurityPermission hides out in lots of strange places; one of these is the DNS cache internal to a JVM. We could restrict it; I didn't, for ease of configuration, since the security manager is aimed at preventing non-malicious mistakes (like writing outside of the sandbox) rather than at full-force prevention of malicious code (which would require a bit more thinking through, IMHO).
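For reference, grants of the kind being discussed would look roughly like this in a tests.policy file (a sketch only, not the committed file; the "tests.sandbox" property name is made up for illustration):

```
// Sketch of tests.policy grants of the sort discussed above (illustrative,
// not the committed file; "tests.sandbox" is a made-up property name)
grant {
  // forked runner <-> parent communication over local sockets
  permission java.net.SocketPermission "localhost:1024-", "listen,accept,connect,resolve";
  // enabling and reading per-thread CPU time monitoring
  permission java.lang.management.ManagementPermission "control";
  permission java.lang.management.ManagementPermission "monitor";
  // confine test writes to the sandbox directory
  permission java.io.FilePermission "${tests.sandbox}${/}-", "read,write,delete";
};
```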
[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444718#comment-13444718 ] Greg Bowyer commented on LUCENE-4332: - {quote} Greg, what's this responsible for: {code} + property name=pitest.threads value=2 / Is this test execution concurrency at JVM level? If so, it needs to be set to 1 because tests won't support concurrent suites (too many static stuff lying around). {code} {quote} It's the number of runner VMs used by pitest (I think), so in essence it should spawn two VMs to run tests through (but let me check, I might have that wrong). {quote} {code} +!-- Java7 has a new bytecode verifier that _requires_ stackmaps to be present in the + bytecode. Unfortunately a *lot* of bytecode manipulation tools (including pitest) + do not currently write out this information; for now we are forced to disable this + in the jvm args -- {code} Both proguard and asmlib produce these I think. Just a note, I know it's probably not possible to integrate in pi's toolchain without changes to its source code. {quote} Hmm, weird - I thought I had problems with the verifier, but now that I just tried it, it worked. Guess I will be taking that out then. {quote} Will this require people to put rhino in ant's lib/ folder? If so, I'd just use an ugly property value=-D..=${} -D..=${}... Yours is very nice but an additional step is required which seems an overkill? {quote} Might do; it depends on the JVM vendor / version - I will change it to the ugly approach.
[jira] [Updated] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-4332: Attachment: LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch I found and fixed a few things relating to security policies (ha!) Also, I thought about the javascript a bit. I know that it's standard in the JVM, but when I tested on a 1.6 I found that my javascript was a bit too advanced for it. That made me pause for thought and realise that I might have been a bit too clever for my own good there; I am not all that happy with the JS, so I went with the ugly approach for now. I think the right solution is to work with the pitest people and get it to support junit style nested tags like sysproperty.
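The nested-tag style being wished for might look something like this (hypothetical only - pitest's ant task does not support this; the tag and attribute names here are illustrative, mirroring what the junit4 task allows):

```xml
<!-- Hypothetical: junit4-style nested sysproperty tags for the pitest task
     (not currently supported; names are illustrative) -->
<pitest targetClasses="org.apache.lucene.*" reportDir="${pitest.report.dir}">
  <sysproperty key="tests.seed" value="${tests.seed}"/>
  <sysproperty key="java.security.policy" value="${tests.policy}"/>
</pitest>
```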
[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445226#comment-13445226 ] Greg Bowyer commented on LUCENE-4332: - Do we want to implement this now that Uwe's changes are in 4.0 / trunk?
[jira] [Commented] (LUCENE-4337) Create Java security manager for forcible asserting behaviours in testing
[ https://issues.apache.org/jira/browse/LUCENE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1344#comment-1344 ] Greg Bowyer commented on LUCENE-4337: - I have a slightly different approach that I was testing last night that programmatically creates the policy, so that we don't have to allow other permissions all the time. Create Java security manager for forcible asserting behaviours in testing - Key: LUCENE-4337 URL: https://issues.apache.org/jira/browse/LUCENE-4337 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.1, 5.0, 4.0 Reporter: Greg Bowyer Assignee: Greg Bowyer Attachments: ChrootSecurityManager.java, ChrootSecurityManagerTest.java, LUCENE-4337.patch, LUCENE-4337.patch, LUCENE-4337.patch Following on from conversations about mutation testing, there is an interest in building a Java security manager that is able to assert / guarantee certain behaviours
[jira] [Updated] (LUCENE-4337) Create Java security manager for forcible asserting behaviours in testing
[ https://issues.apache.org/jira/browse/LUCENE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-4337: Attachment: ChrootSecurityManagerTest.java ChrootSecurityManager.java Greg's alternative approach (untested - just for discussion)
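For discussion, the core of the "chroot" idea might look something like this (an illustrative sketch, not the attached ChrootSecurityManager.java; the class and method names here are mine): deny file writes outside a sandbox directory and allow everything else.

```java
// Illustrative sketch of a sandboxing security manager: file writes are
// only permitted under the sandbox directory; all other permissions pass.
import java.io.File;
import java.io.FilePermission;
import java.security.Permission;

public class ChrootSketch extends SecurityManager {
    private final String sandbox; // absolute sandbox path with trailing separator

    public ChrootSketch(File sandboxDir) {
        this.sandbox = sandboxDir.getAbsolutePath() + File.separator;
    }

    // true when the given path may be written (i.e. it lives under the sandbox)
    boolean permitsWrite(String path) {
        String abs = new File(path).getAbsolutePath() + File.separator;
        return abs.startsWith(sandbox);
    }

    @Override
    public void checkPermission(Permission perm) {
        if (perm instanceof FilePermission
                && perm.getActions().contains("write")
                && !permitsWrite(perm.getName())) {
            throw new SecurityException("write outside sandbox: " + perm.getName());
        }
        // everything else is granted in this sketch
    }
}
```

A test could then assert that a write inside the sandbox is allowed while one outside throws, without touching a policy file at all.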
[jira] [Comment Edited] (LUCENE-4337) Create Java security manager for forcible asserting behaviours in testing
[ https://issues.apache.org/jira/browse/LUCENE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1345#comment-1345 ] Greg Bowyer edited comment on LUCENE-4337 at 8/30/12 8:47 AM: -- Greg's alternative approach (ChrootSecurityManager) (untested - just for discussion); not that I don't think we can use the approach Uwe worked on, just that I was not sure about going down the strict policy route. (Author: gbow...@fastmail.co.uk): Greg's alternative approach (untested - just for discussion)
[jira] [Commented] (LUCENE-4337) Create Java security manager for forcible asserting behaviours in testing
[ https://issues.apache.org/jira/browse/LUCENE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1356#comment-1356 ] Greg Bowyer commented on LUCENE-4337: - Actually, you know what, I think I just talked myself out of my approach. Yes, it's a bit more generic, but maybe it's less transparent to people that don't spend their time reading JVM source code; at least the policy file is more obvious to those who are approaching this for the first time.
[jira] [Commented] (LUCENE-4337) Create Java security manager for forcible asserting behaviours in testing
[ https://issues.apache.org/jira/browse/LUCENE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444635#comment-13444635 ] Greg Bowyer commented on LUCENE-4337: - {quote} I also spend my time in reading JDK source code today! But it was more when fixing Solr test violations (die, JMX/RMI, die, die, die). {quote} Right there with you on RMI
[jira] [Updated] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-4332: Attachment: LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch New version * uses ivy:cachepath as suggested * removed random license files * leverages security manager (LUCENE-4337) * uses fixed random seed
[jira] [Updated] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-4332: Attachment: LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch
Re: Test coverage or testing the tests
Ok first cut at a version of this in the build https://issues.apache.org/jira/browse/LUCENE-4332 On 27/08/12 18:05, Greg Bowyer wrote: On 27/08/12 17:30, Chris Hostetter wrote: : This is cool. I'd say lets get it up and going on jenkins (even weekly : or something). why worry about the imperfections in any of these : coverage tools, whats way more important is when the results find : situations where you thought you were testing something, but really +1. Even if it hammers the machine so bad it can't be run on mortal hardware, it's still worth it to hook it into the build system so people with god like hardware can easily run it and file bugs based on what they see. -Hoss The machine I ran it on cost me $5 from ec2 :D
[jira] [Updated] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-4332: Attachment: LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch Corrected jcommander license
[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly
[ https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated SOLR-3763: -- Description: Presently solr uses bitsets, queries and collectors to implement the concept of filters. This has proven to be very powerful, but does come at the cost of introducing a large body of code into solr making it harder to optimise and maintain. Another issue here is that filters currently cache sub-optimally given the changes in lucene towards atomic readers. Rather than patch these issues, this is an attempt to rework the filters in solr to leverage the Filter subsystem from lucene as much as possible. In good time the aim is to get this to do the following: ∘ Handle setting up filter implementations that are able to correctly cache with reference to the AtomicReader that they are caching for rather that for the entire index at large ∘ Get the post filters working, I am thinking that this can be done via lucenes chained filter, with the ‟expensive” filters being put towards the end of the chain - this has different semantics internally to the original implementation but IMHO should have the same result for end users ∘ Learn how to create filters that are potentially more efficient, at present solr basically runs a simple query that gathers a DocSet that relates to the documents that we want filtered; it would be interesting to make use of filter implementations that are in theory faster than query filters (for instance there are filters that are able to query the FieldCache) ∘ Learn how to decompose filters so that a complex filter query can be cached (potentially) as its constituent parts; for example the filter below currently needs love, care and feeding to ensure that the filter cache is not unduly stressed {code} 'category:(100) OR category:(200) OR category:(300)' {code} Really there is no reason not to express this in a cached form as {code} BooleanFilter( 
FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD), FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD), FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD) ) {code} This would yeild better cache usage I think as we can resuse docsets across multiple queries as well as avoid issues when filters are presented in differing orders ∘ Instead of end users providing costing we might (and this is a big might FWIW), be able to create a sort of execution plan of filters, leveraging a combination of what the index is able to tell us as well as sampling and ‟educated guesswork”; in essence this is what some DBMS software, for example postgresql does - it has a genetic algo that attempts to solve the travelling salesman - to great effect ∘ I am sure I will probably come up with other ambitious ideas to plug in here . :S Patches obviously forthcoming but the bulk of the work can be followed here https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters was: Presently solr uses bitsets, queries and collectors to implement the concept of filters. This has proven to be very powerful, but does come at the cost of introducing a large body of code into solr making it harder to optimise and maintain. Another issue here is that filters currently cache sub-optimally given the changes in lucene towards atomic readers. Rather than patch these issues, this is an attempt to rework the filters in solr to leverage the Filter subsystem from lucene as much as possible. 
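The per-AtomicReader caching point above can be sketched in plain Java (hypothetical names, not Solr's actual cache): instead of caching one DocSet for the whole index, cache one bit set per segment, keyed on a segment-level cache key, so a reopened index only recomputes filters for segments it has not seen before.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical sketch of per-segment filter caching: entries are keyed by a
// segment's core cache key (stable across index reopens) and are discarded
// automatically when the segment itself is garbage collected.
public class PerSegmentFilterCache {
    private final Map<Object, BitSet> cache = new WeakHashMap<>();

    // Compute the filter's matching docs for one segment, or reuse a cached set.
    public synchronized BitSet getOrCompute(Object segmentCoreKey,
                                            Function<Object, BitSet> compute) {
        return cache.computeIfAbsent(segmentCoreKey, compute);
    }
}
```

Keyed per segment, the cached sets stay valid for as long as the segment does, which is exactly the guarantee a whole-index cache cannot give across reopens.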
[jira] [Created] (SOLR-3763) Make solr use lucene filters directly
Greg Bowyer created SOLR-3763: - Summary: Make solr use lucene filters directly Key: SOLR-3763 URL: https://issues.apache.org/jira/browse/SOLR-3763 Project: Solr Issue Type: Improvement Affects Versions: 4.0, 4.1, 5.0 Reporter: Greg Bowyer Assignee: Greg Bowyer -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly
[ https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated SOLR-3763: -- Attachment: SOLR-3763-Make-solr-use-lucene-filters-directly.patch Initial version, this has some hacks in it and does not pass testing for caches since that needs to be reworked
[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443304#comment-13443304 ] Greg Bowyer commented on LUCENE-4332: - Wow, lots of interest. I will try to answer some of the salient points.
Core was missing until today, as one test (TestLuceneConstantVersion) didn't run correctly because it was lacking the Lucene version system property. Currently pit refuses to run unless the underlying suite is all green (a good thing IMHO), so I didn't have core from my first run (it's there now). This takes a long time to run: all of the ancillary Lucene packages take roughly 4 hours on the largest-CPU EC2 instance, and core takes 8 hours (this was the other reason core was missing, I was waiting for it to finish crunching).
As to the random seed, I completely agree, and it was one of the things I mentioned on the mailing list that makes the output of this tool not perfect. I do feel that the tests that are randomised typically do a better job at gaining coverage, but it's a good idea to stabilise the seed.
Jars and build.xml: I have no problem changing this to whatever people think fits best into the build. My impression was that clover is handled the way it is because it is not technically open source and as a result has screwball licensing concerns; essentially I didn't know any better :S I will try to get a chance to make it use the ivy:cachepath approach.
Regarding the risks posed by mutations, I cannot prove or say there are no risks; however, mutation testing is not random in the mutations applied - they are formulaic and quite simple. It will not permute arguments, nor will it mutate complex objects (it can and does mess with object references, turning references in arguments to nulls). I can conceive of ways in which it could screw up mutated code, making it possible to delete random files, but I don't think those are extremely likely situations.
FWIW I would be less worried about this deleting something on the filesystem and far more worried about it accidentally leaving corpses of undeleted files. Sandboxing it could solve that issue; if that is too much effort, another approach might be to work with the pitest team and build a security manager that is militant about file access, disallowing anything that canonicalises outside of a given path. Oh, and as Robert suggested, we can always point it away from key things. Integrate PiTest mutation coverage tool into build -- Key: LUCENE-4332 URL: https://issues.apache.org/jira/browse/LUCENE-4332 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.1, 5.0 Reporter: Greg Bowyer Assignee: Greg Bowyer Labels: build Attachments: LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch As discussed briefly on the mailing list, this patch is an attempt to integrate the PiTest mutation coverage tool into the lucene build
[jira] [Comment Edited] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443304#comment-13443304 ] Greg Bowyer edited comment on LUCENE-4332 at 8/29/12 4:51 AM: -- At the end of the day its a tool like any other, I have exactly the same feelings as Robert on this {quote} This is cool. I'd say lets get it up and going on jenkins (even weekly or something). why worry about the imperfections in any of these coverage tools, whats way more important is when the results find situations where you thought you were testing something, but really arent, etc (here was a recent one found by clover http://svn.apache.org/viewvc?rev=1376722view=rev). so imo just another tool to be able to identify serious gaps/test-bugs after things are up and running. and especially looking at deltas from line coverage to identify stuff thats 'executing' but not actually being tested. {quote}
[jira] [Commented] (SOLR-3763) Make solr use lucene filters directly
[ https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443409#comment-13443409 ] Greg Bowyer commented on SOLR-3763: --- I guess my next step is to get caching working, I am not sure quite how to take baby steps with this beyond getting to feature parity.
[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443424#comment-13443424 ] Greg Bowyer commented on LUCENE-4332: - {quote} Thats a cool idea also for our own tests! We should install a SecurityManager always and only allow files in build/test. LuceneTestCase can enforce this SecurityManager installed! And if a test writes outside, fail it! {quote} Should we split that out as a separate thing and get a security manager built that hooks into the awesome carrot testing stuff?
[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443449#comment-13443449 ] Greg Bowyer commented on LUCENE-4332: - Following up, it turns out to be *very* simple to do the security manager trick
{code:java}
import java.io.File;

public class Test {
    public static void main(String... args) {
        System.setSecurityManager(new SecurityManager() {
            public void checkDelete(String file) throws SecurityException {
                File fp = new File(file);
                String path = fp.getAbsolutePath();
                if (!path.startsWith("/tmp")) {
                    throw new SecurityException("Bang!");
                }
            }
        });
        new File("/home/greg/test").delete();
    }
}
{code}
{code}
Exception in thread "main" java.lang.SecurityException: Bang!
	at Test$1.checkDelete(Test.java:12)
	at java.io.File.delete(File.java:971)
	at Test.main(Test.java:17)
{code}
[jira] [Comment Edited] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443449#comment-13443449 ] Greg Bowyer edited comment on LUCENE-4332 at 8/29/12 6:48 AM: -- There is a lot of scope here if you want to abuse checking for all sorts of things (files, sockets, threads etc)
[jira] [Created] (LUCENE-4337) Create Java security manager for forcible asserting behaviours in testing
Greg Bowyer created LUCENE-4337: --- Summary: Create Java security manager for forcible asserting behaviours in testing Key: LUCENE-4337 URL: https://issues.apache.org/jira/browse/LUCENE-4337 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.1, 5.0, 4.0 Reporter: Greg Bowyer Assignee: Greg Bowyer Following on from conversations about mutation testing, there is an interest in building a Java security manager that is able to assert / guarantee certain behaviours
[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
[ https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13443513#comment-13443513 ] Greg Bowyer commented on LUCENE-4332: - I can codify a security manager; they are somewhat complex, but I see our needs here as very simple (essentially asserting file paths)
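The "assert file paths" check such a militant manager would make can be sketched in plain Java. This is a hypothetical sketch, not the eventual LUCENE-4337 patch; in a real harness the logic would live in SecurityManager overrides such as checkDelete/checkWrite (an API deprecated in recent JDKs), so it is shown here as a standalone class.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch: canonicalise every checked path and reject anything
// that resolves outside a single sandbox root directory.
public class PathSandbox {
    private final String root;

    public PathSandbox(File sandbox) {
        try {
            // canonicalise once so "../" and symlink tricks cannot escape later
            this.root = sandbox.getCanonicalPath() + File.separator;
        } catch (IOException e) {
            throw new IllegalArgumentException("bad sandbox: " + sandbox, e);
        }
    }

    // Throws SecurityException if the file canonicalises outside the sandbox.
    public void checkAccess(String file) {
        try {
            String path = new File(file).getCanonicalPath();
            if (!(path + File.separator).startsWith(root)) {
                throw new SecurityException("outside sandbox: " + path);
            }
        } catch (IOException e) {
            throw new SecurityException("cannot canonicalise: " + file);
        }
    }
}
```

Canonicalising both the root (once) and each checked path is what makes the comparison safe against relative-path and symlink escapes, which is the property the comment above is after.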
Test coverage or testing the tests
Hi all,
At my current $DAYJOB we have been having a bit of success with an alternative coverage tool called pit-test (http://pitest.org/). Essentially pit-test is a mutation testing tool that attempts to see how well the unit tests are able to catch alterations and regressions in the code they aim to test; this is done by determining what code the tests actually touch and then mutating that code in some fashion.
I have as such started working on seeing if I can integrate pit-test into the lucene build; the tool itself is Apache-licensed, which solves that particular issue. The main downsides I can see are that the coverage might be different across runs due to lucene's random testing, as well as the time it takes to run coverage (~4 hours on a big bad sandy bridge machine).
I have published initial results for the lucene packages; core is missing because there is one test in core that does not work with pit-test currently (and as such I am working on getting core generated). It can be found here: http://people.apache.org/~gbowyer/pitest/
Is this of interest to anyone other than myself, especially given the aggressive nature of how lucene is tested?
-- Greg
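As a hypothetical illustration of the idea (not PiTest's actual implementation, though it ships a similar conditionals-boundary mutator): a mutation flips `>` to `>=`, and only a test that exercises the boundary value can tell the mutant apart from the original.

```java
// Illustration of mutation testing: a suite that never probes the boundary
// cannot distinguish the original code from its mutant, so the mutant survives.
public class MutationDemo {
    // code under test
    static boolean isPositive(int x) { return x > 0; }

    // what a conditionals-boundary mutation of the line above would produce
    static boolean isPositiveMutant(int x) { return x >= 0; }

    public static void main(String[] args) {
        // weak test: both versions agree at 5, so this check lets the mutant live
        if (isPositive(5) != isPositiveMutant(5)) throw new AssertionError();

        // boundary test: the versions disagree at 0, so this check kills the mutant
        if (isPositive(0) == isPositiveMutant(0)) throw new AssertionError("mutant survived");
    }
}
```

The coverage report then counts how many generated mutants each test class kills; surviving mutants point at code that is executed but never actually asserted on.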
Re: Test coverage or testing the tests
On 27/08/12 17:30, Chris Hostetter wrote: : This is cool. I'd say lets get it up and going on jenkins (even weekly : or something). why worry about the imperfections in any of these : coverage tools, whats way more important is when the results find : situations where you thought you were testing something, but really +1. Even if it hammers the machine so bad it can't be run on mortal hardware, it's still worth it to hook it into the build system so people with god like hardware can easily run it and file bugs based on what they see. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org The machine I ran it on cost me $5 from ec2 :D
[jira] [Created] (LUCENE-4332) Integrate PiTest mutation coverage tool into build
Greg Bowyer created LUCENE-4332: --- Summary: Integrate PiTest mutation coverage tool into build Key: LUCENE-4332 URL: https://issues.apache.org/jira/browse/LUCENE-4332 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.1, 5.0 Reporter: Greg Bowyer Assignee: Greg Bowyer As discussed briefly on the mailing list, this patch is an attempt to integrate the PiTest mutation coverage tool into the lucene build
[jira] [Resolved] (SOLR-3572) Make Schema-Browser show custom similarities
[ https://issues.apache.org/jira/browse/SOLR-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer resolved SOLR-3572. --- Resolution: Fixed Fix Version/s: 4.0 Catching up with these things: committed to trunk: 1373117; committed to branch_4x: 1373146 Make Schema-Browser show custom similarities Key: SOLR-3572 URL: https://issues.apache.org/jira/browse/SOLR-3572 Project: Solr Issue Type: Bug Affects Versions: 4.0-ALPHA Reporter: Greg Bowyer Assignee: Greg Bowyer Fix For: 4.0 Attachments: SOLR-3572-similarity-schemabrowser.patch When a custom similarity is defined in the solr schema it is helpful to have the schema browser show the custom similarity
[jira] [Commented] (SOLR-3673) Random variate functions
[ https://issues.apache.org/jira/browse/SOLR-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422029#comment-13422029 ] Greg Bowyer commented on SOLR-3673:
---
{quote} This is where my total ignorance of these random generators and how they're used comes in: it looked to me like these generators in your patch just took in a java.util.Random as input – is there a particular reason why this Mersenne Twister random needs to be used? What does that give us that java.util.Random doesn't? {quote}

They can take anything that extends java.util.Random. The only issues with the built-in one are that its period before it repeats itself is comparatively short, the numbers it generates have statistically poor properties, and it is slightly slower. I don't lay claim to being an expert on this stuff; I am going on what I have been told. The usage of MT is a side benefit of cheating on the distributions and using the ones that come out of the box in uncommons-maths - since I had a better RNG available I used it.

{quote} FWIW: 128 bits isn't that much if you let the seed argument to the function be an arbitrary String - even if you ignore the high bits the user just needs to give you 16 chars (less if we include stuff like the index version) {quote}

Yeah, it's not a lot and is manageable; I was more thinking about avoiding it being too configurable.

{quote} This is kind of where my use case question comes into play as well ... if the goal is just to use these generators to get a biased shuffling of the docs (ie: maybe you use a certain random distribution and then frange filter on it to get a set of documents with a roughly predictable size) then it's not that bad if the seeds aren't very complex – throw in the SolrCore start time to get a few more bits, etc. But if there is some sort of cryptography goal then obviously having a good random seed that is unpredictable is a lot more important. {quote}

The first use case, plus use cases involving bending things towards distributions to act as cheap models. This stuff is useless for crypto as it stands anyhow, since these RNGs are fairly predictable.

Random variate functions

Key: SOLR-3673
URL: https://issues.apache.org/jira/browse/SOLR-3673
Project: Solr
Issue Type: Improvement
Affects Versions: 4.0, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
Attachments: SOLR-3673.patch

Hi all

At my $DAYJOB I have been asked to build a few random variate functions that return random numbers bound to a distribution. I think these can be added to Solr. I have a hesitation in that the code as written uses / needs uncommons-maths (because we want a far better RNG than Java's, and because I am lazy and did not want to write the distributions). uncommons-maths is Apache licensed, so we are good on that front. Anyone have any thoughts on this?

For reference the functions are:
rgaussian(mean, stddev) - Random value aligned to the Gaussian distribution
rpoisson(mean) - Random value aligned to the Poisson distribution
rbinomial(n, prob) - Random value aligned to the binomial distribution
rcontinous(min, max) - Random continuous value between min and max
rdiscrete(min, max) - Random discrete value between min and max
rexponential(rate) - Random value from the exponential distribution
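For illustration, the functions listed above could be sketched roughly as below using only java.util.Random. This is not the attached patch (which delegates to the uncommons-maths distribution classes and its Mersenne Twister RNG); the class and method names here are just the proposed function names, and the exponential sampler uses a simple inverse-transform draw.

```java
import java.util.Random;

// Minimal sketch of the proposed variate functions, assuming only
// java.util.Random. Illustrative only; the real patch uses uncommons-maths.
public class RandomVariates {
    private final Random rng;

    // Any java.util.Random subclass (e.g. a Mersenne Twister) can be passed in.
    public RandomVariates(Random rng) {
        this.rng = rng;
    }

    // rgaussian(mean, stddev): scale and shift a standard normal draw
    public double rgaussian(double mean, double stddev) {
        return mean + stddev * rng.nextGaussian();
    }

    // rexponential(rate): inverse-transform sampling, -ln(1 - U) / rate
    public double rexponential(double rate) {
        return -Math.log(1.0 - rng.nextDouble()) / rate;
    }

    // rcontinous(min, max): uniform continuous value in [min, max)
    public double rcontinous(double min, double max) {
        return min + (max - min) * rng.nextDouble();
    }

    // rdiscrete(min, max): uniform integer in [min, max] inclusive
    public int rdiscrete(int min, int max) {
        return min + rng.nextInt(max - min + 1);
    }

    public static void main(String[] args) {
        RandomVariates v = new RandomVariates(new Random(42L)); // fixed seed
        System.out.println(v.rgaussian(0.0, 1.0));
        System.out.println(v.rexponential(2.0));
        System.out.println(v.rdiscrete(1, 6));
    }
}
```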
[jira] [Comment Edited] (SOLR-3673) Random variate functions
[ https://issues.apache.org/jira/browse/SOLR-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422029#comment-13422029 ] Greg Bowyer edited comment on SOLR-3673 at 7/25/12 6:24 AM:
---
{quote} This is where my total ignorance of these random generators and how they're used comes in: it looked to me like these generators in your patch just took in a java.util.Random as input – is there a particular reason why this Mersenne Twister random needs to be used? What does that give us that java.util.Random doesn't? {quote}

They can take anything that extends java.util.Random. The only issues with the built-in one are that its period before it repeats itself is comparatively short, the numbers it generates have statistically poor properties, and it is slightly slower. I don't lay claim to being an expert on this stuff; I am going on what I have been told. The usage of MT is a side benefit of cheating on the distributions and using the ones that come out of the box in uncommons-maths - since I had a better RNG available I used it.

{quote} FWIW: 128 bits isn't that much if you let the seed argument to the function be an arbitrary String - even if you ignore the high bits the user just needs to give you 16 chars (less if we include stuff like the index version) {quote}

Yeah, it's not a lot and is manageable; I was more thinking about avoiding it being too configurable (for example, I think saying rgaussian(1, 0.5, some very long seed with lots of data, XORShift) would be going too far). I will implement the passing in of a seed value for sure (that's pretty sensible); I was more worried about making sure that the seed is just random (ha!) data the user passed in, and that there is no expectation about what's happening under the hood.

{quote} This is kind of where my use case question comes into play as well ... if the goal is just to use these generators to get a biased shuffling of the docs (ie: maybe you use a certain random distribution and then frange filter on it to get a set of documents with a roughly predictable size) then it's not that bad if the seeds aren't very complex – throw in the SolrCore start time to get a few more bits, etc. But if there is some sort of cryptography goal then obviously having a good random seed that is unpredictable is a lot more important. {quote}

The first use case, plus use cases involving bending things towards distributions to act as cheap models. This stuff is useless for crypto as it stands anyhow, since these RNGs are fairly predictable.
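The "arbitrary String as seed" idea above could look something like the sketch below: fold the string's bytes into a 64-bit seed for a java.util.Random. The class name, method name, and the FNV-1a style hash are all hypothetical choices for illustration, not anything from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Hypothetical sketch: derive a 64-bit RNG seed from an arbitrary
// user-supplied seed string, as discussed in the comment above.
public final class SeedFromString {
    // Fold the string's UTF-8 bytes into a long, FNV-1a style.
    public static long seedOf(String s) {
        long h = 0xcbf29ce484222325L;          // FNV offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xffL);
            h *= 0x100000001b3L;               // FNV prime
        }
        return h;
    }

    public static void main(String[] args) {
        // Same string seed -> same sequence, which is what makes results
        // reproducible; mixing in the index version or SolrCore start time
        // would add the extra entropy mentioned above.
        Random a = new Random(seedOf("my seed"));
        Random b = new Random(seedOf("my seed"));
        System.out.println(a.nextLong() == b.nextLong());
    }
}
```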
[jira] [Commented] (SOLR-3673) Random variate functions
[ https://issues.apache.org/jira/browse/SOLR-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422451#comment-13422451 ] Greg Bowyer commented on SOLR-3673:
---
Good idea, although interestingly I have noticed (from some brief source diving) that Mahout actually embeds uncommons-maths :S

As for SecureRandom: it's aimed at crypto, and so reads from high-quality entropy sources like /dev/random. Whilst this can be changed around, it gets more complex than it needs to be.
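The SecureRandom point above follows from the polymorphism mentioned earlier in the thread: java.security.SecureRandom itself extends java.util.Random, so it could in principle be dropped into any variate function that accepts a Random. A minimal sketch (the rgaussian helper here is illustrative, not from the patch):

```java
import java.security.SecureRandom;
import java.util.Random;

// Illustrative only: SecureRandom extends java.util.Random, so it is
// accepted by any variate function taking a Random -- but, as noted above,
// it may block while gathering entropy and is overkill for this use case.
public class RngSwap {
    static double rgaussian(Random rng, double mean, double stddev) {
        return mean + stddev * rng.nextGaussian();
    }

    public static void main(String[] args) {
        System.out.println(rgaussian(new Random(1L), 0.0, 1.0));   // fast PRNG
        System.out.println(rgaussian(new SecureRandom(), 0.0, 1.0)); // same API
    }
}
```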
[jira] [Comment Edited] (SOLR-3673) Random variate functions
[ https://issues.apache.org/jira/browse/SOLR-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422451#comment-13422451 ] Greg Bowyer edited comment on SOLR-3673 at 7/25/12 6:05 PM:
---
Good idea, although interestingly I have noticed (from some brief source diving) that Mahout actually embeds uncommons-maths :S

As for SecureRandom: it's aimed at crypto, and so reads from high-quality entropy sources like /dev/random (which blocks). Whilst this can be changed around, it gets more complex than it needs to be.