[jira] [Commented] (SOLR-3423) HttpShardHandlerFactory does not shutdown its threadpool

2016-10-04 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547654#comment-15547654
 ] 

Greg Bowyer commented on SOLR-3423:
---

Safe to close



> HttpShardHandlerFactory does not shutdown its threadpool
> 
>
> Key: SOLR-3423
> URL: https://issues.apache.org/jira/browse/SOLR-3423
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.6
>    Reporter: Greg Bowyer
>    Assignee: Greg Bowyer
>  Labels: distributed, shard
> Fix For: 3.6.3
>
> Attachments: 
> SOLR-3423-HttpShardHandlerFactory_ThreadPool_Shutdown_lucene_3x.diff, 
> SOLR-3423-HttpShardHandlerFactory_ThreadPool_Shutdown_lucene_3x.diff
>
>
> The HttpShardHandlerFactory is not getting a chance to shut down its 
> threadpool; this means that in situations like a core reload / core swap it 
> is possible for the handler to leak threads.
> (This may also be the case if the webapp is loaded / unloaded in the 
> container)
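A minimal sketch of the kind of shutdown being asked for, assuming the factory owns a plain ExecutorService; the class and field names below are illustrative, not the actual Solr source:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShardHandlerPoolSketch {
  private final ExecutorService commExecutor = Executors.newCachedThreadPool();

  /** Hook this into the factory's close path (core reload, core swap, webapp unload). */
  public void close() {
    commExecutor.shutdown(); // stop accepting new tasks
    try {
      if (!commExecutor.awaitTermination(10, TimeUnit.SECONDS)) {
        commExecutor.shutdownNow(); // interrupt whatever is still running
      }
    } catch (InterruptedException e) {
      commExecutor.shutdownNow();
      Thread.currentThread().interrupt();
    }
  }
}
{code}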






[jira] [Created] (LUCENE-6536) Migrate HDFSDirectory from solr to lucene-hadoop

2015-06-09 Thread Greg Bowyer (JIRA)
Greg Bowyer created LUCENE-6536:
---

 Summary: Migrate HDFSDirectory from solr to lucene-hadoop
 Key: LUCENE-6536
 URL: https://issues.apache.org/jira/browse/LUCENE-6536
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Greg Bowyer


I am currently working on a search engine that is throughput-oriented and 
works entirely in Apache Spark.

As part of this, I need a directory implementation that can operate on HDFS 
directly. This got me thinking: can I take the one that was worked on so hard 
for Solr's Hadoop integration?

As such, I migrated the HDFS and blockcache directories out to a lucene-hadoop 
module.

Having done this work, I am not sure if it is actually a good change; it feels 
a bit messy, and I don't like how the Metrics class gets extended and abused.

Thoughts, anyone?
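For readers who have not seen the code in question, this is roughly how the Solr-side class is used today; the two-argument constructor shape is taken from the Solr 4.x sources (an assumption worth checking), and the namenode URI and index path are placeholders:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.store.Directory;
import org.apache.solr.store.hdfs.HdfsDirectory;

public class HdfsDirectoryExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder cluster address

    // The Directory this issue proposes moving into a lucene-hadoop module.
    try (Directory dir = new HdfsDirectory(new Path("/indexes/my-index"), conf)) {
      // hand dir to an IndexWriter/IndexReader like any other Directory
    }
  }
}
{code}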






[jira] [Updated] (LUCENE-6536) Migrate HDFSDirectory from solr to lucene-hadoop

2015-06-09 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-6536:

Attachment: LUCENE-6536.patch

 Migrate HDFSDirectory from solr to lucene-hadoop
 

 Key: LUCENE-6536
 URL: https://issues.apache.org/jira/browse/LUCENE-6536
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Greg Bowyer
  Labels: hadoop, hdfs, lucene, solr
 Attachments: LUCENE-6536.patch








[jira] [Commented] (LUCENE-6536) Migrate HDFSDirectory from solr to lucene-hadoop

2015-06-09 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579210#comment-14579210
 ] 

Greg Bowyer commented on LUCENE-6536:
-

bq. Questions:
bq. What will be done to deal with the bugginess of this thing? I see many 
reports of user corruption issues. By committing it, we take responsibility for 
this and it becomes our problem. I don't want to see the code committed to 
lucene just for this reason.

Fix its bugs ;). Joking aside, is it the directory or the blockcache that is 
the source of most of the corruptions?

bq. What will be done about the performance? I am not really sure the entire 
technique is viable.

My use case is a bit odd: I have many small (2*HDFS block) indexes that get run 
over map jobs in hadoop. The performance I got last time I did this (with a 
dirty hack Directory that copied the files in and out of HDFS :S) was pretty 
good.

It's a throughput-oriented usage; I think if you tried to use this to back an 
online searcher you would have poor performance.

bq. Personally, I think if someone wants to do this, a better integration point 
is to make it a java 7 filesystem provider. That is really how such a 
filesystem should work anyway.

That is awesome, I didn't know such an SPI existed in Java. I have found a few 
people that are trying to make a provider for hadoop.

I also don't have the greatest love for this path; the more test manipulations 
I did, the less and less it felt like a simple feature that should be in 
lucene. I might try to either strip out the block-cache from this patch, or use 
an HDFS filesystem SPI in Java 7.
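A sketch of what that NIO.2 route could look like, assuming an HDFS JSR-203 provider is on the classpath to claim the hdfs:// scheme; nothing below is real Lucene- or Hadoop-integration code beyond that assumption:

{code:java}
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.util.Collections;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class HdfsViaNio2 {
  public static void main(String[] args) throws Exception {
    // Resolved by whichever java.nio.file.spi.FileSystemProvider registers "hdfs"
    FileSystem hdfs = FileSystems.newFileSystem(
        URI.create("hdfs://namenode:8020/"), Collections.<String, Object>emptyMap());
    Path indexPath = hdfs.getPath("/indexes/my-index");
    try (Directory dir = FSDirectory.open(indexPath)) {
      // Lucene reads and writes through the provider; no custom Directory needed
    }
  }
}
{code}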







[jira] [Commented] (LUCENE-6536) Migrate HDFSDirectory from solr to lucene-hadoop

2015-06-09 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579388#comment-14579388
 ] 

Greg Bowyer commented on LUCENE-6536:
-

bq. That leads you to a test impl here: 
https://github.com/damiencarol/jsr203-hadoop

These are the people that I am talking about.







[jira] [Commented] (LUCENE-6536) Migrate HDFSDirectory from solr to lucene-hadoop

2015-06-09 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579400#comment-14579400
 ] 

Greg Bowyer commented on LUCENE-6536:
-

Oh wow, the Blur store might be exactly what I am looking for.







[jira] [Commented] (LUCENE-3178) Native MMapDir

2014-03-25 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947339#comment-13947339
 ] 

Greg Bowyer commented on LUCENE-3178:
-

Shall we call these experiments done? I think they conclude that where we do 
get wins they are minor, and the baggage that comes with them is a bit too much.

 Native MMapDir
 --

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
  Labels: gsoc2014
 Attachments: LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178.patch


 Spinoff from LUCENE-2793.
 Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
 level IO flags depending on the IOContext, we could in theory do something 
 similar with MMapDir.
 The problem is MMap is apparently quite hairy... and to pass the flags the 
 native code would need to invoke mmap (I think?), unlike UnixDir where the 
 code only has to open the file handle.






[jira] [Assigned] (LUCENE-3917) Port pruning module to trunk apis

2014-03-25 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer reassigned LUCENE-3917:
---

Assignee: Greg Bowyer

 Port pruning module to trunk apis
 -

 Key: LUCENE-3917
 URL: https://issues.apache.org/jira/browse/LUCENE-3917
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/other
Affects Versions: 4.0-ALPHA
Reporter: Robert Muir
Assignee: Greg Bowyer
 Fix For: 4.8

 Attachments: LUCENE-3917-Initial-port-of-index-pruning.patch


 Pruning module was added in LUCENE-1812, but we need to port
 this to trunk (4.0)






[jira] [Commented] (LUCENE-3178) Native MMapDir

2014-03-24 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945878#comment-13945878
 ] 

Greg Bowyer commented on LUCENE-3178:
-

Interesting; it's also somewhat interesting to see that AndHighLow gets a big 
jump in performance. Any ideas on why that might be?







[jira] [Comment Edited] (LUCENE-3178) Native MMapDir

2014-03-06 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564880#comment-13564880
 ] 

Greg Bowyer edited comment on LUCENE-3178 at 3/7/14 6:48 AM:
-

So I was going to shut this down today, and just to make sure, I ran the 
benchmark on the simplest code possible

... and suddenly I got good results; this is idiopathic :S

{code}
Report after iter 19:
                Task    QPS baseline (StdDev)    QPS mmap_tests (StdDev)                Pct diff
          OrHighHigh             1.68 (11.2%)               1.73 (10.3%)     3.0% ( -16% -   27%)
            PKLookup           129.89  (5.8%)             135.03  (6.0%)     4.0% (  -7% -   16%)
            HighTerm             8.09 (13.6%)               8.43 (12.8%)     4.2% ( -19% -   35%)
           OrHighMed             4.46 (10.4%)               4.67  (9.5%)     4.7% ( -13% -   27%)
           OrHighLow             4.82 (10.6%)               5.09 (10.3%)     5.6% ( -13% -   29%)
        HighSpanNear             0.92  (8.1%)               0.97  (7.3%)     5.9% (  -8% -   23%)
              IntNRQ             2.51 (10.2%)               2.67  (9.9%)     6.6% ( -12% -   29%)
          HighPhrase             0.30 (11.7%)               0.32 (12.8%)     6.7% ( -16% -   35%)
           MedPhrase             2.93  (6.8%)               3.12  (8.2%)     6.7% (  -7% -   23%)
         AndHighHigh             5.46  (6.6%)               5.86  (7.0%)     7.2% (  -5% -   22%)
             Respell            19.68  (5.9%)              21.15  (6.6%)     7.5% (  -4% -   21%)
           LowPhrase             0.46  (9.5%)               0.50 (10.2%)     7.6% ( -11% -   30%)
             Prefix3             5.25  (8.2%)               5.66  (7.7%)     7.9% (  -7% -   25%)
    HighSloppyPhrase             1.54  (8.0%)               1.67 (13.1%)     8.5% ( -11% -   32%)
         MedSpanNear             5.25  (7.0%)               5.72  (8.2%)     9.0% (  -5% -   25%)
            Wildcard            12.44  (5.7%)              13.59  (6.5%)     9.2% (  -2% -   22%)
     MedSloppyPhrase             2.27  (7.2%)               2.49  (8.5%)     9.5% (  -5% -   27%)
             MedTerm            28.16 (10.3%)              30.89  (9.9%)     9.7% (  -9% -   33%)
              Fuzzy1            18.91  (6.0%)              20.82  (6.7%)    10.1% (  -2% -   24%)
              Fuzzy2            19.69  (6.6%)              21.68  (7.5%)    10.1% (  -3% -   25%)
          AndHighMed             7.79  (7.5%)               8.58  (6.1%)    10.1% (  -3% -   25%)
         LowSpanNear             1.45  (5.7%)               1.60  (9.3%)    10.5% (  -4% -   27%)
     LowSloppyPhrase            22.84  (7.7%)              25.45  (9.7%)    11.4% (  -5% -   31%)
             LowTerm            46.46  (6.8%)              52.90  (7.6%)    13.9% (   0% -   30%)
          AndHighLow            35.92  (5.3%)              42.38  (7.1%)    18.0% (   5% -   32%)
{code}


was (Author: gbow...@fastmail.co.uk):
So I was going to shut this down today, and just to make sure I ran the 
benchmark on the simplest code possible

... and suddenly I got good results, this is idiopathic :S

https://gist.github.com/0f017853861d050c0b66

[jira] [Closed] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2014-03-03 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer closed LUCENE-4332.
---

Resolution: Fixed

This was done a while back; I should set up something to actually publish the 
results.

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build






[jira] [Commented] (SOLR-4465) Configurable Collectors

2013-07-01 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697035#comment-13697035
 ] 

Greg Bowyer commented on SOLR-4465:
---

{quote}
On task number two I'm wondering if we could use the standard post filter 
approach to inject new collectors. Then all we would need is a search component 
that handles the merge from the shards. This approach could be done with 
plugins so we wouldn't have to alter the core. The main work then would be a 
search component that would allow for pluggable merging algorithms. This could 
be useful in many contexts. We'd need to see how this component would fit in 
the distributed flow.
{quote}

Sounds reasonable, although I am not quite sure what you mean by plugins in 
this context.

 Configurable Collectors
 ---

 Key: SOLR-4465
 URL: https://issues.apache.org/jira/browse/SOLR-4465
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.1
Reporter: Joel Bernstein
 Fix For: 4.4

 Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
 SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
 SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
 SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
 SOLR-4465.patch, SOLR-4465.patch


 This ticket provides a patch to add pluggable collectors to Solr. This patch 
 was generated and tested with Solr 4.1.
 This is how the patch functions:
 Collectors are plugged into Solr in the solrconfig.xml using the new 
 collectorFactory element. For example:
 <collectorFactory name="default" class="solr.CollectorFactory"/>
 <collectorFactory name="sum" class="solr.SumCollectorFactory"/>
 The elements above define two collector factories. The first one is the 
 default collectorFactory. The class attribute points to 
 org.apache.solr.handler.component.CollectorFactory, which implements logic 
 that returns the default TopScoreDocCollector and TopFieldCollector.
 To create your own collectorFactory you must subclass the default 
 CollectorFactory and at a minimum override the getCollector method to return 
 your new collector.
 The parameter cl turns on pluggable collectors:
 cl=true
 If cl is not in the parameters, Solr will automatically use the default 
 collectorFactory.
 *Pluggable Doclist Sorting With the Docs Collector*
 You can specify two types of pluggable collectors. The first type is the docs 
 collector. For example:
 cl.docs=name
 The above param points to a named collectorFactory in the solrconfig.xml to 
 construct the collector. The docs collectorFactorys must return a collector 
 that extends the TopDocsCollector base class. Docs collectors are responsible 
 for collecting the doclist.
 You can specify only one docs collector per query.
 You can pass parameters to the docs collector using local params syntax. For 
 example:
 cl.docs=\{! sort=mycustomesort\}mycollector
 If cl=true and a docs collector is not specified, Solr will use the default 
 collectorFactory to create the docs collector.
 *Pluggable Custom Analytics With Delegating Collectors*
 You can also specify any number of custom analytic collectors with the 
 cl.analytic parameter. Analytic collectors are designed to collect 
 something else besides the doclist. Typically this would be some type of 
 custom analytic. For example:
 cl.analytic=sum
 The parameter above specifies an analytic collector named sum. Like the docs 
 collectors, sum points to a named collectorFactory in the solrconfig.xml. 
 You can specify any number of analytic collectors by adding additional 
 cl.analytic parameters.
 Analytic collector factories must return Collector instances that extend 
 DelegatingCollector.
 A sample analytic collector is provided in the patch through the 
 org.apache.solr.handler.component.SumCollectorFactory.
 This collectorFactory provides a very simple DelegatingCollector that groups 
 by a field and sums a column of floats. The sum collector is not designed to 
 be a fully functional sum function but to be a proof of concept for pluggable 
 analytics through delegating collectors.
 You can send parameters to analytic collectors with solr local param syntax.
 For example:
 cl.analytic=\{! id=1 groupby=field1 column=field2\}sum
 The id parameter is mandatory for analytic collectors and is used to 
 identify the output from the collector. In this example the groupby and 
 column params tell the sum collector which field to group by and sum.
 Analytic collectors are passed a reference to the ResponseBuilder and can 
 place maps with analytic output directly into the SolrQueryResponse with the 
 add() method.
 Maps that are placed in the SolrQueryResponse are automatically added to the 
 outgoing response. The response will include a list named cl.analytic.id, 
 where id is specified in the local param.
 *Distributed Search*
 The CollectorFactory also has a method called merge(). This method aggregates 
 the results from each of the shards during distributed search. The default 
 CollectorFactory implements the default merge logic for merging documents 
 from each shard. If you define a different docs collector you can override 
 the default merge method to merge documents in accordance with how they are 
 collected at the shard level.
 With analytic collectors, you'll need to override
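To make the subclassing step described above concrete, a minimal sketch; the patch's real getCollector() signature is not reproduced in this thread, so the argument list below is invented for illustration:

{code:java}
import org.apache.lucene.search.TopDocsCollector;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.solr.handler.component.CollectorFactory;

public class MyCollectorFactory extends CollectorFactory {
  // Illustrative override; the actual method takes the patch's own argument list.
  public TopDocsCollector getCollector(int numDocsToCollect) {
    // Return any collector extending TopDocsCollector; the default factory
    // returns TopScoreDocCollector or TopFieldCollector instead.
    return TopScoreDocCollector.create(numDocsToCollect, true);
  }
}
{code}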

[jira] [Created] (SOLR-4961) Add ability to provide custom ranking configurations

2013-06-25 Thread Greg Bowyer (JIRA)
Greg Bowyer created SOLR-4961:
-

 Summary: Add ability to provide custom ranking configurations
 Key: SOLR-4961
 URL: https://issues.apache.org/jira/browse/SOLR-4961
 Project: Solr
  Issue Type: New Feature
Reporter: Greg Bowyer


This is a split from SOLR-4465, wherein the ability was added to make 
collectors configurable from within solr.

The aim is to hide the details of the custom collector work behind an API 
surface that is high-level to the point of allowing end users to not see 
lucene-specific details on a per-request basis, while still providing the 
flexibility to configure things like collectors through configurations defined 
in the solrconfig.xml.




[jira] [Updated] (SOLR-4961) Add ability to provide custom ranking configurations

2013-06-25 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-4961:
--

Issue Type: Sub-task  (was: New Feature)
Parent: SOLR-4465





[jira] [Created] (SOLR-4962) Allow for analytic functions to be performed through altered collectors

2013-06-25 Thread Greg Bowyer (JIRA)
Greg Bowyer created SOLR-4962:
-

 Summary: Allow for analytic functions to be performed through 
altered collectors
 Key: SOLR-4962
 URL: https://issues.apache.org/jira/browse/SOLR-4962
 Project: Solr
  Issue Type: Sub-task
Reporter: Greg Bowyer


This is a split from SOLR-4465; in that issue the ability to create customised 
collectors that allow for aggregate functions was born, but it suffers from 
being unable to work well with query result caching and grouping.

Migrating this functionality out into a collector component within solr, and 
perhaps pushing down some of the logic towards lucene, seems to be the way to 
go.




[jira] [Commented] (SOLR-4465) Configurable Collectors

2013-06-25 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693251#comment-13693251
 ] 

Greg Bowyer commented on SOLR-4465:
---

Life keeps getting in the way. I have crafted two sub-tasks on this; I would be 
interested in working through this with you.


[jira] [Commented] (SOLR-4465) Configurable Collectors

2013-05-23 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665319#comment-13665319
 ] 

Greg Bowyer commented on SOLR-4465:
---

I agree with both of Yonik's points. I would be tempted to split this patch 
into two, and cover the first part here, with analytic / aggregation functions 
in a follow-up patch.

Do you want any help with this patch? Cutting it up, testing, etc.?


[jira] [Commented] (SOLR-4465) Configurable Collectors

2013-05-22 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664722#comment-13664722
 ] 

Greg Bowyer commented on SOLR-4465:
---

Is there anything remaining for this?


[jira] [Closed] (LUCENE-5010) Getting a Poterstemmer error in Solr

2013-05-22 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer closed LUCENE-5010.
---

Resolution: Won't Fix

You are right, it is the same bug that was reported in the past, and your 
specific JVM version is one previous to Java 7 update 4, which is the version 
where all of the JIT bugs were fixed.

I would use 1.7.0u4 or later.

A 1.7.0u4 should give you a version string as follows:

java version "1.7.0_04"
Java(TM) SE Runtime Environment (build 1.7.0_04-b20)
Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)

 Getting a Poterstemmer error in Solr
 

 Key: LUCENE-5010
 URL: https://issues.apache.org/jira/browse/LUCENE-5010
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis, modules/spellchecker
Affects Versions: 3.6.1
 Environment: Windows 7 64bit
Reporter: Mark Streitman

 java version "1.7.0"
 Java(TM) SE Runtime Environment (build 1.7.0-b147)
 Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
 is dying from an error in porterstemmer.
 This is just like the error listed from 2011
 https://issues.apache.org/jira/browse/LUCENE-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070153#comment-13070153
 Below is the log file that is generated
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x02b08ce1, 
 pid=3208, tid=4688
 #
 # JRE version: 7.0-b147
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode 
 windows-amd64 compressed oops)
 # Problematic frame:
 # J  org.apache.lucene.analysis.PorterStemmer.stem(I)Z
 #
 # Failed to write core dump. Minidumps are not enabled by default on client 
 versions of Windows
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 #
 ---  T H R E A D  ---
 Current thread (0x0f57f000):  JavaThread 
 http-localhost-127.0.0.1-8080-5 daemon [_thread_in_Java, id=4688, 
 stack(0x0b25,0x0b35)]
 siginfo: ExceptionCode=0xc0000005, reading address 0x0002fd8be046
 Registers:
 RAX=0x0001, RBX=0x, RCX=0xfd8be038, 
 RDX=0x0065
 RSP=0x0b34e950, RBP=0x, RSI=0x, 
 RDI=0x0003
 R8 =0xfd8be010, R9 =0xfffe, R10=0x0065, 
 R11=0x0032
 R12=0x, R13=0x02b08c88, R14=0x0003, 
 R15=0x0f57f000
 RIP=0x02b08ce1, EFLAGS=0x00010286
 Top of Stack: (sp=0x0b34e950)
 0x0b34e950:   00650072 
 0x0b34e960:   fd8be010 0001e053f560
 0x0b34e970:    fd8bdbb0
 0x0b34e980:   0b34ea08 026660d8
 0x0b34e990:   fd8bdfe0 026660d8
 0x0b34e9a0:   0b34ea08 026663d0
 0x0b34e9b0:   026663d0 
 0x0b34e9c0:   fd8be010 0b34e9c8
 0x0b34e9d0:   d2e598d2 0b34ea30
 0x0b34e9e0:   d2e5a290 d3fe7bc8
 0x0b34e9f0:   d2e59918 0b34e9b8
 0x0b34ea00:   0b34ea40 fd8be010
 0x0b34ea10:   02a502c4 0004
 0x0b34ea20:    fd8bdbb0
 0x0b34ea30:   fd8be010 02a502c4
 0x0b34ea40:   f5d70090 fd8bdf30 
 Instructions: (pc=0x02b08ce1)
 0x02b08cc1:   41 83 c1 fb 4c 63 f7 42 0f b7 5c 71 10 89 5c 24
 0x02b08cd1:   04 8b c7 83 c0 fe 0f b7 54 41 10 8b df 83 c3 fc
 0x02b08ce1:   40 0f b7 6c 59 10 83 c7 fd 44 0f b7 6c 79 10 41
 0x02b08cf1:   83 fa 69 0f 84 07 03 00 00 49 b8 d8 31 4e e0 00 
 Register to memory mapping:
 RAX=0x0001 is an unknown value
 RBX=0x is an unallocated location in the heap
 RCX=0xfd8be038 is an oop
 [C 
  - klass: {type array char}
  - length: 50
 RDX=0x0065 is an unknown value
 RSP=0x0b34e950 is pointing into the stack for thread: 
 0x0f57f000
 RBP=0x is an unknown value
 RSI=0x is an unknown value
 RDI=0x0003 is an unknown value
 R8 =0xfd8be010 is an oop
 org.apache.lucene.analysis.PorterStemmer 
  - klass: 'org/apache/lucene/analysis/PorterStemmer'
 R9 =0xfffe is an unallocated location in the heap
 R10=0x0065 is an unknown value
 R11=0x0032 is an unknown value
 R12=0x is an unknown value
 R13=0x02b07c10 [CodeBlob (0x02b07c10)]
 Framesize: 12
 R14=0x0003

[jira] [Updated] (SOLR-4833) All(most all) Logger instances should be made static

2013-05-22 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-4833:
--

Attachment: SOLR-4833-Remove-None-static-loggers.patch

I took a first stab at this; two things strike me:

* Should lucene use SLF4J as a logging framework instead of j.u.logging?

* Should loggers that are defined on parent classes be removed? I feel that 
*if* we want subclasses to go through the same logger, they can all access that 
logger by its moniker themselves. The only thing that stopped me in my tracks 
is that some of the exposed loggers form part of the user-extensible API 
surface.
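For reference, the before/after shape of the change the patch makes, as an illustrative pair of classes rather than actual Solr source:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class PerInstanceLogging {
  // One Logger reference per instance: the test framework's leak checker
  // counts the logger's state against every object that holds it.
  private final Logger log = LoggerFactory.getLogger(PerInstanceLogging.class);
}

class StaticLogging {
  // One Logger reference per class: what the patch converts the stragglers to.
  private static final Logger log = LoggerFactory.getLogger(StaticLogging.class);
}
{code}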

 All(most all) Logger instances should be made static
 

 Key: SOLR-4833
 URL: https://issues.apache.org/jira/browse/SOLR-4833
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
 Attachments: SOLR-4833-Remove-None-static-loggers.patch


 The majority of Logger usage in Solr is via static variables, but there are a 
 few places where this pattern does not hold true - I think we should fix that 
 and be completely consistent. If there is any specific case where a 
 non-static variable really makes a lot of sense, then it should be heavily 
 commented as to why.
 
 The SLF4J FAQ has a list of pros and cons for why Logger variables 
 should/shouldn't be static...
 http://slf4j.org/faq.html#declared_static
 ...the majority of the pros for non-static usage don't really apply to 
 Solr, while the pros for static usage do.
 Another lucene/solr-specific pro in favor of static variables for loggers is 
 the way our test framework looks for memory leaks in tests. Having a simple 
 test that does not null out a static reference to what seems like a small 
 object is typically fine -- but if that small object has an explicit 
 (non-static) reference to a Logger, all of the state in that Logger is 
 counted as part of the size of that small object, leading to confusion.




[jira] [Created] (SOLR-4819) Pimp QueryEqualityTest to use random testing

2013-05-13 Thread Greg Bowyer (JIRA)
Greg Bowyer created SOLR-4819:
-

 Summary: Pimp QueryEqualityTest to use random testing
 Key: SOLR-4819
 URL: https://issues.apache.org/jira/browse/SOLR-4819
 Project: Solr
  Issue Type: Improvement
Reporter: Greg Bowyer
Priority: Minor


The current QueryEqualityTest does some (important but) basic tests of query 
parsing to ensure that the queries produced are equivalent to each other.

Since we do random testing, it might be a good idea to generate random queries 
rather than pre-canned ones.
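A sketch of the sort of generation meant here; the assumption is that the random strings would be fed through QueryEqualityTest's existing assertions, and the helper below is purely illustrative:

{code:java}
import java.util.Random;

public class RandomQueryStrings {
  /** Build a random boolean query string such as "qjd AND owke OR af". */
  public static String randomQuery(Random r, int clauses) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < clauses; i++) {
      if (i > 0) sb.append(r.nextBoolean() ? " AND " : " OR ");
      int len = 1 + r.nextInt(8);
      for (int j = 0; j < len; j++) {
        sb.append((char) ('a' + r.nextInt(26))); // random lowercase term
      }
    }
    return sb.toString();
  }
}
{code}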




[jira] [Commented] (SOLR-4819) Pimp QueryEqualityTest to use random testing

2013-05-13 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656481#comment-13656481
 ] 

Greg Bowyer commented on SOLR-4819:
---

bq. I think having specific test classes for specific types of queries (or for 
specific qparsers) would be the best place for more randomized testing ... this 
class is really just the front line defense against something really terrible.

That makes sense; I guess I didn't quite understand the purpose of this class.





[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly

2013-05-12 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-3763:
--

Attachment: SOLR-3763-Make-solr-use-lucene-filters-directly.patch

Version that has a basic (but hopefully working) cache implementation.

PostFilters are still a bit of an unknown; since these are needed for spatial 
search, I will look at how they can be supported.

 Make solr use lucene filters directly
 -

 Key: SOLR-3763
 URL: https://issues.apache.org/jira/browse/SOLR-3763
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0, 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Attachments: SOLR-3763-Make-solr-use-lucene-filters-directly.patch, 
 SOLR-3763-Make-solr-use-lucene-filters-directly.patch, 
 SOLR-3763-Make-solr-use-lucene-filters-directly.patch


 Presently solr uses bitsets, queries and collectors to implement the concept 
 of filters. This has proven to be very powerful, but does come at the cost of 
 introducing a large body of code into solr, making it harder to optimise and 
 maintain.
 Another issue here is that filters currently cache sub-optimally given the 
 changes in lucene towards atomic readers.
 Rather than patch these issues, this is an attempt to rework the filters in 
 solr to leverage the Filter subsystem from lucene as much as possible.
 In good time the aim is to get this to do the following:
 ∘ Handle setting up filter implementations that are able to correctly cache 
 with reference to the AtomicReader that they are caching for, rather than for 
 the entire index at large
 ∘ Get the post filters working; I am thinking that this can be done via 
 lucene's chained filter, with the "expensive" filters being put towards the 
 end of the chain - this has different semantics internally to the original 
 implementation but IMHO should have the same result for end users
 ∘ Learn how to create filters that are potentially more efficient; at present 
 solr basically runs a simple query that gathers a DocSet that relates to the 
 documents that we want filtered; it would be interesting to make use of 
 filter implementations that are in theory faster than query filters (for 
 instance there are filters that are able to query the FieldCache)
 ∘ Learn how to decompose filters so that a complex filter query can be cached 
 (potentially) as its constituent parts; for example the filter below 
 currently needs love, care and feeding to ensure that the filter cache is not 
 unduly stressed
 {code}
   'category:(100) OR category:(200) OR category:(300)'
 {code}
 Really there is no reason not to express this in a cached form as 
 {code}
 BooleanFilter(
 FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD),
 FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD),
 FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD)
   )
 {code}
 This would yield better cache usage, I think, as we can reuse docsets across 
 multiple queries, as well as avoid issues when filters are presented in 
 differing orders
 ∘ Instead of end users providing costing we might (and this is a big might 
 FWIW) be able to create a sort of execution plan of filters, leveraging a 
 combination of what the index is able to tell us as well as sampling and 
 "educated guesswork"; in essence this is what some DBMS software, for example 
 postgresql, does - it has a genetic algorithm that attempts to solve the 
 travelling salesman problem - to great effect
 ∘ I am sure I will probably come up with other ambitious ideas to plug in 
 here :S
 Patches obviously forthcoming, but the bulk of the work can be followed here: 
 https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters
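The cached form sketched above maps almost one-to-one onto classes that already exist in Lucene's queries module; whether the patch wires them up exactly this way is an assumption, but a literal rendering looks like:

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.BooleanFilter;
import org.apache.lucene.queries.FilterClause;
import org.apache.lucene.queries.TermFilter;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;

public class DecomposedCategoryFilter {
  public static Filter build() {
    BooleanFilter filter = new BooleanFilter();
    for (String value : new String[] {"100", "200", "300"}) {
      // Each clause is cached on its own, so its docset can be reused by any
      // other filter query mentioning the same term, in any clause order.
      Filter cached = new CachingWrapperFilter(new TermFilter(new Term("category", value)));
      filter.add(new FilterClause(cached, Occur.SHOULD));
    }
    return filter;
  }
}
{code}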




[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly

2013-05-12 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-3763:
--

Fix Version/s: 5.0

 Make solr use lucene filters directly
 -

 Key: SOLR-3763
 URL: https://issues.apache.org/jira/browse/SOLR-3763
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0, 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Fix For: 5.0

 Attachments: SOLR-3763-Make-solr-use-lucene-filters-directly.patch, 
 SOLR-3763-Make-solr-use-lucene-filters-directly.patch, 
 SOLR-3763-Make-solr-use-lucene-filters-directly.patch


 Presently solr uses bitsets, queries and collectors to implement the concept 
 of filters. This has proven to be very powerful, but does come at the cost of 
 introducing a large body of code into solr making it harder to optimise and 
 maintain.
 Another issue here is that filters currently cache sub-optimally given the 
 changes in lucene towards atomic readers.
 Rather than patch these issues, this is an attempt to rework the filters in 
 solr to leverage the Filter subsystem from lucene as much as possible.
 In good time the aim is to get this to do the following:
 ∘ Handle setting up filter implementations that are able to correctly cache 
 with reference to the AtomicReader that they are caching for rather that for 
 the entire index at large
 ∘ Get the post filters working, I am thinking that this can be done via 
 lucenes chained filter, with the ‟expensive” filters being put towards the 
 end of the chain - this has different semantics internally to the original 
 implementation but IMHO should have the same result for end users
 ∘ Learn how to create filters that are potentially more efficient, at present 
 solr basically runs a simple query that gathers a DocSet that relates to the 
 documents that we want filtered; it would be interesting to make use of 
 filter implementations that are in theory faster than query filters (for 
 instance there are filters that are able to query the FieldCache)
 ∘ Learn how to decompose filters so that a complex filter query can be cached 
 (potentially) as its constituent parts; for example the filter below 
 currently needs love, care and feeding to ensure that the filter cache is not 
 unduly stressed
 {code}
   'category:(100) OR category:(200) OR category:(300)'
 {code}
 Really there is no reason not to express this in a cached form as 
 {code}
 BooleanFilter(
 FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD),
 FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD),
 FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD)
   )
 {code}
 This would yield better cache usage I think as we can reuse docsets across 
 multiple queries, as well as avoid issues when filters are presented in 
 differing orders
 ∘ Instead of end users providing costing we might (and this is a big might 
 FWIW), be able to create a sort of execution plan of filters, leveraging a 
 combination of what the index is able to tell us as well as sampling and 
 ‟educated guesswork”; in essence this is what some DBMS software, for example 
 postgresql does - it has a genetic algo that attempts to solve the travelling 
 salesman - to great effect
 ∘ I am sure I will probably come up with other ambitious ideas to plug in 
 here . :S 
 Patches obviously forthcoming but the bulk of the work can be followed here 
 https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters
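
For illustration, a minimal sketch of that decomposed, cached form (my own 
illustration, not from the attached patches), using the Lucene 4.x queries 
module plus CachingWrapperFilter, which caches per AtomicReader:

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.BooleanFilter;
import org.apache.lucene.queries.FilterClause;
import org.apache.lucene.queries.TermFilter;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;

public class DecomposedCategoryFilter {
  /** Cached form of 'category:(100) OR category:(200) OR category:(300)'. */
  public static Filter build(String... values) {
    BooleanFilter bool = new BooleanFilter();
    for (String value : values) {
      Filter term = new TermFilter(new Term("category", value));
      // Each constituent is cached on its own, so docsets can be reused
      // across queries and clause order no longer matters to the cache.
      bool.add(new FilterClause(new CachingWrapperFilter(term), Occur.SHOULD));
    }
    return bool;
  }
}
{code}

e.g. build("100", "200", "300") yields the cached form of the filter above.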

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4785) New MaxScoreQParserPlugin

2013-05-12 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-4785:
--

Attachment: SOLR-4785-Add-tests-for-maxscore-to-QueryEqualityTest.patch

This bit me while I was updating my filter patch (SOLR-3763)

I had a stab at putting some basic equality tests in place, but looking at the 
test case itself I wonder if QueryEqualityTest should be re-worked with the 
full fury of randomised testing, as it seems, at best, to only test the 
happy cases.

 New MaxScoreQParserPlugin
 -

 Key: SOLR-4785
 URL: https://issues.apache.org/jira/browse/SOLR-4785
 Project: Solr
  Issue Type: New Feature
  Components: query parsers
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
 Fix For: 5.0, 4.4

 Attachments: 
 SOLR-4785-Add-tests-for-maxscore-to-QueryEqualityTest.patch, SOLR-4785.patch, 
 SOLR-4785.patch


 A customer wants to contribute back this component.
 It is a QParser which behaves exactly like the lucene parser (extends it), but 
 returns the Max score from the clauses, i.e. max(c1,c2,c3..) instead of the 
 default which is sum(c1,c2,c3...). It does this by wrapping all SHOULD 
 clauses in a DisjunctionMaxQuery with tie=1.0. Any MUST or PROHIBITED clauses 
 are passed through as-is. Non-boolean queries, e.g. NumericRange, fall through 
 to the lucene parser.
 To use, add to solrconfig.xml:
 {code:xml}
   <queryParser name="maxscore" class="solr.MaxScoreQParserPlugin"/>
 {code}
 Then use it in a query
 {noformat}
 q=A AND B AND {!maxscore v=$max}&max=C OR (D AND E)
 {noformat}
 This will return the score of A+B+max(C,sum(D+E))
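
For illustration, a rough sketch (not the attached patch; the field name and 
terms are invented) of that rewrite using stock Lucene classes, with the tie 
value taken from the description above:

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class MaxScoreSketch {
  /** Builds the 'C OR (D AND E)' subtree as the parser would. */
  public static Query cOrDAndE() {
    // D AND E stays an ordinary boolean conjunction.
    BooleanQuery de = new BooleanQuery();
    de.add(new TermQuery(new Term("f", "D")), Occur.MUST);
    de.add(new TermQuery(new Term("f", "E")), Occur.MUST);

    // The SHOULD clauses C and (D AND E) are collected into a
    // DisjunctionMaxQuery (tie=1.0, per the description), so this subtree
    // contributes max(C, sum(D,E)).
    DisjunctionMaxQuery max = new DisjunctionMaxQuery(1.0f);
    max.add(new TermQuery(new Term("f", "C")));
    max.add(de);
    return max;
  }
}
{code}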

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly

2013-05-12 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-3763:
--

Attachment: SOLR-3763-Make-solr-use-lucene-filters-directly.patch

 Trunk moves really quickly these days (or I move slowly)

Updated patch to cope with recent trunk changes

 Make solr use lucene filters directly
 -

 Key: SOLR-3763
 URL: https://issues.apache.org/jira/browse/SOLR-3763
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0, 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Fix For: 5.0

 Attachments: SOLR-3763-Make-solr-use-lucene-filters-directly.patch, 
 SOLR-3763-Make-solr-use-lucene-filters-directly.patch, 
 SOLR-3763-Make-solr-use-lucene-filters-directly.patch, 
 SOLR-3763-Make-solr-use-lucene-filters-directly.patch


 Presently solr uses bitsets, queries and collectors to implement the concept 
 of filters. This has proven to be very powerful, but does come at the cost of 
 introducing a large body of code into solr making it harder to optimise and 
 maintain.
 Another issue here is that filters currently cache sub-optimally given the 
 changes in lucene towards atomic readers.
 Rather than patch these issues, this is an attempt to rework the filters in 
 solr to leverage the Filter subsystem from lucene as much as possible.
 In good time the aim is to get this to do the following:
 ∘ Handle setting up filter implementations that are able to correctly cache 
 with reference to the AtomicReader that they are caching for rather than for 
 the entire index at large
 ∘ Get the post filters working, I am thinking that this can be done via 
 Lucene's chained filter, with the ‟expensive” filters being put towards the 
 end of the chain - this has different semantics internally to the original 
 implementation but IMHO should have the same result for end users
 ∘ Learn how to create filters that are potentially more efficient, at present 
 solr basically runs a simple query that gathers a DocSet that relates to the 
 documents that we want filtered; it would be interesting to make use of 
 filter implementations that are in theory faster than query filters (for 
 instance there are filters that are able to query the FieldCache)
 ∘ Learn how to decompose filters so that a complex filter query can be cached 
 (potentially) as its constituent parts; for example the filter below 
 currently needs love, care and feeding to ensure that the filter cache is not 
 unduly stressed
 {code}
   'category:(100) OR category:(200) OR category:(300)'
 {code}
 Really there is no reason not to express this in a cached form as 
 {code}
 BooleanFilter(
 FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD),
 FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD),
 FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD)
   )
 {code}
 This would yield better cache usage I think as we can reuse docsets across 
 multiple queries, as well as avoid issues when filters are presented in 
 differing orders
 ∘ Instead of end users providing costing we might (and this is a big might 
 FWIW), be able to create a sort of execution plan of filters, leveraging a 
 combination of what the index is able to tell us as well as sampling and 
 ‟educated guesswork”; in essence this is what some DBMS software, for example 
 postgresql does - it has a genetic algo that attempts to solve the travelling 
 salesman - to great effect
 ∘ I am sure I will probably come up with other ambitious ideas to plug in 
 here . :S 
 Patches obviously forthcoming but the bulk of the work can be followed here 
 https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-1560) maxDocBytesToAnalyze should be required arg up front

2013-05-12 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-1560:


Labels: dead  (was: )

 maxDocBytesToAnalyze should be required arg up front
 

 Key: LUCENE-1560
 URL: https://issues.apache.org/jira/browse/LUCENE-1560
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/highlighter
Affects Versions: 2.4.1
Reporter: Michael McCandless
  Labels: dead
 Fix For: 4.4


 We recently changed IndexWriter to require you to specify up-front
 MaxFieldLength, on creation, so that you are aware of this dangerous
 ‟loses stuff” setting. Too many developers had fallen into the trap
 of ‟how come my search can't find this document?”
 I think we should do the same for maxDocBytesToAnalyze in the
 highlighter?
 Spinoff from this thread:
 
 http://www.nabble.com/Lucene-Highlighting-and-Dynamic-Summaries-p22385887.html
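
For illustration only, the shape of the proposal with an invented constructor 
signature (today the limit is applied after construction, where it is easy to 
forget):

{code}
// Hypothetical signature, not the current API: force callers to choose the
// limit up front, the same way IndexWriter now demands MaxFieldLength.
Highlighter highlighter =
    new Highlighter(formatter, scorer, maxDocBytesToAnalyze);
{code}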

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-1743) MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS

2013-05-12 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-1743:


Labels: dead  (was: )

 MMapDirectory should only mmap large files, small files should be opened 
 using SimpleFS/NIOFS
 -

 Key: LUCENE-1743
 URL: https://issues.apache.org/jira/browse/LUCENE-1743
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
  Labels: dead
 Fix For: 4.4


 This is a followup to LUCENE-1741:
 Javadocs state (in FileChannel#map): ‟For most operating systems, mapping a 
 file into memory is more expensive than reading or writing a few tens of 
 kilobytes of data via the usual read and write methods. From the standpoint 
 of performance it is generally only worth mapping relatively large files into 
 memory.”
 MMapDirectory should get a user-configurable size parameter that is a lower 
 limit for mmapping files. All files with a size < limit should be opened using 
 a conventional IndexInput from SimpleFS or NIO (another configuration option 
 for the fallback?).
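
A rough sketch of the idea, assuming the Lucene 4.x store API (the threshold 
value and class name here are invented, and directory lifecycle is elided):

{code}
import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;

public class SizeAwareOpener {
  private static final long MMAP_LOWER_LIMIT = 1 << 20; // 1 MB, made up

  /** Small files go through plain reads; only large ones get mmapped. */
  public static IndexInput open(File path, String name) throws IOException {
    Directory nio = new NIOFSDirectory(path);
    Directory mmap = new MMapDirectory(path);
    return nio.fileLength(name) < MMAP_LOWER_LIMIT
        ? nio.openInput(name, IOContext.DEFAULT)
        : mmap.openInput(name, IOContext.DEFAULT);
  }
}
{code}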

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-2713) TestPhraseQuery.testRandomPhrases takes minutes to run with SimpleText

2013-05-12 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer resolved LUCENE-2713.
-

   Resolution: Fixed
Fix Version/s: (was: 4.4)
   5.0

I beat on this test case a few times choosing all the codecs, and I could not 
repeat the slowdown; I am thinking that both the ThreadLeaks and performance 
issues have long been fixed.

I am removing the fixed seed and closing this bug down, hopefully to never see 
it again.

 TestPhraseQuery.testRandomPhrases takes minutes to run with SimpleText
 --

 Key: LUCENE-2713
 URL: https://issues.apache.org/jira/browse/LUCENE-2713
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/test
Affects Versions: 4.0-ALPHA
Reporter: Robert Muir
  Labels: dead
 Fix For: 5.0


 This test takes a few minutes to run if it gets simpletext codec.
 On hudson, it took 15 minutes!
 I added an assumeFalse(simpleText) as a temporary workaround, but we should 
 see if there is something we can improve so we can remove this hack.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (LUCENE-1890) auto-warming from Apache Solr causes NULL Pointer

2013-05-12 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer closed LUCENE-1890.
---

Resolution: Cannot Reproduce
  Assignee: Greg Bowyer

I am going to be bold and make the assumption that, since spatial has been 
re-worked and Lucene has gone from 2.x to 4.x, this issue is no longer present.

 auto-warming from Apache Solr causes NULL Pointer
 -

 Key: LUCENE-1890
 URL: https://issues.apache.org/jira/browse/LUCENE-1890
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Affects Versions: 2.4.1
 Environment: Linux
Reporter: Bill Bell
Assignee: Greg Bowyer
  Labels: dead
 Fix For: 4.4

 Attachments: localsolr.jar, lucene-spatial-2.9-dev.jar


 Sep 6, 2009 12:48:07 PM org.apache.solr.common.SolrException log
 SEVERE: Error during auto-warming of 
 key:org.apache.solr.search.QueryResultKey@b00371eb:java.lang.NullPointerException
 at 
 org.apache.lucene.spatial.tier.DistanceFieldComparatorSource$DistanceScoreDocLookupComparator.copy(DistanceFieldComparatorSource.java:101)
 at 
 org.apache.lucene.search.TopFieldCollector$MultiComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:554)
 at 
 org.apache.solr.search.DocSetDelegateCollector.collect(DocSetHitCollector.java:98)
 at 
 org.apache.lucene.search.IndexSearcher.doSearch(IndexSearcher.java:281)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:253)
 at org.apache.lucene.search.Searcher.search(Searcher.java:171)
 at 
 org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1088)
 at 
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:876)
 at 
 org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:53)
 at 
 org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:328)
 at org.apache.solr.search.LRUCache.warm(LRUCache.java:194)
 at 
 org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1468)
 at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1142)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
 at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2540) Document. add get(i) and addAll to make interacting with fieldables and documents easier/faster and more readable

2013-05-12 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655743#comment-13655743
 ] 

Greg Bowyer commented on LUCENE-2540:
-

Outside of batch adding fields it looks like this issue is somewhat dead since 
we can now address the field(s) by name, and have sensible iterators on them?

Anyone opposed to closing this?
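
For reference, the 4.x API I mean looks roughly like this (the field name is 
invented):

{code}
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;

public class FieldAccess {
  public static void dump(Document doc) {
    // Sensible iteration over all fields ...
    for (IndexableField field : doc.getFields()) {
      System.out.println(field.name() + " = " + field.stringValue());
    }
    // ... and direct addressing by name, all values included.
    IndexableField[] titles = doc.getFields("title");
    System.out.println(titles.length + " title values");
  }
}
{code}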

 Document. add get(i) and addAll to make interacting with fieldables and 
 documents easier/faster and more readable
 -

 Key: LUCENE-2540
 URL: https://issues.apache.org/jira/browse/LUCENE-2540
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/other
Affects Versions: 3.0.2
Reporter: Woody Anderson
  Labels: dead
 Fix For: 4.4

 Attachments: LUCENE-2540.patch


 Working with Document Fieldables is often a pain.
 Getting the ith involves chained method calls and is not very readable:
 {code}
 // nice
 doc.getFieldable(i);
 // not nice
 doc.getFields().get(i);
 {code}
 also, when combining documents, or otherwise aggregating multiple fields into 
 a single document,
 {code}
 // nice
 doc.addAll(fieldables);
 // not nice: less readable and more error prone
 List<Fieldable> fields = ...;
 for (Fieldable field : fields) {
   result.add(field);
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3917) Port pruning module to trunk apis

2013-05-06 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-3917:


Attachment: LUCENE-3917-Initial-port-of-index-pruning.patch

Recently at $DAYJOB the horror that is high-frequency terms in OR searches came 
to bite us; as a result I have an interest in pruning again.

As such I made an attempt to forward port the existing pruning package directly 
to Lucene 4.0.

This is largely a mechanical port; I have not put any real thought into it, so 
it's probably terrible.

This does not pass its unit test and is a mess internally; I am going to try 
to get the unit test working and then loop back on making the code more 
lucene 4.x friendly.

One question that occurs from this is how AtomicReaders are handled: do we want 
to prune per segment with global stats, prune based on segment stats, or just 
do the terrible thing and work with a SlowCompositeReader?

I also think, given the work that went on with LUCENE-4752, it might be 
possible to do the pruning in a similar fashion to the sorting merge, such that 
we do a pruning merge.

 Port pruning module to trunk apis
 -

 Key: LUCENE-3917
 URL: https://issues.apache.org/jira/browse/LUCENE-3917
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/other
Affects Versions: 4.0-ALPHA
Reporter: Robert Muir
 Fix For: 4.3

 Attachments: LUCENE-3917-Initial-port-of-index-pruning.patch


 Pruning module was added in LUCENE-1812, but we need to port
 this to trunk (4.0)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly

2013-05-03 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-3763:
--

Description: 
Presently solr uses bitsets, queries and collectors to implement the concept of 
filters. This has proven to be very powerful, but does come at the cost of 
introducing a large body of code into solr making it harder to optimise and 
maintain.

Another issue here is that filters currently cache sub-optimally given the 
changes in lucene towards atomic readers.

Rather than patch these issues, this is an attempt to rework the filters in 
solr to leverage the Filter subsystem from lucene as much as possible.

In good time the aim is to get this to do the following:

∘ Handle setting up filter implementations that are able to correctly cache 
with reference to the AtomicReader that they are caching for rather than for 
the entire index at large

∘ Get the post filters working, I am thinking that this can be done via Lucene's 
chained filter, with the ‟expensive” filters being put towards the end of the 
chain - this has different semantics internally to the original implementation 
but IMHO should have the same result for end users

∘ Learn how to create filters that are potentially more efficient, at present 
solr basically runs a simple query that gathers a DocSet that relates to the 
documents that we want filtered; it would be interesting to make use of filter 
implementations that are in theory faster than query filters (for instance 
there are filters that are able to query the FieldCache)

∘ Learn how to decompose filters so that a complex filter query can be cached 
(potentially) as its constituent parts; for example the filter below currently 
needs love, care and feeding to ensure that the filter cache is not unduly 
stressed

{code}
  'category:(100) OR category:(200) OR category:(300)'
{code}

Really there is no reason not to express this in a cached form as 

{code}
BooleanFilter(
FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD),
FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD),
FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD)
  )
{code}

This would yield better cache usage I think as we can reuse docsets across 
multiple queries, as well as avoid issues when filters are presented in 
differing orders

∘ Instead of end users providing costing we might (and this is a big might 
FWIW), be able to create a sort of execution plan of filters, leveraging a 
combination of what the index is able to tell us as well as sampling and 
‟educated guesswork”; in essence this is what some DBMS software, for example 
postgresql does - it has a genetic algo that attempts to solve the travelling 
salesman - to great effect

∘ I am sure I will probably come up with other ambitious ideas to plug in here 
. :S 

Patches obviously forthcoming but the bulk of the work can be followed here 
https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters

  was:
Presently solr uses bitsets, queries and collectors to implement the concept of 
filters. This has proven to be very powerful, but does come at the cost of 
introducing a large body of code into solr making it harder to optimise and 
maintain.

Another issue here is that filters currently cache sub-optimally given the 
changes in lucene towards atomic readers.

Rather than patch these issues, this is an attempt to rework the filters in 
solr to leverage the Filter subsystem from lucene as much as possible.

In good time the aim is to get this to do the following:

∘ Handle setting up filter implementations that are able to correctly cache 
with reference to the AtomicReader that they are caching for rather that for 
the entire index at large

∘ Get the post filters working, I am thinking that this can be done via lucenes 
chained filter, with the ‟expensive” filters being put towards the end of the 
chain - this has different semantics internally to the original implementation 
but IMHO should have the same result for end users

∘ Learn how to create filters that are potentially more efficient, at present 
solr basically runs a simple query that gathers a DocSet that relates to the 
documents that we want filtered; it would be interesting to make use of filter 
implementations that are in theory faster than query filters (for instance 
there are filters that are able to query the FieldCache)

∘ Learn how to decompose filters so that a complex filter query can be cached 
(potentially) as its constituent parts; for example the filter below currently 
needs love, care and feeding to ensure that the filter cache is not unduly 
stressed

{code}
  'category:(100) OR category:(200) OR category:(300)'
{code}

Really there is no reason not to express this in a cached form as 

{code}
BooleanFilter(
FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD

[jira] [Updated] (SOLR-4616) HitRatio in mbean is of type String instead should be float/double.

2013-05-02 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-4616:
--

Attachment: SOLR-4616-Make-HitRatio-in-cache-mbeans-a-float.patch

This should fix it; I am going to test it out shortly (unit tests and such 
pass, I just need to fire up a solr instance)
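
The shape of the change, sketched here from memory (the attached patch is 
authoritative): compute the ratio as a number instead of pre-formatting it 
into a String, so the JMX attribute type becomes a Float rather than 
java.lang.String.

{code}
public final class HitRatioSketch {
  /** Was a formatted String like "0.75"; now a float JMX exposes natively. */
  public static float hitRatio(long hits, long lookups) {
    return lookups == 0 ? 0.0f : (float) hits / lookups;
  }
}
{code}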



 HitRatio in mbean is of type String instead should be float/double.
 ---

 Key: SOLR-4616
 URL: https://issues.apache.org/jira/browse/SOLR-4616
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.2
 Environment: Solr 4.2 on JBoss7.1.1
Reporter: Aditya
Assignee: Greg Bowyer
Priority: Minor
 Attachments: SOLR-4616-Make-HitRatio-in-cache-mbeans-a-float.patch


 While using our existing System Monitoring tool with solr using JMX, we 
 noticed that the stats values for Cache are not consistent w.r.t. data type: 
 decimal values are returned as strings but should be of type float/double, 
 e.g. hitratio.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-4156) JMX numDocs and maxDoc are of type string

2013-05-02 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer reassigned SOLR-4156:
-

Assignee: Greg Bowyer

 JMX numDocs and maxDoc are of type string
 -

 Key: SOLR-4156
 URL: https://issues.apache.org/jira/browse/SOLR-4156
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Greg Harris
Assignee: Greg Bowyer
Priority: Minor
 Fix For: 4.3


 Some monitoring tools, if not all, are sensitive to the object types that the 
 JMX link provides. numDocs and maxDoc of the searcher MBean: 
 solr/collection1:type=searcher,id=org.apache.solr.search.SolrIndexSearcher
 are provided as Strings, not ints. Int would allow monitoring tools to 
 monitor them correctly. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4156) JMX numDocs and maxDoc are of type string

2013-05-02 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647671#comment-13647671
 ] 

Greg Bowyer commented on SOLR-4156:
---

I don't think this is the case; I have just looked at this and it seems that 
the types are integer.

numdocs claims this type: 
javax.management.openmbean.SimpleType(name=java.lang.Integer)

maxdocs claims this type:
javax.management.openmbean.SimpleType(name=java.lang.Integer)
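
For anyone who wants to double-check, this is roughly how one can inspect the 
attribute types over JMX (a sketch; it assumes Solr runs in the same VM with 
JMX enabled, and uses the object name from the report):

{code}
import java.lang.management.ManagementFactory;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class CheckAttributeTypes {
  public static void main(String[] args) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    ObjectName searcher = new ObjectName(
        "solr/collection1:type=searcher,id=org.apache.solr.search.SolrIndexSearcher");
    // Print each attribute with the type the MBean claims for it.
    for (MBeanAttributeInfo attr
        : server.getMBeanInfo(searcher).getAttributes()) {
      System.out.println(attr.getName() + " -> " + attr.getType());
    }
  }
}
{code}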

 JMX numDocs and maxDoc are of type string
 -

 Key: SOLR-4156
 URL: https://issues.apache.org/jira/browse/SOLR-4156
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Greg Harris
Assignee: Greg Bowyer
Priority: Minor
 Fix For: 4.3


 Some monitoring tools, if not all, are sensitive to the object types that the 
 JMX link provides. numDocs and maxDoc of the searcher MBean: 
 solr/collection1:type=searcher,id=org.apache.solr.search.SolrIndexSearcher
 are provided as Strings, not ints. Int would allow monitoring tools to 
 monitor them correctly. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4616) HitRatio in mbean is of type String instead should be float/double.

2013-05-02 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer resolved SOLR-4616.
---

Resolution: Fixed

 HitRatio in mbean is of type String instead should be float/double.
 ---

 Key: SOLR-4616
 URL: https://issues.apache.org/jira/browse/SOLR-4616
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.2
 Environment: Solr 4.2 on JBoss7.1.1
Reporter: Aditya
Assignee: Greg Bowyer
Priority: Minor
 Attachments: SOLR-4616-Make-HitRatio-in-cache-mbeans-a-float.patch


 While using our existing System Monitoring tool with solr using JMX, we 
 noticed that the stats values for Cache are not consistent w.r.t. data type: 
 decimal values are returned as strings but should be of type float/double, 
 e.g. hitratio.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly

2013-05-02 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-3763:
--

Attachment: SOLR-3763-Make-solr-use-lucene-filters-directly.patch

Updated to latest trunk; the cache unit test still fails, as do the spatial 
lat/lon tests

 Make solr use lucene filters directly
 -

 Key: SOLR-3763
 URL: https://issues.apache.org/jira/browse/SOLR-3763
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0, 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Attachments: SOLR-3763-Make-solr-use-lucene-filters-directly.patch, 
 SOLR-3763-Make-solr-use-lucene-filters-directly.patch


 Presently solr uses bitsets, queries and collectors to implement the concept 
 of filters. This has proven to be very powerful, but does come at the cost of 
 introducing a large body of code into solr making it harder to optimise and 
 maintain.
 Another issue here is that filters currently cache sub-optimally given the 
 changes in lucene towards atomic readers.
 Rather than patch these issues, this is an attempt to rework the filters in 
 solr to leverage the Filter subsystem from lucene as much as possible.
 In good time the aim is to get this to do the following:
 ∘ Handle setting up filter implementations that are able to correctly cache 
 with reference to the AtomicReader that they are caching for rather than for 
 the entire index at large
 ∘ Get the post filters working, I am thinking that this can be done via 
 Lucene's chained filter, with the ‟expensive” filters being put towards the 
 end of the chain - this has different semantics internally to the original 
 implementation but IMHO should have the same result for end users
 ∘ Learn how to create filters that are potentially more efficient, at present 
 solr basically runs a simple query that gathers a DocSet that relates to the 
 documents that we want filtered; it would be interesting to make use of 
 filter implementations that are in theory faster than query filters (for 
 instance there are filters that are able to query the FieldCache)
 ∘ Learn how to decompose filters so that a complex filter query can be cached 
 (potentially) as its constituent parts; for example the filter below 
 currently needs love, care and feeding to ensure that the filter cache is not 
 unduly stressed
 {code}
   'category:(100) OR category:(200) OR category:(300)'
 {code}
 Really there is no reason not to express this in a cached form as 
 {code}
 BooleanFilter(
 FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD),
 FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD),
 FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD)
   )
 {code}
 This would yield better cache usage I think as we can reuse docsets across 
 multiple queries, as well as avoid issues when filters are presented in 
 differing orders
 ∘ Instead of end users providing costing we might (and this is a big might 
 FWIW), be able to create a sort of execution plan of filters, leveraging a 
 combination of what the index is able to tell us as well as sampling and 
 ‟educated guesswork”; in essence this is what some DBMS software, for example 
 postgresql does - it has a genetic algo that attempts to solve the travelling 
 salesman - to great effect
 ∘ I am sure I will probably come up with other ambitious ideas to plug in 
 here . :S 
 Patches obviously forthcoming but the bulk of the work can be followed here 
 https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4465) Configurable Collectors

2013-03-12 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600324#comment-13600324
 ] 

Greg Bowyer commented on SOLR-4465:
---

Does the CollectorSpec serve the same purpose as, say, the GroupingSpecification, 
that is, to provide underlying collectors (and the search in general) with the 
right requirements information?

I ask because maybe it would be easier to make the CollectorSpec support a map 
of String -> Object or String -> CollectorProperty.

I am trying to think how we can do grouping with this.

 but I might have misinterpreted what it's for
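
To make the suggestion concrete, a sketch with invented names (not part of 
the attached patches): a spec carrying an open-ended property bag that 
collectors, grouping ones included, could read from.

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape for CollectorSpec: arbitrary per-collector properties
// rather than a fixed set of fields.
public class CollectorSpecSketch {
  private final Map<String, Object> props = new HashMap<String, Object>();

  public void set(String key, Object value) {
    props.put(key, value);
  }

  @SuppressWarnings("unchecked")
  public <T> T get(String key) {
    return (T) props.get(key);
  }
}
{code}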

 Configurable Collectors
 ---

 Key: SOLR-4465
 URL: https://issues.apache.org/jira/browse/SOLR-4465
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.1
Reporter: Joel Bernstein
 Fix For: 4.3

 Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
 SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
 SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch


 This issue is to add configurable custom collectors to Solr. This expands the 
 design and work done in issue SOLR-1680 to include:
 1) CollectorFactory configuration in solrconfig.xml
 2) HTTP parameters to allow clients to dynamically select a CollectorFactory 
 and construct a custom Collector.
 3) Make aspects of QueryComponent pluggable so that the output from 
 distributed search can conform with custom collectors at the shard level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4465) Configurable Collectors

2013-02-26 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13587815#comment-13587815
 ] 

Greg Bowyer commented on SOLR-4465:
---

The tests are giving me null pointer exceptions inside SolrIndexSearcher for 
this. As for the stats part, I was looking at this patch and giving some 
thought to providing that on the QueryCommand object, but I feel that it is 
not the correct place for this information.

 Configurable Collectors
 ---

 Key: SOLR-4465
 URL: https://issues.apache.org/jira/browse/SOLR-4465
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.1
Reporter: Joel Bernstein
 Fix For: 4.2, 5.0

 Attachments: SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
 SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, SOLR-4465.patch, 
 SOLR-4465.patch


 This issue is to add configurable custom collectors to Solr. This expands the 
 design and work done in issue SOLR-1680 to include:
 1) CollectorFactory configuration in solrconfig.xml
 2) HTTP parameters to allow clients to dynamically select a CollectorFactory 
 and construct a custom Collector.
 3) Make aspects of QueryComponent pluggable so that the output from 
 distributed search can conform with custom collectors at the shard level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4464) DIH - Processed documents counter resets to zero after first database request

2013-02-15 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13579422#comment-13579422
 ] 

Greg Bowyer commented on SOLR-4464:
---

There is a good chance that a 250GB heap is the root cause of your problems; 
can you lower it to 16 or 32GB as a start and then see if this problem 
persists?

 DIH - Processed documents counter resets to zero after first database request
 -

 Key: SOLR-4464
 URL: https://issues.apache.org/jira/browse/SOLR-4464
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.1
 Environment: CentOS 6.3 x64 / apache-tomcat-7.0.35 / 
 mysql-connector-java-5.1.23 - Large machine 5TB of drives and 280GB RAM - 
 Java Heap set to 250Gb - resources are not an issue.
Reporter: Dave Cook
Priority: Minor
  Labels: patch

 [11:20] <quasimotoca> Solr 4.1 - Processed documents resets to 0 after 
 processing my first entity - all database schemas are identical
 [11:21] <quasimotoca> However, all the documents get fetched and I can query 
 the results no problem.  
 Here's a link to a screenshot - http://findocs/gridworkz.com/solr 
 Everything works perfectly except the screen doesn't increment the Processed 
 counter on subsequent database requests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4464) DIH - Processed documents counter resets to zero after first database request

2013-02-15 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13579426#comment-13579426
 ] 

Greg Bowyer commented on SOLR-4464:
---

I should read bug reports more carefully; everything else is working fine, 
so maybe the heap size is not the issue (I would still lower it, however)

 DIH - Processed documents counter resets to zero after first database request
 -

 Key: SOLR-4464
 URL: https://issues.apache.org/jira/browse/SOLR-4464
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.1
 Environment: CentOS 6.3 x64 / apache-tomcat-7.0.35 / 
 mysql-connector-java-5.1.23 - Large machine 5TB of drives and 280GB RAM - 
 Java Heap set to 250Gb - resources are not an issue.
Reporter: Dave Cook
Priority: Minor
  Labels: patch

 [11:20] <quasimotoca> Solr 4.1 - Processed documents resets to 0 after 
 processing my first entity - all database schemas are identical
 [11:21] <quasimotoca> However, all the documents get fetched and I can query 
 the results no problem.  
 Here's a link to a screenshot - http://findocs/gridworkz.com/solr 
 Everything works perfectly except the screen doesn't increment the Processed 
 counter on subsequent database requests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3178) Native MMapDir

2013-01-28 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564880#comment-13564880
 ] 

Greg Bowyer commented on LUCENE-3178:
-

So I was going to shut this down today, and just to make sure I ran the 
benchmark on the simplest code possible

... and suddenly I got good results, this is idiopathic :S

https://gist.github.com/0f017853861d050c0b66
{code}
Report after iter 19:
Task                 QPS baseline  StdDev  QPS mmap_tests  StdDev        Pct diff
OrHighHigh                   1.68 (11.2%)            1.73 (10.3%)   3.0% ( -16% -   27%)
PKLookup                   129.89  (5.8%)          135.03  (6.0%)   4.0% (  -7% -   16%)
HighTerm                     8.09 (13.6%)            8.43 (12.8%)   4.2% ( -19% -   35%)
OrHighMed                    4.46 (10.4%)            4.67  (9.5%)   4.7% ( -13% -   27%)
OrHighLow                    4.82 (10.6%)            5.09 (10.3%)   5.6% ( -13% -   29%)
HighSpanNear                 0.92  (8.1%)            0.97  (7.3%)   5.9% (  -8% -   23%)
IntNRQ                       2.51 (10.2%)            2.67  (9.9%)   6.6% ( -12% -   29%)
HighPhrase                   0.30 (11.7%)            0.32 (12.8%)   6.7% ( -16% -   35%)
MedPhrase                    2.93  (6.8%)            3.12  (8.2%)   6.7% (  -7% -   23%)
AndHighHigh                  5.46  (6.6%)            5.86  (7.0%)   7.2% (  -5% -   22%)
Respell                     19.68  (5.9%)           21.15  (6.6%)   7.5% (  -4% -   21%)
LowPhrase                    0.46  (9.5%)            0.50 (10.2%)   7.6% ( -11% -   30%)
Prefix3                      5.25  (8.2%)            5.66  (7.7%)   7.9% (  -7% -   25%)
HighSloppyPhrase             1.54  (8.0%)            1.67 (13.1%)   8.5% ( -11% -   32%)
MedSpanNear                  5.25  (7.0%)            5.72  (8.2%)   9.0% (  -5% -   25%)
Wildcard                    12.44  (5.7%)           13.59  (6.5%)   9.2% (  -2% -   22%)
MedSloppyPhrase              2.27  (7.2%)            2.49  (8.5%)   9.5% (  -5% -   27%)
MedTerm                     28.16 (10.3%)           30.89  (9.9%)   9.7% (  -9% -   33%)
Fuzzy1                      18.91  (6.0%)           20.82  (6.7%)  10.1% (  -2% -   24%)
Fuzzy2                      19.69  (6.6%)           21.68  (7.5%)  10.1% (  -3% -   25%)
AndHighMed                   7.79  (7.5%)            8.58  (6.1%)  10.1% (  -3% -   25%)
LowSpanNear                  1.45  (5.7%)            1.60  (9.3%)  10.5% (  -4% -   27%)
LowSloppyPhrase             22.84  (7.7%)           25.45  (9.7%)  11.4% (  -5% -   31%)
LowTerm                     46.46  (6.8%)           52.90  (7.6%)  13.9% (   0% -   30%)
AndHighLow                  35.92  (5.3%)           42.38  (7.1%)  18.0% (   5% -   32%)
{code}

 Native MMapDir
 --

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12
 Attachments: LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch


 Spinoff from LUCENE-2793.
 Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
 level IO flags depending on the IOContext, we could in theory do something 
 similar with MMapDir.
 The problem is MMap is apparently quite hairy... and to pass the flags the 
 native code would need to invoke mmap (I think?), unlike UnixDir where the 
 code only has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550436#comment-13550436
 ] 

Greg Bowyer commented on LUCENE-3178:
-

{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there still is quite a lot of that; I wrote a local Directory 
implementation that dumps out all of the called operations. I can share the 
file if wanted (although it's *huge*)

{quote}
Since we moved to block codecs, the use of single-byte get's on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficently using a memcpy(). Some MTQs are still faster because they 
read much more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}
Better, the JVM doesn't do a memcpy in all cases but often uses CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards; so far the only speedups I have seen are 
when I ape most of the behaviors of the JVM. The biggest win really is that the 
code becomes a lot simpler (partly because we don't have to worry about the 
cleaner, and partly because we are not bound to int32 sizes so no more slice 
nonsense); despite the simpler code I don't think there is a sizable win in 
performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior along the 
MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident)

{quote}
I would not provide a custom MMapDir at all, it is too risky and does not 
really brings a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree; even if this gave huge performance wins I would still put it in 
the bucket of ‟it's in misc, it's not default and you're on your own if it 
breaks”. The fact it yields AFAICT no performance gains is both maddening for 
me and even more damning.

 Native MMapDir
 --

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12
 Attachments: LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch


 Spinoff from LUCENE-2793.
 Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
 level IO flags depending on the IOContext, we could in theory do something 
 similar with MMapDir.
 The problem is MMap is apparently quite hairy... and to pass the flags the 
 native code would need to invoke mmap (I think?), unlike UnixDir where the 
 code only has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550436#comment-13550436
 ] 

Greg Bowyer edited comment on LUCENE-3178 at 1/10/13 9:25 PM:
--

{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there still is quite a lot of that; I wrote a local Directory 
implementation that dumps out all of the called operations. I can share the 
file if wanted (although it's *huge*)

{quote}
Since we moved to block codecs, the use of single-byte get's on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficently using a memcpy(). Some MTQs are still faster because they 
read much more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}

Better, the JVM doesn't do a memcpy in all cases but often uses CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards; so far the only speedups I have seen are 
when I ape most of the behaviors of the JVM. The biggest win really is that the 
code becomes a lot simpler (partly because we don't have to worry about the 
cleaner, and partly because we are not bound to int32 sizes so no more slice 
nonsense); despite the simpler code I don't think there is a sizable win in 
performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior along the 
MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident)

{quote}
I would not provide a custom MMapDir at all, it is too risky and does not 
really brings a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree; even if this gave huge performance wins I would still put it in 
the bucket of ‟it's in misc, it's not default and you're on your own if it 
breaks”. The fact it yields AFAICT no performance gains is both maddening for 
me and even more damning.

  was (Author: gbow...@fastmail.co.uk):
{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there still is quite a lot of that, I wrote locally a Directory 
implementation that dumps out all of the called operations, I can share the 
file if wanted (although its *huge*)

{quote}
Since we moved to block codecs, the use of single-byte get's on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficently using a memcpy(). Some MTQs are still faster because they 
read much more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}
Better the JVM doesnt do memcpy in all cases but often does cpu aware 
operations that are faster.

{quote{
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards, so far the only speedups I have seen are 
when I apt most of the behaviors of the JVM, the biggest win really is that the 
code becomes a lot simpler (partly because we don't have to worry about the 
cleaner, and partly because we are not bound to int32 sizes so no more slice 
nonsense); despite the simpler code I don't think there is a sizable win in 
performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this bust.

The other reason for this was to see if I get better behavior along the 
MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically 
provable there.

(This is all amusing that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident)

{quote}
I would not provide a custom MMapDir at all, it is too risky and does not 
really brings a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree, even if this gave huge performance wins I would still put it in 
the bucket of its in misc, its not default and your on your own if it breaks. 
The fact it yields AFAICT no performance gains

[jira] [Comment Edited] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13550436#comment-13550436
 ] 

Greg Bowyer edited comment on LUCENE-3178 at 1/10/13 11:08 PM:
---

{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there still is quite a lot of that; I wrote a local Directory 
implementation that dumps out all of the called operations. I can share the 
file if wanted (although it's *huge*)

{quote}
Since we moved to block codecs, the use of single-byte get's on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficently using a memcpy(). Some MTQs are still faster because they 
read much more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}

Better, the JVM doesn't do a memcpy in all cases but often uses CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards; so far the only speedups I have seen are 
when I ape most of the behaviors of the JVM. The biggest win really is that the 
code becomes a lot simpler (partly because we don't have to worry about the 
cleaner, and partly because we are not bound to int32 sizes so no more slice 
nonsense); despite the simpler code I don't think there is a sizable win in 
performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior along the 
MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident)

{quote}
I would not provide a custom MMapDir at all, it is too risky and does not 
really brings a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree; even if this gave huge performance wins I would still put it in 
the bucket of ‟it's in misc, it's not default and you're on your own if it 
breaks”. The fact it yields AFAICT no performance gains is both maddening for 
me and even more damning.

  was (Author: gbow...@fastmail.co.uk):
{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there still is quite a lot of that, I wrote locally a Directory 
implementation that dumps out all of the called operations, I can share the 
file if wanted (although its *huge*)

{quote}
Since we moved to block codecs, the use of single-byte get's on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficently using a memcpy(). Some MTQs are still faster because they 
read much more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}

Better the JVM doesnt do memcpy in all cases but often does cpu aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards, so far the only speedups I have seen are 
when I apt most of the behaviors of the JVM, the biggest win really is that the 
code becomes a lot simpler (partly because we don't have to worry about the 
cleaner, and partly because we are not bound to int32 sizes so no more slice 
nonsense); despite the simpler code I don't think there is a sizable win in 
performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this bust.

The other reason for this was to see if I get better behavior along the 
MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically 
provable there.

(This is all amusing that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident)

{quote}
I would not provide a custom MMapDir at all, it is too risky and does not 
really brings a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree, even if this gave huge performance wins I would still put it in 
the bucket of its in misc, its not default and your on your own if it breaks. 
The fact it yields AFAICT no performance gains

[jira] [Commented] (LUCENE-3178) Native MMapDir

2013-01-09 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13548885#comment-13548885
 ] 

Greg Bowyer commented on LUCENE-3178:
-

Frustrating - it echoes what I have been seeing, so at least my benchmarking is 
not playing me up. I guess I will have to do some digging.

 Native MMapDir
 --

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12
 Attachments: LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch


 Spinoff from LUCENE-2793.
 Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
 level IO flags depending on the IOContext, we could in theory do something 
 similar with MMapDir.
 The problem is MMap is apparently quite hairy... and to pass the flags the 
 native code would need to invoke mmap (I think?), unlike UnixDir where the 
 code only has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3178) Native MMapDir

2013-01-08 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-3178:


Attachment: LUCENE-3178-Native-MMap-implementation.patch

 Native MMapDir
 --

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12
 Attachments: LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch


 Spinoff from LUCENE-2793.
 Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
 level IO flags depending on the IOContext, we could in theory do something 
 similar with MMapDir.
 The problem is MMap is apparently quite hairy... and to pass the flags the 
 native code would need to invoke mmap (I think?), unlike UnixDir where the 
 code only has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3178) Native MMapDir

2013-01-08 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-3178:


Attachment: LUCENE-3178-Native-MMap-implementation.patch

 Native MMapDir
 --

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12
 Attachments: LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch, 
 LUCENE-3178-Native-MMap-implementation.patch


 Spinoff from LUCENE-2793.
 Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
 level IO flags depending on the IOContext, we could in theory do something 
 similar with MMapDir.
 The problem is MMap is apparently quite hairy... and to pass the flags the 
 native code would need to invoke mmap (I think?), unlike UnixDir where the 
 code only has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3178) Native MMapDir

2013-01-07 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-3178:


Attachment: LUCENE-3178-Native-MMap-implementation.patch

Rough cut of a native mmap (does not do any madvise, probably insanely buggy 
etc etc)

 Native MMapDir
 --

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12
 Attachments: LUCENE-3178-Native-MMap-implementation.patch


 Spinoff from LUCENE-2793.
 Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
 level IO flags depending on the IOContext, we could in theory do something 
 similar with MMapDir.
 The problem is MMap is apparently quite hairy... and to pass the flags the 
 native code would need to invoke mmap (I think?), unlike UnixDir where the 
 code only has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3178) Native MMapDir

2013-01-07 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-3178:


Attachment: (was: LUCENE-3178-Native-MMap-implementation.patch)

 Native MMapDir
 --

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12

 Spinoff from LUCENE-2793.
 Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
 level IO flags depending on the IOContext, we could in theory do something 
 similar with MMapDir.
 The problem is MMap is apparently quite hairy... and to pass the flags the 
 native code would need to invoke mmap (I think?), unlike UnixDir where the 
 code only has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3178) Native MMapDir

2013-01-07 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-3178:


Attachment: LUCENE-3178-Native-MMap-implementation.patch

Temp skip unit test until fixed

 Native MMapDir
 --

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12
 Attachments: LUCENE-3178-Native-MMap-implementation.patch


 Spinoff from LUCENE-2793.
 Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
 level IO flags depending on the IOContext, we could in theory do something 
 similar with MMapDir.
 The problem is MMap is apparently quite hairy... and to pass the flags the 
 native code would need to invoke mmap (I think?), unlike UnixDir where the 
 code only has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: madvise and gregs hallucinations

2012-12-11 Thread Greg Bowyer
On 12/11/2012 11:59 AM, Yonik Seeley wrote:
 On Tue, Dec 11, 2012 at 2:32 PM, Greg Bowyer gbow...@fastmail.co.uk wrote:
 Yes the index can fit in ram on the boxes I am testing with - It's the
 main rationale for sharding to make sure that we can hold an index in
 ram at all times.

 MADV_WILLNEED might be rather bad if the index is bigger than ram
 (something to test maybe)
 Agree.  And the largest part of the index is often stored fields,
 which have a random access pattern.
 MADV_RANDOM?
Maybe; I would have to go digging to see if it's implemented.

With this said, so far it's a hypothesis supported by weak
experimentation, so I need to get it under the benchmarking suite to
really be sure


 -Yonik
 http://lucidworks.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3178) Native MMapDir

2012-12-10 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528681#comment-13528681
 ] 

Greg Bowyer commented on LUCENE-3178:
-

Tangentially, I have been futzing a little with this based on some observations 
I noticed around madvise: 
http://people.apache.org/~gbowyer/madvise-perf/index.html

 Native MMapDir
 --

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12

 Spinoff from LUCENE-2793.
 Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
 level IO flags depending on the IOContext, we could in theory do something 
 similar with MMapDir.
 The problem is MMap is apparently quite hairy... and to pass the flags the 
 native code would need to invoke mmap (I think?), unlike UnixDir where the 
 code only has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



madvise and gregs hallucinations

2012-12-10 Thread Greg Bowyer
Since it's too long (and has too much HTML and pictures and so forth)
for the mailing list, I have a more detailed write-up here:

http://people.apache.org/~gbowyer/madvise-perf/index.html

However, the short version:

At $DAYJOB, as part of moving to lucene 4.0, I have been looking at what
we can change in our modes of thinking; one of these relates (for us) to
having a separate legacy system that serves the stored data for the
search engine (which we imaginatively call docserve).

Whilst playing with this I noticed that I lost a lot of performance
rather quickly; after a lot of digging, it looks like adding a call to
madvise when mmapping (specifically a call of the form madvise(addr,
MADV_WILLNEED)) might improve search performance.
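
For illustration only, the call I am experimenting with looks roughly like
this via JNA (a speculative sketch, not what lucene does; MADV_WILLNEED = 3
is the Linux value, size_t is mapped as long on 64-bit, and the region must
be page-aligned):

{code}
import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

public class Madvise {
  public interface CLib extends Library {
    CLib INSTANCE = (CLib) Native.loadLibrary("c", CLib.class);
    // int madvise(void *addr, size_t length, int advice);
    int madvise(Pointer addr, long length, int advice);
  }

  public static final int MADV_WILLNEED = 3; // Linux constant (assumption)

  // addr is the raw address of an existing mmap'd region
  public static void willNeed(long addr, long length) {
    int rc = CLib.INSTANCE.madvise(new Pointer(addr), length, MADV_WILLNEED);
    if (rc != 0) {
      throw new RuntimeException("madvise failed, rc=" + rc);
    }
  }
}
{code}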

Anyone have any thoughts and/or ideas here? I am going to try to get a
whole heap more investigation done before I start messing with lucene's
behavior on mmap (because whilst it improves /my/ performance a lot, it
might be that I only noticed it since I am poking an open wound, YMMV),
but there is something of interest here.

Then again this might be my own personal insanity .. :S

-- Greg

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3178) Native MMapDir

2012-12-10 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528687#comment-13528687
 ] 

Greg Bowyer commented on LUCENE-3178:
-

Robert, do you still have your old approach code kicking about anywhere?

 Native MMapDir
 --

 Key: LUCENE-3178
 URL: https://issues.apache.org/jira/browse/LUCENE-3178
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
  Labels: gsoc2012, lucene-gsoc-12

 Spinoff from LUCENE-2793.
 Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
 level IO flags depending on the IOContext, we could in theory do something 
 similar with MMapDir.
 The problem is MMap is apparently quite hairy... and to pass the flags the 
 native code would need to invoke mmap (I think?), unlike UnixDir where the 
 code only has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2701) Expose IndexWriter.commit(Map<String,String> commitUserData) to solr

2012-11-29 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506688#comment-13506688
 ] 

Greg Bowyer commented on SOLR-2701:
---

bq. I haven't had a chance to check out the rest of the patch/issue, but for 
this specifically, what about a convention? Anything under the persistent key 
in the commit data is carried over indefinitely. Or if persistent is the norm, 
then we could reverse it and have a transient map that is not carried over.

The persistent/transient map sounds like a good idea; I will take a look at how 
that can be implemented
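
For context, the Lucene-level feature being exposed looks roughly like this
(a sketch against the 3.x/4.x API; the writer and dir arguments are assumed
to be an open IndexWriter and its Directory, and the key names are made up):

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

static void commitWithUserData(IndexWriter writer, Directory dir) throws Exception {
  // Write side: attach user data to the commit point.
  Map<String, String> userData = new HashMap<String, String>();
  userData.put("last-sequence", "12345");   // hypothetical "persistent" entry
  writer.commit(userData);                  // IndexWriter.commit(Map) in 3.x/4.x

  // Read side: any reader can get the data back from the commit.
  DirectoryReader reader = DirectoryReader.open(dir);
  String lastSeq = reader.getIndexCommit().getUserData().get("last-sequence");
  System.out.println("last-sequence=" + lastSeq);
  reader.close();
}
{code}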

 Expose IndexWriter.commit(Map<String,String> commitUserData) to solr 
 -

 Key: SOLR-2701
 URL: https://issues.apache.org/jira/browse/SOLR-2701
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 4.0-ALPHA
Reporter: Eks Dev
Priority: Minor
  Labels: commit, update
 Attachments: SOLR-2701-Expose-userCommitData-throughout-solr.patch, 
 SOLR-2701.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 At the moment, there is no feature that enables associating user information 
 to the commit point.
  
 Lucene supports this possibility and it should be exposed to solr as well, 
 probably via beforeCommit Listener (analogous to prepareCommit in Lucene).
 Most likely home for this Map to live is UpdateHandler.
 Example use case would be an atomic tracking of sequence numbers or 
 timestamps for incremental updates.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2701) Expose IndexWriter.commit(Map<String,String> commitUserData) to solr

2012-11-28 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-2701:
--

Attachment: SOLR-2701-Expose-userCommitData-throughout-solr.patch

I gave this another attempt today, and went full bore on trying to find all the 
locations where userCommitData would need to be exposed to clients of the 
SOLR API.

There are a few questions in my mind about this:

* The backwards compat for javabin is not obvious; do we want to change up the 
version on javabin?

* What should be the exact behavior around soft commits and autocommits?

* Should previous index commits carry forward in solr for ease of use?

 Expose IndexWriter.commit(Map<String,String> commitUserData) to solr 
 -

 Key: SOLR-2701
 URL: https://issues.apache.org/jira/browse/SOLR-2701
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 4.0-ALPHA
Reporter: Eks Dev
Priority: Minor
  Labels: commit, update
 Attachments: SOLR-2701-Expose-userCommitData-throughout-solr.patch, 
 SOLR-2701.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 At the moment, there is no feature that enables associating user information 
 to the commit point.
  
 Lucene supports this possibility and it should be exposed to solr as well, 
 probably via beforeCommit Listener (analogous to prepareCommit in Lucene).
 Most likely home for this Map to live is UpdateHandler.
 Example use case would be an atomic tracking of sequence numbers or 
 timestamps for incremental updates.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr ResponseBuilder and totalHitCount

2012-10-05 Thread Greg Bowyer
Hi all, I have been off the radar for too long

I am working on a requirement at $DAYJOB where there is a desire to
monitor the rate of low- and zero-result queries; as such I did the
simplest thing I could think of and wrote a search component that looks
through the response object in the later phases of a distributed request.

As I was doing this, it struck me how inconsistent it is to find the
total number of hits for a given search query. Even if the response is
manipulated heavily after the fact, shouldn't we make it easier for
people writing things like search components / transformers etc. to find
out how many matches they had?
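
For the curious, the component is roughly this shape (a rough sketch with
hypothetical names; it ignores the distributed-phase wrinkles, and the
SearchComponent method set varies across Solr versions):

{code}
import java.util.concurrent.atomic.AtomicLong;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocListAndSet;

public class HitRateComponent extends SearchComponent {
  private final AtomicLong zeroResultQueries = new AtomicLong();

  @Override
  public void prepare(ResponseBuilder rb) {
    // nothing to do before the query runs
  }

  @Override
  public void process(ResponseBuilder rb) {
    DocListAndSet results = rb.getResults();
    if (results != null && results.docList != null
        && results.docList.matches() == 0) {
      zeroResultQueries.incrementAndGet();  // hook metrics/logging in here
    }
  }

  @Override
  public String getDescription() {
    return "Counts zero-result queries";
  }

  public String getSource() {
    return null;  // required by some Solr versions
  }
}
{code}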

This is where I started to wonder if it makes sense to hide / elide the
original usage of totalHitCount as part of grouping, and use this field
for presenting some sensible number of matches for the query. I know
that this might break backwards compat with people who look at this
field, but then I figure it is so ambiguously named that many naive
users are likely to use this field not realizing that it is all about
grouping.

I am probably going to craft a patch to this end, unless someone has any
intuition that I am missing here

-- Greg

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3784) Schema-Browser hangs because of similarity

2012-09-08 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451442#comment-13451442
 ] 

Greg Bowyer commented on SOLR-3784:
---

Committed to trunk
Sending        solr/webapp/web/js/scripts/schema-browser.js
Transmitting file data .
Committed revision 1382385.

and branch_4x

Sending        solr/webapp/web/js/scripts/schema-browser.js
Transmitting file data .
Committed revision 1382384.


 Schema-Browser hangs because of similarity
 --

 Key: SOLR-3784
 URL: https://issues.apache.org/jira/browse/SOLR-3784
 Project: Solr
  Issue Type: Bug
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Greg Bowyer
 Attachments: SOLR-3784.patch, SOLR-3784.patch


 Opening the Schema-Browser with the Default Configuration, switching the 
 selection to type=int throws an error:
 {code}Uncaught TypeError: Cannot call method 'esc' of undefined // 
 schema-browser.js:893{code}
 On the first Look, this was introduced by SOLR-3572 .. right? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3784) Schema-Browser hangs because of similarity

2012-09-08 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer resolved SOLR-3784.
---

Resolution: Fixed

 Schema-Browser hangs because of similarity
 --

 Key: SOLR-3784
 URL: https://issues.apache.org/jira/browse/SOLR-3784
 Project: Solr
  Issue Type: Bug
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Greg Bowyer
 Attachments: SOLR-3784.patch, SOLR-3784.patch


 Opening the Schema-Browser with the Default Configuration, switching the 
 selection to type=int throws an error:
 {code}Uncaught TypeError: Cannot call method 'esc' of undefined // 
 schema-browser.js:893{code}
 On the first Look, this was introduced by SOLR-3572 .. right? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-09-05 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448541#comment-13448541
 ] 

Greg Bowyer commented on LUCENE-4332:
-

{quote}
permission java.security.SecurityPermission "*", "read,write";
This makes no sense, as SecurityPermission has no action, so "read,write" 
should be ignored. I was restricting SecurityPermission with something in mind 
(see the last 2 lines that allowed only the BouncyCastle installed by TIKA - 
now everything is allowed). What fails if I remove that line? I have no time to 
run the whole pitest suite.
{quote}

You are right on that, I will change it

{quote}
The idea was to find places (especially in TIKA) that do things they should not 
do (like enabling security providers), which makes the configuration of J2EE 
container hosting Solr hard. So we should limit all this, to see when somebody 
adds a new feature to Solr that needs additional permissions.
I am already working on restricting RuntimePermission more, so only things 
like reflection and property access is allowed.
{quote}

Ok, the intention changed a fair bit; I was still under the impression that we 
were targeting keeping tests in a sandbox, rather than helping solr with 
hosting inside complex J2EE arrangements.

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-09-05 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448556#comment-13448556
 ] 

Greg Bowyer commented on LUCENE-4332:
-

Ok, the security permission stuff is tightened up to just the internal JVM 
cache; basically it is as follows:
{code}
// Needed for some things in DNS caching in the JVM
permission java.security.SecurityPermission "getProperty.networkaddress.cache.ttl";
permission java.security.SecurityPermission "getProperty.networkaddress.cache.negative.ttl";
{code}

branch_4x
Sending        lucene/tools/junit4/tests.policy
Transmitting file data .
Committed revision 1381046.

trunk
Sending        lucene/tools/junit4/tests.policy
Transmitting file data .
Committed revision 1381047.

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3784) Schema-Browser hangs because of similarity

2012-09-05 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-3784:
--

Attachment: SOLR-3784.patch

I modified your patch a little to be defensive around the classname as well. I 
see where my initial mistake was: I got my languages confused and thought that 
in JS {} would evaluate to false (like in Python).

Hopefully this should solve it - do you want to commit this or shall I?

 Schema-Browser hangs because of similarity
 --

 Key: SOLR-3784
 URL: https://issues.apache.org/jira/browse/SOLR-3784
 Project: Solr
  Issue Type: Bug
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Greg Bowyer
 Attachments: SOLR-3784.patch, SOLR-3784.patch


 Opening the Schema-Browser with the Default Configuration, switching the 
 selection to type=int throws an error:
 {code}Uncaught TypeError: Cannot call method 'esc' of undefined // 
 schema-browser.js:893{code}
 On the first Look, this was introduced by SOLR-3572 .. right? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3784) Schema-Browser hangs because of similarity

2012-09-04 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448111#comment-13448111
 ] 

Greg Bowyer commented on SOLR-3784:
---

Eep, that was bad of me - want me to fix this?

 Schema-Browser hangs because of similarity
 --

 Key: SOLR-3784
 URL: https://issues.apache.org/jira/browse/SOLR-3784
 Project: Solr
  Issue Type: Bug
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Greg Bowyer
 Attachments: SOLR-3784.patch


 Opening the Schema-Browser with the Default Configuration, switching the 
 selection to type=int throws an error:
 {code}Uncaught TypeError: Cannot call method 'esc' of undefined // 
 schema-browser.js:893{code}
 On the first Look, this was introduced by SOLR-3572 .. right? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3784) Schema-Browser hangs because of similarity

2012-09-04 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448116#comment-13448116
 ] 

Greg Bowyer commented on SOLR-3784:
---

Hmm, why does this only trigger for type int?

That makes less sense.

 Schema-Browser hangs because of similarity
 --

 Key: SOLR-3784
 URL: https://issues.apache.org/jira/browse/SOLR-3784
 Project: Solr
  Issue Type: Bug
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Greg Bowyer
 Attachments: SOLR-3784.patch


 Opening the Schema-Browser with the Default Configuration, switching the 
 selection to type=int throws an error:
 {code}Uncaught TypeError: Cannot call method 'esc' of undefined // 
 schema-browser.js:893{code}
 On the first Look, this was introduced by SOLR-3572 .. right? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-09-04 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448212#comment-13448212
 ] 

Greg Bowyer commented on LUCENE-4332:
-

committed to trunk 

Sending        build.xml
Sending        lucene/build.xml
Sending        lucene/common-build.xml
Sending        lucene/tools/build.xml
Sending        lucene/tools/junit4/tests.policy
Sending        solr/build.xml
Sending        solr/example/build.xml
Sending        solr/example/example-DIH/build.xml
Transmitting file data 
Committed revision 1380938.

and branch_4x

Sending        build.xml
Sending        lucene/build.xml
Sending        lucene/common-build.xml
Sending        lucene/tools/build.xml
Sending        lucene/tools/junit4/tests.policy
Sending        solr/build.xml
Sending        solr/example/build.xml
Sending        solr/example/example-DIH/build.xml
Transmitting file data 
Committed revision 1380937.

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-09-04 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448214#comment-13448214
 ] 

Greg Bowyer commented on LUCENE-4332:
-

I guess now we need to figure out how and when to get jenkins to run this .. 
any thoughts?

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-09-04 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448235#comment-13448235
 ] 

Greg Bowyer commented on LUCENE-4332:
-

Pitest forks surrogate runner JVMs; when these things boot up they do some 
socket-related nonsense for communication (SecurityPermission), and turn on 
thread time monitoring (ManagementPermission).

SecurityPermission hides out in lots of strange places; one of these is in the 
DNS cache internal to a JVM.

We could restrict it. I didn't, for ease of configuration, since the security 
manager is aimed at preventing non-malicious mistakes (like writing outside of 
the sandbox) rather than being a full-force prevention of malicious code (which 
would require a bit more thinking through IMHO).

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-30 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444718#comment-13444718
 ] 

Greg Bowyer commented on LUCENE-4332:
-

{quote}
Greg, what's this responsible for:
{code}
<property name="pitest.threads" value="2" />
{code}
Is this test execution concurrency at JVM level? If so, it needs to be set to 
1 because tests won't support concurrent suites (too much static stuff lying 
around).
{quote}

It's the number of runner VMs used by pitest (I think), so in essence it should 
spawn two VMs to run tests through (but let me check, I might have that wrong).

{quote}
{code}
<!-- Java7 has a new bytecode verifier that _requires_ stackmaps to be present in the
     bytecode. Unfortunately a *lot* of bytecode manipulation tools (including pitest)
     do not currently write out this information; for now we are forced to disable this
     in the jvm args -->
{code}
Both proguard and asmlib produce these, I think. Just a note - I know it's 
probably not possible to integrate in pi's toolchain without changes to its 
source code.
{quote}
Hmm, weird - I thought I had problems with the verifier, but now that I just 
tried it, it worked. Guess I will be taking that out then.


{quote}
Will this require people to put rhino in ant's lib/ folder? If so, I'd just use 
an ugly property value="-D..=${} -D..=${}..."; yours is very nice, but an 
additional step is required, which seems like overkill?
{quote}
Might do - it depends on the JVM vendor / version. I will change it to the ugly 
approach.


 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-30 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-4332:


Attachment: 
LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch

I found and fixed a few things relating to security policies (ha!)

Also, I thought about the javascript a bit; I know that it's standard in the 
JVM, but when I tested on a 1.6 JVM I found that my javascript was a bit too 
advanced for it.

That made me pause for thought and realise that I might have been a bit too 
clever for my own good there; I am not all that happy with the JS, so I went 
with the ugly approach for now.

I think the right solution is to work with the pitest people and get it to 
support junit-style nested tags like <sysproperty>.

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-30 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445226#comment-13445226
 ] 

Greg Bowyer commented on LUCENE-4332:
-

Do we want to implement this now that Uwe's changes are in 4.0 / trunk?

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4337) Create Java security manager for forcible asserting behaviours in testing

2012-08-29 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1344#comment-1344
 ] 

Greg Bowyer commented on LUCENE-4337:
-

I have a slightly different approach that I was testing last night that 
programmatically creates the policy, so that we don't have to allow other 
permissions all the time.
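
The shape of what I was playing with, for discussion (a loose sketch; the
grants shown are placeholders, not a proposal, and a real policy would also
need to grant the JDK and test classpath what they need):

{code}
import java.io.FilePermission;
import java.security.PermissionCollection;
import java.security.Permissions;
import java.security.Policy;
import java.security.ProtectionDomain;

public class TestPolicy extends Policy {
  private final Permissions perms = new Permissions();

  public TestPolicy(String sandboxDir) {
    // placeholder grants: the sandbox dir plus reflection, nothing else
    perms.add(new FilePermission(sandboxDir + "/-", "read,write,delete"));
    perms.add(new RuntimePermission("accessDeclaredMembers"));
  }

  @Override
  public PermissionCollection getPermissions(ProtectionDomain domain) {
    // same grants for every domain - fine for a sketch, too blunt for real use
    return perms;
  }

  public static void install(String sandboxDir) {
    Policy.setPolicy(new TestPolicy(sandboxDir));
    System.setSecurityManager(new SecurityManager());
  }
}
{code}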



 Create Java security manager for forcible asserting behaviours in testing
 -

 Key: LUCENE-4337
 URL: https://issues.apache.org/jira/browse/LUCENE-4337
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.1, 5.0, 4.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Attachments: ChrootSecurityManager.java, 
 ChrootSecurityManagerTest.java, LUCENE-4337.patch, LUCENE-4337.patch, 
 LUCENE-4337.patch


 Following on from conversations about mutation testing, there is an interest 
 in building a Java security manager that is able to assert / guarantee 
 certain behaviours 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4337) Create Java security manager for forcible asserting behaviours in testing

2012-08-29 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-4337:


Attachment: ChrootSecurityManagerTest.java
ChrootSecurityManager.java

Greg's alternative approach (untested - just for discussion)



 Create Java security manager for forcible asserting behaviours in testing
 -

 Key: LUCENE-4337
 URL: https://issues.apache.org/jira/browse/LUCENE-4337
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.1, 5.0, 4.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Attachments: ChrootSecurityManager.java, 
 ChrootSecurityManagerTest.java, LUCENE-4337.patch, LUCENE-4337.patch, 
 LUCENE-4337.patch


 Following on from conversations about mutation testing, there is an interest 
 in building a Java security manager that is able to assert / guarantee 
 certain behaviours 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4337) Create Java security manager for forcible asserting behaviours in testing

2012-08-29 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1345#comment-1345
 ] 

Greg Bowyer edited comment on LUCENE-4337 at 8/30/12 8:47 AM:
--

Greg's alternative approach (ChrootSecurityManager) (untested - just for 
discussion); not that I don't think we can use the approach Uwe worked on, just 
that I am not sure about going down the strict policy route



  was (Author: gbow...@fastmail.co.uk):
Greg's alternative approach (untested - just for discussion)


  
 Create Java security manager for forcible asserting behaviours in testing
 -

 Key: LUCENE-4337
 URL: https://issues.apache.org/jira/browse/LUCENE-4337
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.1, 5.0, 4.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Attachments: ChrootSecurityManager.java, 
 ChrootSecurityManagerTest.java, LUCENE-4337.patch, LUCENE-4337.patch, 
 LUCENE-4337.patch


 Following on from conversations about mutation testing, there is an interest 
 in building a Java security manager that is able to assert / guarantee 
 certain behaviours 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4337) Create Java security manager for forcible asserting behaviours in testing

2012-08-29 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1356#comment-1356
 ] 

Greg Bowyer commented on LUCENE-4337:
-

Actually, you know what - I think I just talked myself out of my approach. Yes, 
it's a bit more generic, but maybe it's less transparent to people that don't 
spend their time reading JVM source code; at least the policy file is more 
obvious to those who are approaching this for the first time.

 Create Java security manager for forcible asserting behaviours in testing
 -

 Key: LUCENE-4337
 URL: https://issues.apache.org/jira/browse/LUCENE-4337
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.1, 5.0, 4.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Attachments: ChrootSecurityManager.java, 
 ChrootSecurityManagerTest.java, LUCENE-4337.patch, LUCENE-4337.patch, 
 LUCENE-4337.patch


 Following on from conversations about mutation testing, there is an interest 
 in building a Java security manager that is able to assert / guarantee 
 certain behaviours 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4337) Create Java security manager for forcible asserting behaviours in testing

2012-08-29 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444635#comment-13444635
 ] 

Greg Bowyer commented on LUCENE-4337:
-

{quote}
I also spend my time in reading JDK source code today! But it was more when 
fixing Solr test violations (die, JMX/RMI, die, die, die).
{quote}

Right there with you on RMI

 Create Java security manager for forcible asserting behaviours in testing
 -

 Key: LUCENE-4337
 URL: https://issues.apache.org/jira/browse/LUCENE-4337
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.1, 5.0, 4.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Attachments: ChrootSecurityManager.java, 
 ChrootSecurityManagerTest.java, LUCENE-4337.patch, LUCENE-4337.patch, 
 LUCENE-4337.patch


 Following on from conversations about mutation testing, there is an interest 
 in building a Java security manager that is able to assert / guarantee 
 certain behaviours 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-29 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-4332:


Attachment: 
LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch

New version

* uses ivy:cachepath as suggested
* removed random license files
* leverages security manager (LUCENE-4337)
* uses fixed random seed

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-28 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-4332:


Attachment: 
LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Test coverage or testing the tests

2012-08-28 Thread Greg Bowyer
Ok first cut at a version of this in the build
https://issues.apache.org/jira/browse/LUCENE-4332

On 27/08/12 18:05, Greg Bowyer wrote:
 On 27/08/12 17:30, Chris Hostetter wrote:
 : This is cool. I'd say lets get it up and going on jenkins (even weekly
 : or something). why worry about the imperfections in any of these
 : coverage tools, whats way more important is when the results find
 : situations where you thought you were testing something, but really

 +1.

 Even if it hammers the machine so bad it can't be run on mortal 
 hardware, it's still worth it to hook it into the build system so people 
 with god like hardware can easily run it and file bugs based on what 
 they see.

 -Hoss

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

 The machine I ran it on cost me $5 from ec2 :D


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Updated] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-28 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated LUCENE-4332:


Attachment: 
LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch

Corrected jcommander license

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly

2012-08-28 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-3763:
--

Description: 
Presently solr uses bitsets, queries and collectors to implement the concept of 
filters. This has proven to be very powerful, but does come at the cost of 
introducing a large body of code into solr making it harder to optimise and 
maintain.

Another issue here is that filters currently cache sub-optimally given the 
changes in lucene towards atomic readers.

Rather than patch these issues, this is an attempt to rework the filters in 
solr to leverage the Filter subsystem from lucene as much as possible.

In good time the aim is to get this to do the following:

∘ Handle setting up filter implementations that are able to correctly cache 
with reference to the AtomicReader that they are caching for, rather than for 
the entire index at large

∘ Get the post filters working; I am thinking that this can be done via 
lucene's chained filter, with the ‟expensive” filters being put towards the end 
of the chain - this has different semantics internally to the original 
implementation but IMHO should have the same result for end users

∘ Learn how to create filters that are potentially more efficient; at present 
solr basically runs a simple query that gathers a DocSet that relates to the 
documents that we want filtered; it would be interesting to make use of filter 
implementations that are in theory faster than query filters (for instance 
there are filters that are able to query the FieldCache)

∘ Learn how to decompose filters so that a complex filter query can be cached 
(potentially) as its constituent parts; for example the filter below currently 
needs love, care and feeding to ensure that the filter cache is not unduly 
stressed

{code}
  'category:(100) OR category:(200) OR category:(300)'
{code}

Really there is no reason not to express this in a cached form as 

{code}
BooleanFilter(
FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD),
FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD),
FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD)
  )
{code}

This would yield better cache usage, I think, as we can reuse docsets across 
multiple queries as well as avoid issues when filters are presented in 
differing orders (a concrete sketch follows after this list)

∘ Instead of end users providing costing we might (and this is a big might 
FWIW), be able to create a sort of execution plan of filters, leveraging a 
combination of what the index is able to tell us as well as sampling and 
‟educated guesswork”; in essence this is what some DBMS software, for example 
postgresql does - it has a genetic algo that attempts to solve the travelling 
salesman - to great effect

∘ I am sure I will probably come up with other ambitious ideas to plug in 
here. :S
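
As a concrete sketch of the decomposed, cached form above (written against the 
4.x queries module; untested):

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.BooleanFilter;
import org.apache.lucene.queries.FilterClause;
import org.apache.lucene.queries.TermFilter;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;

static Filter categoryFilter() {
  BooleanFilter categories = new BooleanFilter();
  for (String value : new String[] {"100", "200", "300"}) {
    // each term filter is cached on its own, per segment, so the parts are
    // reusable across differently-ordered or partially-overlapping queries
    Filter term = new CachingWrapperFilter(new TermFilter(new Term("category", value)));
    categories.add(new FilterClause(term, Occur.SHOULD));
  }
  return categories;
}
{code}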

Patches obviously forthcoming but the bulk of the work can be followed here 
https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters

  was:
Presently solr uses bitsets, queries and collectors to implement the concept of 
filters. This has proven to be very powerful, but does come at the cost of 
introducing a large body of code into solr making it harder to optimise and 
maintain.

Another issue here is that filters currently cache sub-optimally given the 
changes in lucene towards atomic readers.

Rather than patch these issues, this is an attempt to rework the filters in 
solr to leverage the Filter subsystem from lucene as much as possible.

In good time the aim is to get this to do the following:

∘ Handle setting up filter implementations that are able to correctly cache 
with reference to the AtomicReader that they are caching for, rather than for 
the entire index at large

∘ Get the post filters working; I am thinking that this can be done via 
lucene's chained filter, with the "expensive" filters being put towards the end 
of the chain - this has different semantics internally to the original 
implementation but IMHO should have the same result for end users

∘ Learn how to create filters that are potentially more efficient; at present 
solr basically runs a simple query that gathers a DocSet that relates to the 
documents that we want filtered. It would be interesting to make use of filter 
implementations that are in theory faster than query filters (for instance 
there are filters that are able to query the FieldCache)

∘ Learn how to decompose filters so that a complex filter query can be cached 
(potentially) as its constituent parts; for example the filter below currently 
needs love, care and feeding to ensure that the filter cache is not unduly 
stressed

{code}
  'category:(100) OR category:(200) OR category:(300)'
{code}

Really there is no reason not to express this in a cached form as 

{code}
BooleanFilter(
FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD

[jira] [Created] (SOLR-3763) Make solr use lucene filters directly

2012-08-28 Thread Greg Bowyer (JIRA)
Greg Bowyer created SOLR-3763:
-

 Summary: Make solr use lucene filters directly
 Key: SOLR-3763
 URL: https://issues.apache.org/jira/browse/SOLR-3763
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0, 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer


Presently solr uses bitsets, queries and collectors to implement the concept of 
filters. This has proven to be very powerful, but does come at the cost of 
introducing a large body of code into solr making it harder to optimise and 
maintain.

Another issue here is that filters currently cache sub-optimally given the 
changes in lucene towards atomic readers.

Rather than patch these issues, this is an attempt to rework the filters in 
solr to leverage the Filter subsystem from lucene as much as possible.

In good time the aim is to get this to do the following:

∘ Handle setting up filter implementations that are able to correctly cache 
with reference to the AtomicReader that they are caching for, rather than for 
the entire index at large

∘ Get the post filters working; I am thinking that this can be done via 
lucene's chained filter, with the "expensive" filters being put towards the end 
of the chain - this has different semantics internally to the original 
implementation but IMHO should have the same result for end users

∘ Learn how to create filters that are potentially more efficient; at present 
solr basically runs a simple query that gathers a DocSet that relates to the 
documents that we want filtered. It would be interesting to make use of filter 
implementations that are in theory faster than query filters (for instance 
there are filters that are able to query the FieldCache)

∘ Learn how to decompose filters so that a complex filter query can be cached 
(potentially) as its constituent parts; for example the filter below currently 
needs love, care and feeding to ensure that the filter cache is not unduly 
stressed

{code}
  'category:(100) OR category:(200) OR category:(300)'
{code}

Really there is no reason not to express this in a cached form as 

{code}
BooleanFilter(
FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD),
FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD),
FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD)
  )
{code}

This would yield better cache usage, I think, as we can reuse docsets across 
multiple queries as well as avoid issues when filters are presented in 
differing orders

∘ Instead of end users providing costing we might (and this is a big might 
FWIW) be able to create a sort of execution plan of filters, leveraging a 
combination of what the index is able to tell us as well as sampling and 
"educated guesswork"; in essence this is what some DBMS software does - 
postgresql, for example, uses a genetic algorithm that attempts to solve the 
travelling salesman problem - to great effect

∘ I am sure I will probably come up with other ambitious ideas to plug in here 
. :S 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3763) Make solr use lucene filters directly

2012-08-28 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer updated SOLR-3763:
--

Attachment: SOLR-3763-Make-solr-use-lucene-filters-directly.patch

Initial version; this has some hacks in it and does not pass testing for 
caches, since that needs to be reworked

 Make solr use lucene filters directly
 -

 Key: SOLR-3763
 URL: https://issues.apache.org/jira/browse/SOLR-3763
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0, 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Attachments: SOLR-3763-Make-solr-use-lucene-filters-directly.patch


 Presently solr uses bitsets, queries and collectors to implement the concept 
 of filters. This has proven to be very powerful, but does come at the cost of 
 introducing a large body of code into solr making it harder to optimise and 
 maintain.
 Another issue here is that filters currently cache sub-optimally given the 
 changes in lucene towards atomic readers.
 Rather than patch these issues, this is an attempt to rework the filters in 
 solr to leverage the Filter subsystem from lucene as much as possible.
 In good time the aim is to get this to do the following:
 ∘ Handle setting up filter implementations that are able to correctly cache 
 with reference to the AtomicReader that they are caching for, rather than for 
 the entire index at large
 ∘ Get the post filters working; I am thinking that this can be done via 
 lucene's chained filter, with the "expensive" filters being put towards the 
 end of the chain - this has different semantics internally to the original 
 implementation but IMHO should have the same result for end users
 ∘ Learn how to create filters that are potentially more efficient; at present 
 solr basically runs a simple query that gathers a DocSet that relates to the 
 documents that we want filtered. It would be interesting to make use of 
 filter implementations that are in theory faster than query filters (for 
 instance there are filters that are able to query the FieldCache)
 ∘ Learn how to decompose filters so that a complex filter query can be cached 
 (potentially) as its constituent parts; for example the filter below 
 currently needs love, care and feeding to ensure that the filter cache is not 
 unduly stressed
 {code}
   'category:(100) OR category:(200) OR category:(300)'
 {code}
 Really there is no reason not to express this in a cached form as 
 {code}
 BooleanFilter(
 FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD),
 FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD),
 FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD)
   )
 {code}
 This would yield better cache usage, I think, as we can reuse docsets across 
 multiple queries as well as avoid issues when filters are presented in 
 differing orders
 ∘ Instead of end users providing costing we might (and this is a big might 
 FWIW) be able to create a sort of execution plan of filters, leveraging a 
 combination of what the index is able to tell us as well as sampling and 
 "educated guesswork"; in essence this is what some DBMS software does - 
 postgresql, for example, uses a genetic algorithm that attempts to solve the 
 travelling salesman problem - to great effect
 ∘ I am sure I will probably come up with other ambitious ideas to plug in 
 here . :S 
 Patches obviously forthcoming but the bulk of the work can be followed here 
 https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-28 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443304#comment-13443304
 ] 

Greg Bowyer commented on LUCENE-4332:
-

Wow, lots of interest.

I will try to answer some of the salient points.

Core was missing until today, as one test (TestLuceneConstantVersion) didn't 
run correctly because it was lacking the Lucene version system property. 
Currently pit refuses to run unless the underlying suite is all green (a good 
thing IMHO), so I didn't have core from my first run (it's there now).

This takes a long time to run: all of the ancillary Lucene packages take 
roughly 4 hours on the largest-CPU ec2 instance, and core takes 8 hours (this 
was the other reason core was missing; I was waiting for it to finish 
crunching).

As to the random seed, I completely agree, and it was one of the things I 
mentioned on the mailing list that makes the output of this tool imperfect. I 
do feel that the tests that are randomised typically do a better job at gaining 
coverage, but it's a good idea to stabilise the seed.

Jars and build.xml: I have no problem changing this to whatever people think 
fits best into the build. My impression was that clover is handled the way it 
is because it is not technically open source and as a result has screwball 
licensing concerns; essentially I didn't know any better :S I will try to get a 
chance to make it use the ivy:cachepath approach.

Regarding the risks posed by mutations, I cannot prove or say there are no 
risks; however, mutation testing is not random in the mutations applied - they 
are formulaic and quite simple. It will not permute arguments, nor will it 
mutate complex objects (it can and does mess with object references, turning 
references in arguments into nulls). I can conceive of ways in which it could 
screw up mutated code badly enough to delete random files, but I don't think 
such situations are very likely. FWIW I would be less worried about this 
deleting something on the filesystem and far more worried about it accidentally 
leaving corpses of undeleted files.
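
For illustration, the kind of change applied is on the level of PIT's 
conditionals-boundary mutator; a hypothetical example (not taken from any 
Lucene code) of a mutant and the boundary-exercising check that kills it:

{code:java}
public class BoundaryMutationDemo {

    // Original: flush once the buffer is exactly full.
    static boolean shouldFlush(int count, int limit) {
        return count >= limit;
    }

    // What the conditionals-boundary mutator produces: >= becomes >.
    static boolean shouldFlushMutated(int count, int limit) {
        return count > limit;
    }

    public static void main(String... args) {
        // A test asserting the count == limit case kills this mutant;
        // a suite that never exercises the boundary lets it survive.
        System.out.println(shouldFlush(10, 10));        // true
        System.out.println(shouldFlushMutated(10, 10)); // false - mutant caught
    }
}
{code}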

Sandboxing it could solve that issue; if that is too much effort, another 
approach might be to work with the pitest team and build a security manager 
that is militant about file access, disallowing anything that canonicalises 
outside of a given path.

Oh and as Robert suggested we can always point it away from key things.

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-28 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443304#comment-13443304
 ] 

Greg Bowyer edited comment on LUCENE-4332 at 8/29/12 4:51 AM:
--

Wow, lots of interest.

I will try to answer some of the salient points.

Core was missing until today, as one test (TestLuceneConstantVersion) didn't 
run correctly because it was lacking the Lucene version system property. 
Currently pit refuses to run unless the underlying suite is all green (a good 
thing IMHO), so I didn't have core from my first run (it's there now).

This takes a long time to run: all of the ancillary Lucene packages take 
roughly 4 hours on the largest-CPU ec2 instance, and core takes 8 hours (this 
was the other reason core was missing; I was waiting for it to finish 
crunching).

As to the random seed, I completely agree, and it was one of the things I 
mentioned on the mailing list that makes the output of this tool imperfect. I 
do feel that the tests that are randomised typically do a better job at gaining 
coverage, but it's a good idea to stabilise the seed.

Jars and build.xml: I have no problem changing this to whatever people think 
fits best into the build. My impression was that clover is handled the way it 
is because it is not technically open source and as a result has screwball 
licensing concerns; essentially I didn't know any better :S I will try to get a 
chance to make it use the ivy:cachepath approach.

Regarding the risks posed by mutations, I cannot prove or say there are no 
risks; however, mutation testing is not random in the mutations applied - they 
are formulaic and quite simple. It will not permute arguments, nor will it 
mutate complex objects (it can and does mess with object references, turning 
references in arguments into nulls). I can conceive of ways in which it could 
screw up mutated code badly enough to delete random files, but I don't think 
such situations are very likely. FWIW I would be less worried about this 
deleting something on the filesystem and far more worried about it accidentally 
leaving corpses of undeleted files.

Sandboxing it could solve that issue; if that is too much effort, another 
approach might be to work with the pitest team and build a security manager 
that is militant about file access, disallowing anything that canonicalises 
outside of a given path.

Oh and as Robert suggested we can always point it away from key things.

At the end of the day it's a tool like any other; I have exactly the same 
feelings as Robert on this
{quote}

This is cool. I'd say lets get it up and going on jenkins (even weekly
or something). why worry about the imperfections in any of these
coverage tools, whats way more important is when the results find
situations where you thought you were testing something, but really
arent, etc (here was a recent one found by clover
http://svn.apache.org/viewvc?rev=1376722&view=rev).

so imo just another tool to be able to identify serious gaps/test-bugs
after things are up and running. and especially looking at deltas from
line coverage to identify stuff thats 'executing' but not actually
being tested.
{quote}

  was (Author: gbow...@fastmail.co.uk):
Wow, lots of interest.

I will try to answer some of the salient points.

Core was missing until today, as one test (TestLuceneConstantVersion) didn't 
run correctly because it was lacking the Lucene version system property. 
Currently pit refuses to run unless the underlying suite is all green (a good 
thing IMHO), so I didn't have core from my first run (it's there now).

This takes a long time to run: all of the ancillary Lucene packages take 
roughly 4 hours on the largest-CPU ec2 instance, and core takes 8 hours (this 
was the other reason core was missing; I was waiting for it to finish 
crunching).

As to the random seed, I completely agree, and it was one of the things I 
mentioned on the mailing list that makes the output of this tool imperfect. I 
do feel that the tests that are randomised typically do a better job at gaining 
coverage, but it's a good idea to stabilise the seed.

Jars and build.xml: I have no problem changing this to whatever people think 
fits best into the build. My impression was that clover is handled the way it 
is because it is not technically open source and as a result has screwball 
licensing concerns; essentially I didn't know any better :S I will try to get a 
chance to make it use the ivy:cachepath approach.

Regarding the risks posed by mutations, I cannot prove or say there are no 
risks; however, mutation testing is not random in the mutations applied - they 
are formulaic and quite simple. It will not permute arguments, nor will it 
mutate complex objects (it can and does mess with object references, turning 
references in arguments into nulls). I can conceive of ways in which it could

[jira] [Commented] (SOLR-3763) Make solr use lucene filters directly

2012-08-28 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443409#comment-13443409
 ] 

Greg Bowyer commented on SOLR-3763:
---

I guess my next step is to get caching working; I am not sure quite how to 
take baby steps with this beyond getting to feature parity.

 Make solr use lucene filters directly
 -

 Key: SOLR-3763
 URL: https://issues.apache.org/jira/browse/SOLR-3763
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0, 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Attachments: SOLR-3763-Make-solr-use-lucene-filters-directly.patch


 Presently solr uses bitsets, queries and collectors to implement the concept 
 of filters. This has proven to be very powerful, but does come at the cost of 
 introducing a large body of code into solr making it harder to optimise and 
 maintain.
 Another issue here is that filters currently cache sub-optimally given the 
 changes in lucene towards atomic readers.
 Rather than patch these issues, this is an attempt to rework the filters in 
 solr to leverage the Filter subsystem from lucene as much as possible.
 In good time the aim is to get this to do the following:
 ∘ Handle setting up filter implementations that are able to correctly cache 
 with reference to the AtomicReader that they are caching for, rather than for 
 the entire index at large
 ∘ Get the post filters working; I am thinking that this can be done via 
 lucene's chained filter, with the "expensive" filters being put towards the 
 end of the chain - this has different semantics internally to the original 
 implementation but IMHO should have the same result for end users
 ∘ Learn how to create filters that are potentially more efficient; at present 
 solr basically runs a simple query that gathers a DocSet that relates to the 
 documents that we want filtered. It would be interesting to make use of 
 filter implementations that are in theory faster than query filters (for 
 instance there are filters that are able to query the FieldCache)
 ∘ Learn how to decompose filters so that a complex filter query can be cached 
 (potentially) as its constituent parts; for example the filter below 
 currently needs love, care and feeding to ensure that the filter cache is not 
 unduly stressed
 {code}
   'category:(100) OR category:(200) OR category:(300)'
 {code}
 Really there is no reason not to express this in a cached form as 
 {code}
 BooleanFilter(
 FilterClause(CachedFilter(TermFilter(Term(category, 100))), SHOULD),
 FilterClause(CachedFilter(TermFilter(Term(category, 200))), SHOULD),
 FilterClause(CachedFilter(TermFilter(Term(category, 300))), SHOULD)
   )
 {code}
 This would yield better cache usage, I think, as we can reuse docsets across 
 multiple queries as well as avoid issues when filters are presented in 
 differing orders
 ∘ Instead of end users providing costing we might (and this is a big might 
 FWIW) be able to create a sort of execution plan of filters, leveraging a 
 combination of what the index is able to tell us as well as sampling and 
 "educated guesswork"; in essence this is what some DBMS software does - 
 postgresql, for example, uses a genetic algorithm that attempts to solve the 
 travelling salesman problem - to great effect
 ∘ I am sure I will probably come up with other ambitious ideas to plug in 
 here . :S 
 Patches obviously forthcoming but the bulk of the work can be followed here 
 https://github.com/GregBowyer/lucene-solr/commits/solr-uses-lucene-filters

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-28 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443424#comment-13443424
 ] 

Greg Bowyer commented on LUCENE-4332:
-

{quote}
Thats a cool idea also for our own tests! We should install a SecurityManager 
always and only allow files in build/test. LuceneTestCase can enforce this 
SecurityManager installed! And if a test writes outside, fail it!
{quote}

Should we split that out as a separate thing and get a security manager built 
that hooks into the awesome carrot testing stuff?

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-28 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443449#comment-13443449
 ] 

Greg Bowyer commented on LUCENE-4332:
-

Following up, it turns out to be *very* simple to do the security manager trick

{code:java}
import java.io.File;

public class Test {

    public static void main(String... args) {
        System.setSecurityManager(new SecurityManager() {
            public void checkDelete(String file) throws SecurityException {
                File fp = new File(file);
                String path = fp.getAbsolutePath();

                // Refuse to delete anything outside /tmp
                if (!path.startsWith("/tmp")) {
                    throw new SecurityException("Bang!");
                }
            }
        });

        new File("/home/greg/test").delete();
    }
}
{code}

{code}
Exception in thread "main" java.lang.SecurityException: Bang!
    at Test$1.checkDelete(Test.java:12)
    at java.io.File.delete(File.java:971)
    at Test.main(Test.java:17)
{code}

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-28 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443449#comment-13443449
 ] 

Greg Bowyer edited comment on LUCENE-4332 at 8/29/12 6:48 AM:
--

Following up, it turns out to be *very* simple to do the security manager trick

{code:java}
import java.io.File;

public class Test {

    public static void main(String... args) {
        System.setSecurityManager(new SecurityManager() {
            public void checkDelete(String file) throws SecurityException {
                File fp = new File(file);
                String path = fp.getAbsolutePath();

                // Refuse to delete anything outside /tmp
                if (!path.startsWith("/tmp")) {
                    throw new SecurityException("Bang!");
                }
            }
        });

        new File("/home/greg/test").delete();
    }
}
{code}

{code}
Exception in thread "main" java.lang.SecurityException: Bang!
    at Test$1.checkDelete(Test.java:12)
    at java.io.File.delete(File.java:971)
    at Test.main(Test.java:17)
{code}

There is a lot of scope here if you want to abuse checking for all sorts of 
things (files, sockets, threads etc)

  was (Author: gbow...@fastmail.co.uk):
Following up, it turns out to be *very* simple to do the security manager 
trick

{code:java}
import java.io.File;

public class Test {

    public static void main(String... args) {
        System.setSecurityManager(new SecurityManager() {
            public void checkDelete(String file) throws SecurityException {
                File fp = new File(file);
                String path = fp.getAbsolutePath();

                // Refuse to delete anything outside /tmp
                if (!path.startsWith("/tmp")) {
                    throw new SecurityException("Bang!");
                }
            }
        });

        new File("/home/greg/test").delete();
    }
}
{code}

{code}
Exception in thread "main" java.lang.SecurityException: Bang!
    at Test$1.checkDelete(Test.java:12)
    at java.io.File.delete(File.java:971)
    at Test.main(Test.java:17)
{code}
  
 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4337) Create Java security manager for forcibly asserting behaviours in testing

2012-08-28 Thread Greg Bowyer (JIRA)
Greg Bowyer created LUCENE-4337:
---

 Summary: Create Java security manager for forcibly asserting 
behaviours in testing
 Key: LUCENE-4337
 URL: https://issues.apache.org/jira/browse/LUCENE-4337
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.1, 5.0, 4.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer


Following on from conversations about mutation testing, there is an interest in 
building a Java security manager that is able to assert / guarantee certain 
behaviours 
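
A rough sketch of what the file-path side of such a manager might look like 
(the class name and the single allowed root are hypothetical; canonicalising 
before the check is what stops relative-path escapes):

{code:java}
import java.io.File;
import java.io.IOException;

public class PathJailSecurityManager extends SecurityManager {

    private final String allowedRoot;

    public PathJailSecurityManager(File root) throws IOException {
        this.allowedRoot = root.getCanonicalPath() + File.separator;
    }

    @Override
    public void checkWrite(String file) {
        checkPath(file);
    }

    @Override
    public void checkDelete(String file) {
        checkPath(file);
    }

    private void checkPath(String file) {
        try {
            // Canonicalise so "../" segments cannot escape the allowed root.
            String canonical = new File(file).getCanonicalPath();
            if (!canonical.startsWith(allowedRoot)) {
                throw new SecurityException("Access outside " + allowedRoot + ": " + canonical);
            }
        } catch (IOException e) {
            throw new SecurityException("Cannot canonicalise " + file, e);
        }
    }
}
{code}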

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-28 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443513#comment-13443513
 ] 

Greg Bowyer commented on LUCENE-4332:
-

I can codify a security manager; they are somewhat complex, but I see our 
needs here as very simple (essentially asserting file paths).

 Integrate PiTest mutation coverage tool into build
 --

 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
  Labels: build
 Attachments: 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch, 
 LUCENE-4332-Integrate-PiTest-mutation-coverage-tool-into-build.patch


 As discussed briefly on the mailing list, this patch is an attempt to 
 integrate the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Test coverage or testing the tests

2012-08-27 Thread Greg Bowyer
Hi all

At my current $DAYJOB we have been having a bit of success with an
alternative coverage tool called pit-test (http://pitest.org/).
Essentially pit-test is a mutation testing tool that attempts to see how
well the unit tests are able to catch alterations and regressions in the
code that they aim to test; this is done by determining what code the
tests actually touch and then mutating that code in some fashion.

I have as such started working on seeing if I can integrate pit-test
into the lucene build; the tool itself is apache licensed, which solves
that particular issue.

The main downsides I can see are that the coverage might differ
across runs due to lucene's random testing, and the time it takes
to run coverage (~4 hours on a big bad sandy bridge machine).

I have published initial results for the lucene packages; core is missing
because there is one test in core that does not work with pit-test
currently (and as such I am currently working on getting core generated).

It can be found here: http://people.apache.org/~gbowyer/pitest/

Is this of interest to anyone other than myself, especially given the
aggressive nature of how lucene is tested?

-- Greg


Re: Test coverage or testing the tests

2012-08-27 Thread Greg Bowyer
On 27/08/12 17:30, Chris Hostetter wrote:
 : This is cool. I'd say lets get it up and going on jenkins (even weekly
 : or something). why worry about the imperfections in any of these
 : coverage tools, whats way more important is when the results find
 : situations where you thought you were testing something, but really

 +1.

 Even if it hammers the machine so bad it can't be run on mortal 
 hardware, it's still worth it to hook it into the build system so people 
 with god like hardware can easily run it and file bugs based on what 
 they see.

 -Hoss

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

The machine I ran it on cost me $5 from ec2 :D


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4332) Integrate PiTest mutation coverage tool into build

2012-08-27 Thread Greg Bowyer (JIRA)
Greg Bowyer created LUCENE-4332:
---

 Summary: Integrate PiTest mutation coverage tool into build
 Key: LUCENE-4332
 URL: https://issues.apache.org/jira/browse/LUCENE-4332
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.1, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer


As discussed briefly on the mailing list, this patch is an attempt to integrate 
the PiTest mutation coverage tool into the lucene build

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3572) Make Schema-Browser show custom similarities

2012-08-14 Thread Greg Bowyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Bowyer resolved SOLR-3572.
---

   Resolution: Fixed
Fix Version/s: 4.0

Catching up with these things 

committed to trunk: 1373117
committed to branch_4x: 1373146

 Make Schema-Browser show custom similarities
 

 Key: SOLR-3572
 URL: https://issues.apache.org/jira/browse/SOLR-3572
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0-ALPHA
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Fix For: 4.0

 Attachments: SOLR-3572-similarity-schemabrowser.patch


 When a custom similarity is defined in the solr schema it is helpful to have 
 the schema browser show the custom similarity

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3673) Random variate functions

2012-07-25 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422029#comment-13422029
 ] 

Greg Bowyer commented on SOLR-3673:
---

{quote}
This is where my total ignorance of these random generators and how they use 
comes in: it looked to me like these generators in your patch just took in a 
java.util.Random as input – is there a particular reason why this Mrs. Twister 
random needs to be used? what does that give us that java.util.Random doesn't?
{quote}

They can take anything that extends java.util.Random; the only issue with the 
inbuilt one is that its period before the sequence repeats is comparatively 
short, some of the numbers it generates have statistically poor properties, 
and it is slightly slower.

I don't lay claim to being an expert on this stuff; I am going on what I have 
been told. The usage of MT is a side benefit of cheating on the distributions 
and using the ones that come out of the box in uncommons-math - since I had a 
better RNG available, I used it.

{quote}
FWIW: 128bits isn't that much if you let the seed argument to the function be 
an arbitrary String - even if you ignore the high bits the user just needs to 
give you 16 chars (less if we include stuff like the index version)
{quote}

Yeah, it's not a lot and is manageable; I was more thinking about avoiding it 
being too configurable.

{quote}
This is kind of where my use case question comes into play as well ... if the 
goal is just to use these generators to get a biased shuffling of the docs 
(ie: maybe you use certain random distribution and then frange filter on it get 
a set of documents with a roughly predictable size) then it's not that bad if 
the seeds aren't very complex – throw in the SolrCore start time to get a few 
more bits, etc But if there is some sort of cryptography goal then 
obviously having a good random seed that is unpredictable is a lot more 
important.
{quote}

The first use case, plus use cases involving bending things towards 
distributions to act as cheap models.

This stuff is useless for crypto as it stands anyhow, since these RNGs are 
fairly predictable.
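
For reference, a rough sketch of how a couple of the functions map onto the 
generators that come out of the box in uncommons-maths (assumed on the 
classpath; the wiring into a Solr ValueSource is omitted):

{code:java}
import java.util.Random;

import org.uncommons.maths.random.GaussianGenerator;
import org.uncommons.maths.random.MersenneTwisterRNG;
import org.uncommons.maths.random.PoissonGenerator;

public class RandomVariateDemo {
    public static void main(String... args) {
        // Mersenne Twister: much longer period and better statistical
        // behaviour than java.util.Random, at comparable speed.
        Random rng = new MersenneTwisterRNG();

        // rgaussian(mean, stddev)
        GaussianGenerator rgaussian = new GaussianGenerator(10.0, 2.5, rng);
        // rpoisson(mean)
        PoissonGenerator rpoisson = new PoissonGenerator(4.0, rng);

        System.out.println(rgaussian.nextValue());
        System.out.println(rpoisson.nextValue());
    }
}
{code}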

 Random variate functions
 

 Key: SOLR-3673
 URL: https://issues.apache.org/jira/browse/SOLR-3673
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Attachments: SOLR-3673.patch


 Hi all
 At my $DAYJOB I have been asked to build a few random variate functions that 
 return random numbers bound to a distribution.
 I think these can be added to solr.
 I have a hesitation in that the code as written uses / needs uncommons math 
 (because we want a far better RNG than java's and because I am lazy and did 
 not want to write distributions)
 uncommons math is apache licensed so we are good on that front
 anyone have any thoughts on this ?
 For reference the functions are:
 rgaussian(mean, stddev) - Random value aligned to gaussian distribution
 rpoisson(mean) - Random value aligned to poisson distribution
 rbinomial(n, prob) - Random value aligned to binomial distribution
 rcontinous(min, max) - Random continuous value between min and max
 rdiscrete(min, max) - Random discrete value between min and max
 rexponential(rate) - Random value from the exponential distribution

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-3673) Random variate functions

2012-07-25 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422029#comment-13422029
 ] 

Greg Bowyer edited comment on SOLR-3673 at 7/25/12 6:24 AM:


{quote}
This is where my total ignorance of these random generators and how they use 
comes in: it looked to me like these generators in your patch just took in a 
java.util.Random as input – is there a particular reason why this Mrs. Twister 
random needs to be used? what does that give us that java.util.Random doesn't?
{quote}

They can take anything that extends java.util.Random; the only issue with the 
inbuilt one is that its period before the sequence repeats is comparatively 
short, some of the numbers it generates have statistically poor properties, 
and it is slightly slower.

I don't lay claim to being an expert on this stuff; I am going on what I have 
been told. The usage of MT is a side benefit of cheating on the distributions 
and using the ones that come out of the box in uncommons-math - since I had a 
better RNG available, I used it.

{quote}
FWIW: 128bits isn't that much if you let the seed argument to the function be 
an arbitrary String - even if you ignore the high bits the user just needs to 
give you 16 chars (less if we include stuff like the index version)
{quote}

Yeah, it's not a lot and is manageable; I was more thinking about avoiding it 
being too configurable (for example, I think saying rgaussian(1, 0.5, some very 
long seed with lots of data, XORShift) would be too far).

I will implement the passing in of a seed value for sure (that's pretty 
sensible); I was more worried about making sure that the seed was just random 
(ha!) data the user passed in, and that there is no expectation over what's 
happening under the hood.
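
A minimal sketch of the seed handling I have in mind (a hypothetical helper, 
not committed anywhere: it folds an arbitrary user string down to the 128 bits 
/ 16 bytes that a MersenneTwisterRNG seed takes):

{code:java}
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import org.uncommons.maths.random.MersenneTwisterRNG;

public class SeededRng {
    public static MersenneTwisterRNG fromUserSeed(String userSeed)
            throws NoSuchAlgorithmException, UnsupportedEncodingException {
        // An MD5 digest is exactly 16 bytes, the seed size the constructor
        // expects; this is seed derivation, not cryptography.
        byte[] seed = MessageDigest.getInstance("MD5")
                .digest(userSeed.getBytes("UTF-8"));
        return new MersenneTwisterRNG(seed);
    }
}
{code}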

{quote}
This is kind of where my use case question comes into play as well ... if the 
goal is just to use these generators to get a biased shuffling of the docs 
(ie: maybe you use certain random distribution and then frange filter on it get 
a set of documents with a roughly predictable size) then it's not that bad if 
the seeds aren't very complex – throw in the SolrCore start time to get a few 
more bits, etc But if there is some sort of cryptography goal then 
obviously having a good random seed that is unpredictable is a lot more 
important.
{quote}

The first use case, plus use cases involving bending things towards 
distributions to act as cheap models.

This stuff is useless for crypto as it stands anyhow, since these RNGs are 
fairly predictable.

  was (Author: gbow...@fastmail.co.uk):
{quote}
This is where my total ignorance of these random generators and how they use 
comes in: it looked to me like these generators in your patch just took in a 
java.util.Random as input – is there a particular reason why this Mrs. Twister 
random needs to be used? what does that give us that java.util.Random doesn't?
{quote}

They can take anything that extends java.util.Random; the only issue with the 
inbuilt one is that its period before the sequence repeats is comparatively 
short, some of the numbers it generates have statistically poor properties, 
and it is slightly slower.

I don't lay claim to being an expert on this stuff; I am going on what I have 
been told. The usage of MT is a side benefit of cheating on the distributions 
and using the ones that come out of the box in uncommons-math - since I had a 
better RNG available, I used it.

{quote}
FWIW: 128bits isn't that much if you let the seed argument to the function be 
an arbitrary String - even if you ignore the high bits the user just needs to 
give you 16 chars (less if we include stuff like the index version)
{quote}

Yeah, it's not a lot and is manageable; I was more thinking about avoiding it 
being too configurable.

{quote}
This is kind of where my use case question comes into play as well ... if the 
goal is just to use these generators to get a biased shuffling of the docs 
(ie: maybe you use certain random distribution and then frange filter on it get 
a set of documents with a roughly predictable size) then it's not that bad if 
the seeds aren't very complex – throw in the SolrCore start time to get a few 
more bits, etc But if there is some sort of cryptography goal then 
obviously having a good random seed that is unpredictable is a lot more 
important.
{quote}

The first use case, plus use cases involving bending things towards 
distributions to act as cheap models.

This stuff is useless for crypto as it stands anyhow, since these RNGs are 
fairly predictable.
  
 Random variate functions
 

 Key: SOLR-3673
 URL: https://issues.apache.org/jira/browse/SOLR-3673
 Project: Solr
  Issue Type: Improvement

[jira] [Commented] (SOLR-3673) Random variate functions

2012-07-25 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422451#comment-13422451
 ] 

Greg Bowyer commented on SOLR-3673:
---

Good idea, although interestingly I have noticed (from some brief source 
diving) that mahout actually embeds uncommons math :S

As for SecureRandom, it's aimed at crypto and so reads from high-quality 
entropy like /dev/random; whilst this can be changed around, it gets more 
complex than it needs to be.

 Random variate functions
 

 Key: SOLR-3673
 URL: https://issues.apache.org/jira/browse/SOLR-3673
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Attachments: SOLR-3673.patch


 Hi all
 At my $DAYJOB I have been asked to build a few random variate functions that 
 return random numbers bound to a distribution.
 I think these can be added to solr.
 I have a hesitation in that the code as written uses / needs uncommons math 
 (because we want a far better RNG than java's and because I am lazy and did 
 not want to write distributions)
 uncommons math is apache licensed so we are good on that front
 anyone have any thoughts on this ?
 For reference the functions are:
 rgaussian(mean, stddev) - Random value aligned to gaussian distribution
 rpoisson(mean) - Random value aligned to poisson distribution
 rbinomial(n, prob) - Random value aligned to binomial distribution
 rcontinous(min, max) - Random continuous value between min and max
 rdiscrete(min, max) - Random discrete value between min and max
 rexponential(rate) - Random value from the exponential distribution

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-3673) Random variate functions

2012-07-25 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422451#comment-13422451
 ] 

Greg Bowyer edited comment on SOLR-3673 at 7/25/12 6:05 PM:


Good idea, although interestingly I have noticed (from some brief source 
diving) that mahout actually embeds uncommons math :S

As for SecureRandom, it's aimed at crypto and so reads from high-quality 
entropy like /dev/random (which blocks); whilst this can be changed around, it 
gets more complex than it needs to be.

  was (Author: gbow...@fastmail.co.uk):
Good idea, although interestingly I have noticed (from some brief source 
diving) that mahout actually embeds uncommons math :S

As for SecureRandom, it's aimed at crypto and so reads from high-quality 
entropy like /dev/random; whilst this can be changed around, it gets more 
complex than it needs to be.
  
 Random variate functions
 

 Key: SOLR-3673
 URL: https://issues.apache.org/jira/browse/SOLR-3673
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0, 5.0
Reporter: Greg Bowyer
Assignee: Greg Bowyer
 Attachments: SOLR-3673.patch


 Hi all
 At my $DAYJOB I have been asked to build a few random variate functions that 
 return random numbers bound to a distribution.
 I think these can be added to solr.
 I have a hesitation in that the code as written uses / needs uncommons math 
 (because we want a far better RNG than java's and because I am lazy and did 
 not want to write distributions)
 uncommons math is apache licensed so we are good on that front
 anyone have any thoughts on this ?
 For reference the functions are:
 rgaussian(mean, stddev) - Random value aligned to gaussian distribution
 rpoisson(mean) - Random value aligned to poisson distribution
 rbinomial(n, prob) - Random value aligned to binomial distribution
 rcontinous(min, max) - Random continuous value between min and max
 rdiscrete(min, max) - Random discrete value between min and max
 rexponential(rate) - Random value from the exponential distribution

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


