[Lucene.Net] Re: Creating a ASF fork of Sharpen under a dOCL license

2011-03-26 Thread Scott Lombard
I sent an email to the db4o team to see what they think.  When I get a
response back from them we should have more answers.  At that point it
will either be a no on their end or we will have specific items to
discuss.

Scott

On Friday, March 25, 2011, Stefan Bodewig bode...@apache.org wrote:
 On 2011-03-25, Prescott Nasser wrote:

 Stefan, how do you read their licensing:

 http://www.db4o.com/about/company/legalpolicies/docl.aspx

 By your reading is it possible to include this in our repo to keep
 everything together? or would this have to be outside the ASF?

 The usual IANAL disclaimer applies and we could ask for legal clearance
 if we absolutely think we need it.

 From a cursory glance I don't think the policy applies to our use-case
 at all.

 ,----
 | 1. Subject
 |
 | "Software" means the current version of the db4o database engine
 | software and all patches, bug fixes, error corrections and future
 | versions.
 `----

 AFAIU Sharpen is not part of the database engine.

 and in addition I'm not sure that a fork of the codebase is in line with
 what they'd consider a derivative work.

 Even if it would apply, the license to the original code base was
 non-transferable and you'd only get the right to sublicense the original
 code base under the rules of the GPL (section 2b - in addition there is
 no software at all prior to accepting the agreement).  I don't see how
 this could work.

 If you really feel that forking Sharpen is the best way to move forward
 - not my call to make - then forking the GPLed sources into a project
 that uses the GPL itself seems to be the only legally sane choice.

 Stefan



Re: add an afterFilter to IndexSearcher.search()

2011-03-26 Thread Michael McCandless
On Fri, Mar 25, 2011 at 2:33 PM, Yonik Seeley
yo...@lucidimagination.com wrote:

 Currently, supplying a filter to IndexSearcher.search() assumes that
 it's cheaper to run than the main query.

 Wait, where do we assume that?

 After a match, we always skip on the filter first.

Well, we next() on the filter, and advance() on the scorer.
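For readers following along, the next()/advance() leapfrog Mike describes can be sketched with plain sorted int arrays standing in for Lucene's real iterators (the class and method names below are illustrative, not the actual DocIdSetIterator API):

```java
public class LeapfrogSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Minimal doc-id iterator over a sorted int array.
    static class Ids {
        final int[] docs; int pos = -1;
        Ids(int... docs) { this.docs = docs; }
        int next() { return ++pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        int advance(int target) { // skip forward to the first doc >= target
            int d;
            while ((d = next()) < target) { /* keep skipping */ }
            return d;
        }
    }

    // Leapfrog: next() on the filter, advance() on the scorer, and keep
    // hopping until both iterators land on the same document.
    static java.util.List<Integer> intersect(Ids scorer, Ids filter) {
        java.util.List<Integer> hits = new java.util.ArrayList<>();
        int f = filter.next();
        int s = scorer.advance(f);
        while (s != NO_MORE_DOCS && f != NO_MORE_DOCS) {
            if (s == f) { hits.add(s); f = filter.next(); s = scorer.advance(f); }
            else if (s > f) f = filter.advance(s);
            else s = scorer.advance(f);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Docs matching the query: {1,3,5,9}; docs passing the filter: {3,4,9}.
        System.out.println(intersect(new Ids(1, 3, 5, 9), new Ids(3, 4, 9))); // prints [3, 9]
    }
}
```

Which iterator "drives" (and therefore does cheap next() calls rather than advance() skips) is exactly the cost assumption the thread is questioning.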

 Also, why stop at 2 filters?  Ie I may have 3 filters plus a query to
 AND, and I want to control their order.

 Multiple filters could be combined into a single one via ChainedFilter, etc.

True.

 What's the use case behind this...?

 Optimizing cases where filters might be more expensive than the main query ;-)

That much I understood ;)

But you must have a real use case... that inspired this idea?  Where
are apps/Solr typically using such expensive filters?

Mike

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-236) Field collapsing

2011-03-26 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011603#comment-13011603
 ] 

Grant Ingersoll commented on SOLR-236:
--

Keep in mind that an alternative approach that scales, but loses some attributes of 
this patch (total groups, for instance), is committed on trunk and will likely be 
backported to 3.2.

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: Next

 Attachments: DocSetScoreCollector.java, 
 NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
 SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, 
 SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-branch_3x.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, 
 SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
 SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, 
 collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, 
 collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, solr-236.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated more documents from 
 this site link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)
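To make the parameters above concrete, here is a minimal sketch of what collapse.type=normal means: keep only the first (highest-ranked) hit per value of the collapse field. This is illustrative only, assuming ranked results arrive as (fieldValue, docId) pairs; it is not the patch's actual code.

```java
import java.util.*;

public class CollapseSketch {
    // Collapse "normal" mode: keep the first result per collapse-field value,
    // preserving the overall ranking order of the surviving documents.
    static List<String> collapse(List<String[]> ranked /* [fieldValue, docId] */) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String[] hit : ranked) {
            if (seen.add(hit[0])) out.add(hit[1]); // first time we see this value
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> ranked = Arrays.asList(
            new String[]{"siteA", "doc1"},
            new String[]{"siteA", "doc2"},  // collapsed away (same site as doc1)
            new String[]{"siteB", "doc3"});
        System.out.println(collapse(ranked)); // prints [doc1, doc3]
    }
}
```

The "adjacent" mode would differ only in that it collapses runs of equal values rather than deduplicating globally.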

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




RC Status

2011-03-26 Thread Grant Ingersoll
I ran into a few kinks w/ signing artifacts (it wasn't finding the maven 
artifacts) in Solr and am fixing them.  Once that goes through, I will upload 
an RC



[jira] [Resolved] (LUCENE-2990) Improve ArrayUtil/CollectionUtil.*Sort() methods to early-return on empty or one-element lists/arrays

2011-03-26 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2990.
---

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [New])

Renamed local variable from l to size.

Committed trunk revision: 1085689
Committed 3.x revision: 1085691

 Improve ArrayUtil/CollectionUtil.*Sort() methods to early-return on empty or 
 one-element lists/arrays
 --

 Key: LUCENE-2990
 URL: https://issues.apache.org/jira/browse/LUCENE-2990
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Trivial
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2990.patch, LUCENE-2990.patch, LUCENE-2990.patch


 It might be a good idea to make CollectionUtil or ArrayUtil return early if 
 the passed-in list or array's length is <= 1, because sorting is unneeded then. 
 This may improve automaton and other places, as no SorterTemplate is created 
 for empty or one-element lists.
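The change amounts to a one-line guard before any sorter machinery is constructed. A hedged sketch (Collections.sort stands in for Lucene's real SorterTemplate path; this is not the committed patch):

```java
import java.util.*;

public class SortUtilSketch {
    // Early-return guard in the spirit of LUCENE-2990: empty or one-element
    // lists are already sorted, so skip building any sorter at all.
    static <T extends Comparable<T>> void quickSort(List<T> list) {
        int size = list.size();
        if (size <= 1) return; // nothing to sort; avoids allocating a sorter
        Collections.sort(list); // stand-in for the real SorterTemplate path
    }

    public static void main(String[] args) {
        List<Integer> one = new ArrayList<>(List.of(7));
        quickSort(one);   // returns immediately
        List<Integer> many = new ArrayList<>(List.of(3, 1, 2));
        quickSort(many);  // actually sorts
        System.out.println(one + " " + many); // prints [7] [1, 2, 3]
    }
}
```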




[jira] [Created] (LUCENE-2994) When 3.1 is released, update backwards tests in 3.x branch

2011-03-26 Thread Uwe Schindler (JIRA)
When 3.1 is released, update backwards tests in 3.x branch
--

 Key: LUCENE-2994
 URL: https://issues.apache.org/jira/browse/LUCENE-2994
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler


When we have released the official artifacts of Lucene 3.1 (the final ones!!!), 
we need to do the following:

- svn rm backwards/src/test
- svn cp 
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/lucene/src/test
 backwards/src/test
- Copy the lucene-core-3.1.0.jar from the last release tarball to 
lucene/backwards/lib and delete the old one.
- Check that everything is correct: The backwards folder should contain a src/ 
folder that now contains test. The files should be the ones from the branch.
- Run ant test-backwards

Uwe will take care of this!




[jira] [Updated] (LUCENE-2994) When 3.1 is released, update backwards tests in 3.x branch

2011-03-26 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2994:
--

Affects Version/s: 3.2
Fix Version/s: 3.2

 When 3.1 is released, update backwards tests in 3.x branch
 --

 Key: LUCENE-2994
 URL: https://issues.apache.org/jira/browse/LUCENE-2994
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.2
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.2






[jira] [Commented] (LUCENE-2994) When 3.1 is released, update backwards tests in 3.x branch

2011-03-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011610#comment-13011610
 ] 

Uwe Schindler commented on LUCENE-2994:
---

We have to also clone the 3.1 test-framework, so it's a little bit more work, 
but it should be easy to do.

 When 3.1 is released, update backwards tests in 3.x branch
 --

 Key: LUCENE-2994
 URL: https://issues.apache.org/jira/browse/LUCENE-2994
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.2
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.2






[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011613#comment-13011613
 ] 

Robert Muir commented on SOLR-2155:
---

I don't really think things like this (queries etc) should go into just Solr, 
while we leave the lucene-contrib spatial package broken.

Let's put things in the right places?

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazetteer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.
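The key property the filter described above exploits is that every prefix of a geohash names the enclosing grid cell, so region membership reduces to string-prefix matching. A small from-scratch encoder sketch demonstrates this (illustrative only; this is not the patch's GeoHashUtils code):

```java
public class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode a lat/lon point to a geohash of the given length by repeatedly
    // halving the bounding box; bits interleave, longitude first.
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder sb = new StringBuilder();
        boolean evenBit = true;
        int bit = 0, ch = 0;
        while (sb.length() < length) {
            if (evenBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { sb.append(BASE32.charAt(ch)); bit = 0; ch = 0; }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String full = encode(42.3584, -71.0598, 8);   // a point in Boston
        String cell = encode(42.3584, -71.0598, 4);   // its enclosing 4-char cell
        // A shorter geohash of the same point is always a prefix of the longer
        // one, which is what lets a TermsEnum seek by prefix to a grid square.
        System.out.println(full.startsWith(cell)); // prints true
    }
}
```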




[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011615#comment-13011615
 ] 

Grant Ingersoll commented on SOLR-2155:
---

Yeah, I agree.  I haven't looked at the patch yet.  It was my understanding 
that Chris Male was going to move lucene/contrib/spatial to modules and gut the 
broken stuff in it.  I think there is a separate issue open for that one.  
Presumably, once spatial and function queries are moved to modules, then we 
will have a properly working spatial package.

I obviously can move it, but I don't have time to do the gutting (we really 
should have deprecated the tier stuff for this release).

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch






[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011616#comment-13011616
 ] 

Robert Muir commented on SOLR-2155:
---

well what would the deprecation have suggested as an alternative?

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch






[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011619#comment-13011619
 ] 

Chris Male commented on SOLR-2155:
--

In LUCENE-2599 I deprecated the spatial contrib.  The problem is as Robert 
raises, deprecating the code without providing an alternative isn't that user 
friendly.  I think as part of this issue we should start up the spatial module 
and work towards moving what we can there.  Moving function queries is going to 
take some time since they are very coupled to Solr.  But that shouldn't 
preclude us from putting into the module what we can.  Once we have a module 
that provides a reasonable set of functionality, then we can 
deprecate/gut/remove the spatial contrib.

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch






Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Grant Ingersoll
Not really related to this issue, so moving to dev@...

On Mar 26, 2011, at 7:52 AM, Robert Muir (JIRA) wrote:

 
[ 
 https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011616#comment-13011616
  ] 
 
 Robert Muir commented on SOLR-2155:
 ---
 
 well what would the deprecation have suggested as an alternative?

It's a good question.  The tier stuff, IMO (and confirmed by others), is broken 
for most of the world.  I sunk a good week into fixing it and was so entangled 
in the spaghetti that I gave up.  What we laid out on another issue (I forget 
the number, but I think C Male owns it and says he has a rewrite) is to move to 
modules, keep what we can (geohash and some of the utils) and gut the rest.  
That combined w/ moving function queries to modules would make all of spatial a 
good solution for the large majority of users.  The only thing that would 
remain to get back to our current state (at least in terms of features) would be 
to implement a tier approach.  I've proposed the Military Grid System (there is 
an open JIRA issue for it) as something that looks to be a good candidate.  
It's well documented on the web, uses metric units for all distances, and has the 
benefit that all of NATO uses it, albeit for different purposes.  It also 
addresses the poles and the meridians as first class citizens.  It just needs 
an implementer.  Having said that, I'm not 100% certain.  I also don't know 
that the tier stuff is absolutely necessary.  The combination of what we have 
in function queries plus trie fields makes for a very fast spatial lookup at 
this point.

I'm totally open to other suggestions, however.

Longer term, I've got a lot of ideas for spatial, but that's a different thread.

-Grant



[jira] [Assigned] (SOLR-1298) FunctionQuery results as pseudo-fields

2011-03-26 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reassigned SOLR-1298:
--

Assignee: Yonik Seeley  (was: Grant Ingersoll)

 FunctionQuery results as pseudo-fields
 --

 Key: SOLR-1298
 URL: https://issues.apache.org/jira/browse/SOLR-1298
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
Priority: Minor
 Fix For: Next

 Attachments: SOLR-1298-FieldValues.patch, SOLR-1298.patch


 It would be helpful if the results of FunctionQueries could be added as 
 fields to a document. 
 Couple of options here:
 1. Run FunctionQuery as part of relevance score and add that piece to the 
 document
 2. Run the function (not really a query) during Document/Field retrieval




Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Yonik Seeley
On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
 I don't really think things like this (queries etc) should go into just Solr

I disagree strongly with the sentiment that queries don't belong in Solr.
Everything developed in/for lucene need not be exported to Solr immediately.
Everything developed in/for solr need not be exported to Lucene immediately.

If the work has been done, and the patch works for Solr, that should
be enough.  Period.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco




Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Grant Ingersoll

On Mar 26, 2011, at 8:24 AM, Robert Muir wrote:

 On Sat, Mar 26, 2011 at 8:06 AM, Grant Ingersoll gsing...@apache.org wrote:
 Not really related to this issue, so moving to dev@...
 
 On Mar 26, 2011, at 7:52 AM, Robert Muir (JIRA) wrote:
 
 
[ 
 https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011616#comment-13011616
  ]
 
 Robert Muir commented on SOLR-2155:
 ---
 
 well what would the deprecation have suggested as an alternative?
 
 [...]
 
 
 I guess the reason I asked my question is more high-level: on one hand
 there are suggestions that lucene's spatial package should have been
 deprecated in 3.1, but on the other hand the very first feature on
 solr 3.1's new feature list is 'improved geospatial support'.
 

It really should say: Added Geospatial Support, as it was non-existent in Solr 
before.

Most of the work for adding in spatial in Solr consisted of improving things in 
Solr to make it easy to leverage the one spatial feature we really added: 
distance based functions and parsing support.  Everything else was generally 
useful things: sorting by function, poly fields, etc.  I started on tier 
support, but dropped it when I realized it was broken beyond repair.  The Solr 
stuff uses, IMO, the stuff in Lucene that works and ignores the rest.  I seem 
to recall Chris had said that once I got done w/ the Solr stuff he would do the 
modules work, but it hasn't happened yet.

I'd say in 3.2, since it sounds like Chris did at least deprecate 
contrib/spatial, that we work to get all of this resolved: spatial -> modules, 
function queries -> modules.  Naturally we should do it on trunk, too.





Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Grant Ingersoll

On Mar 26, 2011, at 9:48 AM, Yonik Seeley wrote:

 On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
 I don't really think things like this (queries etc) should go into just Solr
 
 I disagree strongly with the sentiment that queries don't belong in Solr.
 Everything developed in/for lucene need not be exported to Solr immediately.
 Everything developed in/for solr need not be exported to Lucene immediately.
 
 If the work has been done, and the patch works for Solr, that should
 be enough.  Period.
 

I agree it's enough for the contributor to do that, but as committers we need 
to look at the bigger picture in this particular case, which is the move of 
spatial to modules.





Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Yonik Seeley
On Sat, Mar 26, 2011 at 9:57 AM, Grant Ingersoll gsing...@apache.org wrote:

 On Mar 26, 2011, at 9:48 AM, Yonik Seeley wrote:

 On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
 I don't really think things like this (queries etc) should go into just Solr

 I disagree strongly with the sentiment that queries don't belong in Solr.
 Everything developed in/for lucene need not be exported to Solr immediately.
 Everything developed in/for solr need not be exported to Lucene immediately.

 If the work has been done, and the patch works for Solr, that should
 be enough.  Period.


 I agree it's enough for the contributor to do that, but as committers we need 
 to look at the bigger picture in this particular case, which is the move of 
 spatial to modules.

That's a separate asynchronous issue.
Progress should not be blocked in Solr in the meantime.

-Yonik




Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Robert Muir
On Sat, Mar 26, 2011 at 9:48 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
 I don't really think things like this (queries etc) should go into just Solr

 I disagree strongly with the sentiment that queries don't belong in Solr.
 Everything developed in/for lucene need not be exported to Solr immediately.
 Everything developed in/for solr need not be exported to Lucene immediately.

 If the work has been done, and the patch works for Solr, that should
 be enough.  Period.


It's not enough for me: you can expect me to raise questions and
objections when things are committed to the wrong place in the
codebase; that's entirely appropriate. We merged development, all
committers can commit to the correct places, so there are no excuses.




Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Nicolas Helleringer

  I started on tier support, but dropped it when I realized it was broken
 beyond repair.


I did not know one could break code beyond repair.

Nicolas


Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Yonik Seeley
On Sat, Mar 26, 2011 at 10:05 AM, Robert Muir rcm...@gmail.com wrote:
 On Sat, Mar 26, 2011 at 9:48 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
 I don't really think things like this (queries etc) should go into just Solr

 I disagree strongly with the sentiment that queries don't belong in Solr.
 Everything developed in/for lucene need not be exported to Solr immediately.
 Everything developed in/for solr need not be exported to Lucene immediately.

 If the work has been done, and the patch works for Solr, that should
 be enough.  Period.


 It's not enough for me: you can expect me to raise questions and
 objections when things are committed to the wrong place in the
 codebase; that's entirely appropriate. We merged development, all
 committers can commit to the correct places, so there are no excuses.

If you're saying Queries don't belong in Solr, I'm a huge -1 on that.
There's no correct place for queries in general - it's all in the context.
If there's a better place for the query that can be achieved with a
mv, then fine.
But there's often much more work involved: dependencies on other solr features,
or fleshing out a real Java API rather than treating something as a
simple implementation detail.

-Yonik




[jira] [Assigned] (SOLR-2396) add [ICU]CollationField

2011-03-26 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned SOLR-2396:
-

Assignee: Robert Muir

 add [ICU]CollationField
 ---

 Key: SOLR-2396
 URL: https://issues.apache.org/jira/browse/SOLR-2396
 Project: Solr
  Issue Type: Improvement
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: SOLR-2396.patch, SOLR-2396.patch, SOLR-2396.patch, 
 SOLR-2396.patch


 In LUCENE-2551 collation support was changed to use byte[] keys.
 Previously it encoded sort keys with IndexableBinaryString into char[],
 but this is wasteful with regard to RAM and disk when terms can be byte[].
 A better solution would be [ICU]CollationFieldTypes, as this would also allow 
 locale-sensitive range queries.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-2396) add [ICU]CollationField

2011-03-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011646#comment-13011646
 ] 

Robert Muir commented on SOLR-2396:
---

I'd like to commit this in a few days if no one objects. The existing encoding 
is wasteful and I would like to cut solr over to this more efficient one (and 
enable locale-sensitive range queries).

We could open future issues for any additional features, such as specifying the 
ICU locale as BCP 47, etc. (this just implements the lucene 3.1 functionality 
more efficiently). 


 add [ICU]CollationField
 ---

 Key: SOLR-2396
 URL: https://issues.apache.org/jira/browse/SOLR-2396
 Project: Solr
  Issue Type: Improvement
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: SOLR-2396.patch, SOLR-2396.patch, SOLR-2396.patch, 
 SOLR-2396.patch


 In LUCENE-2551 collation support was changed to use byte[] keys.
 Previously it encoded sort keys with IndexableBinaryString into char[],
 but this is wasteful with regard to RAM and disk when terms can be byte[].
 A better solution would be [ICU]CollationFieldTypes, as this would also allow 
 locale-sensitive range queries.




[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011653#comment-13011653
 ] 

David Smiley commented on SOLR-2155:


I plan to finish a couple of improvements to this patch within two weeks: 
distance function queries that work with multi-valued fields, and polygon 
queries that span the date line.  I've been delayed by some life events (new 
baby).  Furthermore, I'll try to ensure that the work here is applicable to 
pure Lucene users (i.e. sans Solr). 

One thing I'm unsure of is how to integrate (or not integrate) existing Lucene 
& Solr spatial code with this patch.  In this patch I chose to re-use some 
basic shape classes in Lucene's spatial contrib simply because they were 
already there, but I could just as easily have not.  My preference going 
forward would be to outright replace Lucene's spatial contrib with this patch.  
I also think LatLonType and PointType could become deprecated, since this patch 
is not only more capable (multiValue support) but faster too.  Well, for 
filtering; sorting is TBD.  I'm also inclined to name the field type 
LatLonGeohashType to reinforce the fact that it works with lat & lon; geohash 
is an implementation detail.  In the future it might not even be a geohash, 
strictly speaking, once we optimize the encoding.

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazetteer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.
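To make the prefix idea concrete, here is a standalone toy geohash encoder (an illustration of the scheme described above, not the actual GeoHashUtils API): each base-32 character packs five alternating longitude/latitude bisections, so nearby points share long prefixes, which is exactly what a prefix filter can exploit when seeking through the terms dictionary.

```java
// Toy sketch of geohash encoding (not the GeoHashUtils API): alternately
// bisect longitude and latitude, emitting one base-32 character per 5 bits.
public class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    public static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // longitude is bisected first
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            if (evenBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { // 5 bits = one base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // The canonical example point (42.6, -5.6) encodes to "ezs42".
        System.out.println(encode(42.6, -5.6, 5));
        // A longer hash for the same point starts with the shorter one:
        // shared prefixes correspond to enclosing grid squares.
        System.out.println(encode(42.6, -5.6, 9));
    }
}
```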




Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Ryan McKinley
FYI, I'm working on revamping lucene spatial in general
https://lucene-spatial-playground.googlecode.com/svn/trunk/
http://code.google.com/p/lucene-spatial-playground/

These are just sketch APIs for now, but I hope to get them cleaned up
and contributed soon.

The proposal will be for 3 packages in /modules:
1. spatial stuff w/o lucene dependencies -- shapes, distances, etc.
2. lucene support for these types
3. solr support for the lucene stuff
4. demo -- probably kept as an external project, since UI and demo
stuff is much easier to do on the outside.

I hope to migrate the existing spatial stuff to this structure and
remove the not-really-working stuff.

I'll post more when things are closer to committable.

ryan

On Sat, Mar 26, 2011 at 11:12 AM, Robert Muir rcm...@gmail.com wrote:
 On Sat, Mar 26, 2011 at 11:03 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Sat, Mar 26, 2011 at 9:48 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
 I don't really think things like this (queries etc) should go into just 
 Solr

 I disagree strongly with the sentiment that queries don't belong in Solr.
 Everything developed in/for lucene need not be exported to Solr immediately.
 Everything developed in/for solr need not be exported to Lucene immediately.

 If the work has been done, and the patch works for Solr, that should
 be enough.  Period.

 This is an important enough point that I'm going to follow it up with
 a quote from Mike:

 The combined dev community would have no requirement/expectation that
 if someone adds something cool to Lucene they must also expose it in
 Solr. There will still be devs that wear mostly Solr vs most Lucene
 hats. There will also be devs that comfortably wear both. There will
 be devs that focus on analyzers and do amazing things ;)

 We merged to *enable* moving code around easier, not to mandate it.
 It is wrong to object to a patch because someone hasn't done extra
 work with their solr hat on to enable its use in solr.
 It is wrong to object to a patch because someone hasn't done extra
 work with their lucene hat on to enable its use in lucene.


 With that out of the way, let's get more specific: what Query in
 this patch should be moved, and to where?


 No, the question is: what justification is there for adding spatial
 support to solr-only, leaving lucene with a broken contrib module,
 versus adding it where it belongs and exposing it to solr?




[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011661#comment-13011661
 ] 

Ryan McKinley commented on SOLR-2155:
-

Congratulations on the new baby!

Thinking about spatial support in general, I think we should settle on some 
basic APIs and approaches that can be used across many indexing strategies.  In 
http://code.google.com/p/lucene-spatial-playground/ I'm experimenting with how we 
can use a standard API to index Shapes with various strategies.  As always, each 
strategy has its tradeoffs, but if we can keep the high-level APIs similar, 
that makes choosing the right approach easier.  In this project I'm looking at 
indexing shapes as:
 * bounding box -- 4 fields: xmin/xmax/ymin/ymax
 * prefix grids -- like geohash or 
[csquares|http://www.marine.csiro.au/csquares/about-csquares.htm]
 * in-memory spatial index (rtree/quadtree)
 * raw WKB geometry tokens
 * points -- x,y fields
 * etc.

To keep things coherent, I'm proposing a high level interface like:
https://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-lucene/src/main/java/org/apache/lucene/spatial/search/SpatialQueryBuilder.java

And then each implementation fills it in:
https://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-lucene/src/main/java/org/apache/lucene/spatial/search/prefix/PrefixGridQueryBuilder.java

This leaves solr to just handle setup and configuration:
http://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-solr/src/main/java/org/apache/solr/spatial/prefix/SpatialPrefixGridFieldType.java

In my view geohash is a subset of a 'spatial prefix grid' (is there a real name 
for this?) -- the interface I'm proposing is:
http://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-base/src/main/java/org/apache/lucene/spatial/base/prefix/SpatialPrefixGrid.java
essentially:
{code}
  public List<CharSequence> readCells( Shape geo );
{code}

A geohash for a point would just be a list of one token -- for a polygon, it 
would be a collection of tokens that fill the space, like csquares.

I aim to get this basic structure into a lucene branch, and maybe into trunk, 
in the next few weeks.


 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazetteer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.




Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Yonik Seeley
On Sat, Mar 26, 2011 at 11:12 AM, Robert Muir rcm...@gmail.com wrote:
 On Sat, Mar 26, 2011 at 11:03 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Sat, Mar 26, 2011 at 9:48 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
 I don't really think things like this (queries etc) should go into just 
 Solr

 I disagree strongly with the sentiment that queries don't belong in Solr.
 Everything developed in/for lucene need not be exported to Solr immediately.
 Everything developed in/for solr need not be exported to Lucene immediately.

 If the work has been done, and the patch works for Solr, that should
 be enough.  Period.

 This is an important enough point that I'm going to follow it up with
 a quote from Mike:

 The combined dev community would have no requirement/expectation that
 if someone adds something cool to Lucene they must also expose it in
 Solr. There will still be devs that wear mostly Solr vs most Lucene
 hats. There will also be devs that comfortably wear both. There will
 be devs that focus on analyzers and do amazing things ;)

 We merged to *enable* moving code around easier, not to mandate it.
 It is wrong to object to a patch because someone hasn't done extra
 work with their solr hat on to enable its use in solr.
 It is wrong to object to a patch because someone hasn't done extra
 work with their lucene hat on to enable its use in lucene.


 With that out of the way, let's get more specific: what Query in
 this patch should be moved, and to where?


 No, the question is: what justification is there for adding spatial
 support to solr-only, leaving lucene with a broken contrib module,
 versus adding it where it belongs and exposing it to solr?

There need not be any linkage to lucene to improve a Solr feature.
If you disagree, we should vote to clarify - this is too important
(and too much of a negative for Solr).

-Yonik




Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Ryan McKinley

 No, the question is: what justification is there for adding spatial
 support to solr-only, leaving lucene with a broken contrib module,
 versus adding it where it belongs and exposing it to solr?

 There need not be any linkage to lucene to improve a Solr feature.
 If you disagree, we should vote to clarify - this is too important
 (and too much of a negative for Solr).


I don't think there is a *requirement* to move the core spatial stuff to
lucene, but I think there is huge benefit to both communities if
things have as few dependencies as possible.  To be frank, the spatial
support in solr is pretty hairy -- it works for some use cases, but it is
quite basic and not extendable.  Calling it 'distance' seems more
appropriate than 'spatial'.

For good spatial support, I think we want to organize things with as
few dependencies/assumptions as possible.  That means:
 * only basic math/geometry here -- anything complex should use existing,
well-tested frameworks (JTS/proj4/geotools/etc); we should not be
reinventing/retesting this stuff.  We need basic APIs that will work
well with these external tools
 * lucene focuses on fields and queries
 * solr focuses on configuration and the external interface

This structure and these constraints would be a big win for everyone.

As always, this stuff is hard to talk about in the abstract w/o a real
proposal -- of course, fixing/improving solr features does not
*require* working in lucene-core.  But I think we get better solutions
when we aim for modular designs with minimal dependencies.

ryan





Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Yonik Seeley
On Sat, Mar 26, 2011 at 2:17 PM, Ryan McKinley ryan...@gmail.com wrote:

 No, the question is: what justification is there for adding spatial
 support to solr-only, leaving lucene with a broken contrib module,
 versus adding it where it belongs and exposing it to solr?

 There need not be any linkage to lucene to improve a Solr feature.
 If you disagree, we should vote to clarify - this is too important
 (and too much of a negative for Solr).


 I don't think there is a *requirement* to move the core spatial stuff to
 lucene, but I think there is huge benefit to both communities if
 things have as few dependencies as possible.  To be frank, the spatial
 support in solr is pretty hairy -- it works for some use cases, but it is
 quite basic and not extendable.  Calling it 'distance' seems more
 appropriate than 'spatial'.


Having something basic that works (and has a clean enough high-level
HTTP interface) was clearly a win for Solr users.

Of course a more fully featured spatial module would be a win for
everyone, but that ignores the more generic issue at hand here:
a patch that improves Solr's spatial support
should not be blocked on the grounds that it does not improve Lucene's
spatial support enough.

Likewise, the ridiculous notion that Queries don't belong in Solr
needs to be put to rest.

-Yonik




Re: [solr] DataSource for HBase Tables?

2011-03-26 Thread Grant Ingersoll
Yes, if you are going to use the Data Import Handler, I would say that is the 
route to go.  You might also look at using an abstraction like Gora instead of 
having a dependency directly on HBase.


On Mar 25, 2011, at 4:32 PM, Sterk, Paul (Contractor) wrote:

 Hi,
  
 I have a requirement to use Solr to import data from an HBase table and index 
 the contents – similar to importing data from a RDBMS.  It looks like I will 
 need to create an org.apache.solr.handler.dataimport.DataSource<T>
 implementation for HBase to be used by the Data Import Handler.
  
 Is this the correct approach?  If it is, has someone created a DataSource 
 implementation for HBase?
  
 Paul
  
  
 This message, including any attachments, is the property of Sears Holdings 
 Corporation and/or one of its subsidiaries. It is confidential and may 
 contain proprietary or legally privileged information. If you are not the 
 intended recipient, please delete it without reading the contents. Thank you.

--
Grant Ingersoll
http://www.lucidimagination.com



Re: Interested in GSOC

2011-03-26 Thread Vinicius Barrox
Thanks for the tips.  I'm going through the code and javadocs right now; I will 
let you know when I have any doubts.
I'm not sure yet which part of Lucene I intend to write a proposal for, but 
search/query and query parsing sounds interesting.

On Fri, Mar 25, 2011 at 7:15 PM, Adriano 
Crestani adrianocrest...@gmail.com wrote:
Hi Vinicius,
Welcome to Lucene!
I think a good place to look for internal design documentation is the javadoc 
package summary. Here is an example: [1], each package usually has its own 
detailed summary.
I hope it helps ;)
[1] 
- http://lucene.apache.org/java/3_0_3/api/contrib-queryparser/org/apache/lucene/queryParser/core/package-summary.html

On Fri, Mar 25, 2011 at 4:21 AM, Simon 
Willnauer simon.willna...@googlemail.com wrote:
Hey there,

welcome to Lucene :), good to hear you are interested in Lucene and GSoC!

On Fri, Mar 25, 2011 at 4:49 AM, Vinicius Paes de barros
viniciuspaesdebar...@yahoo.com.br wrote:

 Hi there,
 I heard about GSOC from a friend of mine at college, and I decided I want to 
 participate this year.  I already used Lucene before, so Lucene sounds like a 
 good place to start.
 I went through the JIRA projects, but I couldn't find something I feel like 
 writing a proposal for; maybe I don't have enough knowledge yet about how 
 Lucene is implemented internally.  So I started looking at the wiki, but I'm 
 not sure whether it contains all the info I need.
 Is there any other place I should be looking at to learn more about Lucene's 
 internal design?
We don't have a lot of design documents, and any that exist are most
likely outdated.  I think the best documentation is the code and the
people who have written it.  If you want to dive into lucene,
ask as many questions as you need and get all the info
out of us.  We are usually around every day, depending on the timezones,
so you can either write emails or join our IRC channel
#lucene on freenode (http://lucene.apache.org/java/docs/irc.html).
Is there anything in particular that you are interested in, like indexing,
search, analysis, etc.?

simon

 Thanks in advance,
 Vinicius Barros







  

[jira] [Created] (LUCENE-2995) factor out a shared spellchecking module

2011-03-26 Thread Robert Muir (JIRA)
factor out a shared spellchecking module


 Key: LUCENE-2995
 URL: https://issues.apache.org/jira/browse/LUCENE-2995
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Fix For: 4.0


In lucene's contrib we have spellchecking support (index-based spellchecker, 
directspellchecker, etc.);
we also have some things like pluggable comparators.

In solr we have auto-suggest support (with two implementations, it looks like) 
and some good utilities like HighFrequencyDictionary.

I think spellchecking is really important... google has upped the ante for what 
users expect.
So I propose we combine all this stuff into a shared modules/spellchecker, 
which will make it easier to refactor and improve the quality.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (LUCENE-2995) factor out a shared spellchecking module

2011-03-26 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2995:


Attachment: LUCENE-2995.patch

Just a quick shot at this (all tests pass).

Really, any serious refactoring (e.g. perf improvements) should happen in 
follow-up issues, I think.

before applying the patch, run this:
{noformat}
svn move lucene/contrib/spellchecker modules
svn move solr/src/java/org/apache/solr/util/HighFrequencyDictionary.java 
modules/spellchecker/src/java/org/apache/lucene/search/spell
svn move solr/src/java/org/apache/solr/util/TermFreqIterator.java 
modules/spellchecker/src/java/org/apache/lucene/search/spell
svn move solr/src/java/org/apache/solr/util/SortedIterator.java 
modules/spellchecker/src/java/org/apache/lucene/search/spell
svn move solr/src/java/org/apache/solr/spelling/suggest/Suggester.java 
solr/src/java/org/apache/solr/spelling
svn move solr/src/java/org/apache/solr/spelling/suggest 
modules/spellchecker/src/java/org/apache/lucene/search/spell
{noformat}

 factor out a shared spellchecking module
 

 Key: LUCENE-2995
 URL: https://issues.apache.org/jira/browse/LUCENE-2995
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2995.patch


 In lucene's contrib we have spellchecking support (index-based spellchecker, 
 directspellchecker, etc). 
 we also have some things like pluggable comparators.
 In solr we have auto-suggest support (with two implementations it looks 
 like), some good utilities like HighFrequencyDictionary, etc.
 I think spellchecking is really important... google has upped the ante to 
 what users expect.
 So I propose we combine all this stuff into a shared modules/spellchecker, 
 which will make it easier
 to refactor and improve the quality.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-2382) DIH Cache Improvements

2011-03-26 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011700#comment-13011700
 ] 

Lance Norskog commented on SOLR-2382:
-

Have you tested this under threading?

 DIH Cache Improvements
 --

 Key: SOLR-2382
 URL: https://issues.apache.org/jira/browse/SOLR-2382
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor
 Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
 SOLR-2382.patch


 Functionality:
  1. Provide a pluggable caching framework for DIH so that users can choose a 
 cache implementation that best suits their data and application.
  
  2. Provide a means to temporarily cache a child Entity's data without 
 needing to create a special cached implementation of the Entity Processor 
 (such as CachedSqlEntityProcessor).
  
  3. Provide a means to write the final (root entity) DIH output to a cache 
 rather than to Solr.  Then provide a way for a subsequent DIH call to use the 
 cache as an Entity input.  Also provide the ability to do delta updates on 
 such persistent caches.
  
  4. Provide the ability to partition data across multiple caches that can 
 then be fed back into DIH and indexed either to varying Solr Shards, or to 
 the same Core in parallel.
 Use Cases:
  1. We needed a flexible & scalable way to temporarily cache child-entity 
 data prior to joining to parent entities.
   - Using SqlEntityProcessor with Child Entities can cause an n+1 select 
 problem.
   - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching 
 mechanism and does not scale.
   - There is no way to cache non-SQL inputs (ex: flat files, xml, etc).
  
  2. We needed the ability to gather data from long-running entities by a 
 process that runs separate from our main indexing process.
   
  3. We wanted the ability to do a delta import of only the entities that 
 changed.
   - Lucene/Solr requires entire documents to be re-indexed, even if only a 
 few fields changed.
   - Our data comes from 50+ complex sql queries and/or flat files.
   - We do not want to incur overhead re-gathering all of this data if only 1 
 entity's data changed.
   - Persistent DIH caches solve this problem.
   
  4. We want the ability to index several documents in parallel (using 1.4.1, 
 which did not have the threads parameter).
  
  5. In the future, we may need to use Shards, creating a need to easily 
 partition our source data into Shards.
 Implementation Details:
  1. De-couple EntityProcessorBase from caching.  
   - Created a new interface, DIHCache & two implementations:  
 - SortedMapBackedCache - An in-memory cache, used as default with 
 CachedSqlEntityProcessor (now deprecated).
 - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested 
 with je-4.1.6.jar
- NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar.  
 I believe this may be incompatible due to generics usage.
- NOTE: I did not modify the ant script to automatically get this jar, 
 so to use or evaluate this patch, download bdb-je from 
 http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 
  
  2. Allow Entity Processors to take a cacheImpl parameter to cause the 
 entity data to be cached (see EntityProcessorBase & DIHCacheProperties).
  
  3. Partially De-couple SolrWriter from DocBuilder
   - Created a new interface DIHWriter, & two implementations:
- SolrWriter (refactored)
- DIHCacheWriter (allows DIH to write ultimately to a Cache).

  4. Create a new Entity Processor, DIHCacheProcessor, which reads a 
 persistent Cache as DIH Entity Input.
  
  5. Support a partition parameter with both DIHCacheWriter and 
 DIHCacheProcessor to allow for easy partitioning of source entity data.
  
  6. Change the semantics of entity.destroy()
   - Previously, it was being called on each iteration of 
 DocBuilder.buildDocument().
   - Now it does one-time cleanup tasks (like closing or deleting a 
 disk-backed cache) once the entity processor has completed.
   - The only out-of-the-box entity processor that previously implemented 
 destroy() was LineEntityProcessor, so this is not a very invasive change.
 General Notes:
 We are near completion in converting our search functionality from a legacy 
 search engine to Solr.  However, I found that DIH did not support caching to 
 the level of our prior product's data import utility.  In order to get our 
 data into Solr, I created these caching enhancements.  Because I believe this 
 has broad application, and because we would like this feature to be supported 
 by the Community, I have front-ported this, enhanced, to Trunk.  I have also 
 added unit tests and verified that all existing test cases pass.  I believe 
 this 
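The pluggable-cache idea described in the issue can be sketched in a few lines: an in-memory, key-ordered cache in the spirit of SortedMapBackedCache, where one lookup per parent key replaces one SQL query per parent (the n+1 select problem). All names and signatures below are illustrative, not the patch's actual API.

```java
import java.util.*;

/** Hypothetical sketch of a DIH-style record cache (not the patch's API). */
public class DihCacheSketch {
    private final String keyName;
    private final SortedMap<Object, List<Map<String, Object>>> data = new TreeMap<>();

    DihCacheSketch(String keyName) { this.keyName = keyName; }

    /** Cache one record under the value of its key field. */
    void add(Map<String, Object> rec) {
        data.computeIfAbsent(rec.get(keyName), k -> new ArrayList<>()).add(rec);
    }

    /** All cached child records for a parent key -- one map lookup per
     *  parent instead of one SQL query per parent. */
    List<Map<String, Object>> lookup(Object key) {
        return data.getOrDefault(key, Collections.emptyList());
    }

    // Small helper to build a record from alternating key/value pairs.
    static Map<String, Object> rec(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) m.put((String) kv[i], kv[i + 1]);
        return m;
    }

    public static void main(String[] args) {
        DihCacheSketch cache = new DihCacheSketch("parentId");
        cache.add(rec("parentId", 1, "tag", "red"));
        cache.add(rec("parentId", 1, "tag", "blue"));
        cache.add(rec("parentId", 2, "tag", "green"));
        System.out.println(cache.lookup(1).size()); // 2
    }
}
```

A disk-backed implementation behind the same interface (the patch's BerkleyBackedCache) is what makes the approach scale beyond what an in-memory HashMap can hold.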

Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Chris Male
Hi,

It really should say: Added Geospatial Support, as it was non-existent in
 Solr before.

 Most of the work for adding in spatial in Solr consisted of improving
 things in Solr to make it easy to leverage the one spatial feature we really
 added: distance based functions and parsing support.  Everything else was
 generally useful things: sorting by function, poly fields, etc.  I started
 on tier support, but dropped it when I realized it was broken beyond repair.
  The Solr stuff uses, IMO, the stuff in Lucene that works and ignores the
 rest.  I seem to recall Chris had said that once I got done w/ the Solr
 stuff he would do the modules work, but it hasn't happened yet.


 I'd say in 3.2, since it sounds like Chris did at least deprecate
 contrib/spatial, that we work to get all of this resolved:  spatial -
 modules, function queries - modules.  Naturally we should do it on trunk,
 too.


Just to note, it wasn't laziness that kept me from doing it.  Actually, pushing stuff
into the module isn't easy: there isn't much that can be saved from
contrib, and Solr's spatial code is predominantly bound to function
queries, which themselves are tightly coupled to Solr; nor was there
anything like a consensus that they should be moved.


-- 
Chris Male | Software Developer | JTeam BV.| www.jteam.nl


Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Chris Male
On Sun, Mar 27, 2011 at 7:30 AM, Yonik Seeley yo...@lucidimagination.comwrote:

 On Sat, Mar 26, 2011 at 2:17 PM, Ryan McKinley ryan...@gmail.com wrote:
 
  No, the question is: what justification is there for adding spatial
  support to solr-only, leaving lucene with a broken contrib module,
  versus adding it where it belongs and exposing it to solr?
 
  There need not be any linkage to lucene to improve a Solr feature.
  If you disagree, we should vote to clarify - this is too important
  (and too much of a negative for Solr).
 
 
  I don't think there is *requirement* to move the core spatial stuff to
  lucene, but I think there is huge benefit to both communities if
  things have as few dependencies as possible.  To be frank, the spatial
  support in solr is pretty hairy -- it works for some use cases, but is
  not extendable and quite basic.  Calling it 'distance' seems more
  appropriate than 'spatial'


 Having something basic that works (and has a clean enough high level
 HTTP interface) was clearly a win for Solr users.

 Of course a more fully featured spatial module would be a win for
 everyone, but that's ignoring the more generic issue at hand here:
 a patch that improves Solr's spatial
 should not be blocked on the grounds that it does not improve Lucene's
 spatial enough.


I don't think we need to see it that way, we want to improve both Solr and
Lucene's spatial support, not block either.  As you say, having a module is
a win for everyone, Solr and Lucene alike, so it seems obvious that we
should go down that path and the code in SOLR-2155 would make a great first
addition.



 Likewise, the ridiculous notion that Queries don't belong in Solr
 needs to be put to rest.


Issues in and around this seem to be coming up a lot these days (I'm
thinking FunctionQuerys too).  Sounds like something that really does need
to be openly discussed.



 -Yonik

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
Chris Male | Software Developer | JTeam BV.| www.jteam.nl


[jira] [Commented] (LUCENE-2995) factor out a shared spellchecking module

2011-03-26 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011711#comment-13011711
 ] 

Chris Male commented on LUCENE-2995:


+1

 factor out a shared spellchecking module
 

 Key: LUCENE-2995
 URL: https://issues.apache.org/jira/browse/LUCENE-2995
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2995.patch


 In lucene's contrib we have spellchecking support (index-based spellchecker, 
 directspellchecker, etc). 
 we also have some things like pluggable comparators.
 In solr we have auto-suggest support (with two implementations it looks 
 like), some good utilities like HighFrequencyDictionary, etc.
 I think spellchecking is really important... google has upped the ante to 
 what users expect.
 So I propose we combine all this stuff into a shared modules/spellchecker, 
 which will make it easier
 to refactor and improve the quality.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[VOTE] Lucene 3.1.0 RC3

2011-03-26 Thread Grant Ingersoll
Artifacts are at http://people.apache.org/~gsingers/staging_area/rc3/.  Please 
vote as you see appropriate.  Vote closes on March 29th.

I've also updated the Release To Do for both Lucene and Solr and it is 
hopefully a lot easier now to produce the artifacts as more of it is automated 
(including uploading to staging area).


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



3.1.0 Proposed Release Announcement(s)

2011-03-26 Thread Grant Ingersoll
Proposed Release Announcement (edits welcome).  Also note we can have ASF 
Marketing put out a press release if we want.

snip
March 2011, Lucene 3.1 available
The Lucene PMC is pleased to announce the release of Apache Lucene 3.1 and 
Apache Solr 3.1. 

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release
is available for immediate download at 
http://www.apache.org/dyn/closer.cgi/lucene/java
and http://www.apache.org/dyn/closer.cgi/lucene/solr.  See the respective 
CHANGES.txt
file included with the release for a full list of details.

Lucene 3.1 Release Highlights
* Improved Unicode support, including Unicode 4

* ReusableAnalyzerBase makes it easier to reuse TokenStreams correctly

* Protected words in stemming via KeywordAttribute

* ConstantScoreQuery now allows directly wrapping a Query

* Support for custom ExecutorService in ParallelMultiSearcher

* IndexWriterConfig.setMaxThreadStates for control of IndexWriter threads

* Numerous performance improvements: faster exact PhraseQuery;
  natural segment merging favors segments with deletions; primary
  key lookup is faster; IndexWriter.addIndexes(Directory[]) uses
  file copy instead of merging; BufferedIndexInput does fewer bounds
  checks; compound file is dynamically turned off for large
  segments; fully deleted segments are dropped on commit; faster
  snowball analyzers (in contrib); ConcurrentMergeScheduler is more
  careful about setting priority of merge threads.

* IndexWriter is now configured with a new separate builder API
  (IndexWriterConfig).

* IndexWriter.getReader is replaced by
  IndexReader.open(IndexWriter).  In addition you can now specify
  whether deletes should be resolved when you open an NRT reader.

* MultiSearcher is deprecated; ParallelMultiSearcher has been
  absorbed directly into IndexSearcher

* CharTermAttribute replaces TermAttribute in the Analysis process

* On 64bit Windows and Solaris JVMs, MMapDirectory is now the
  default implementation (returned by FSDirectory.open).
  MMapDirectory also enables unmapping if the JVM supports it.

* New TotalHitCountCollector just counts total number of hits

* ReaderFinishedListener API enables external caches to evict
  entries once a segment is finished

Solr 3.1 Release Highlights

* Added spatial filtering, boosting and sorting capabilities

* Added extended dismax (edismax) query parser, which addresses some missing
features in the dismax query parser along with some extensions

* Several more components now support distributed mode: TermsComponent, 
SpellCheckComponent

* Added an Auto Suggest component 

* Ability to sort by functions

* Support for adding documents using JSON format

* Leverages Lucene 3.1 and its inherent optimizations and bug fixes as well 
as new analysis capabilities

* Numerous bug fixes and optimizations.

/snip
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Grant Ingersoll

On Mar 26, 2011, at 9:03 PM, Chris Male wrote:

 Hi,
 
 It really should say: Added Geospatial Support, as it was non-existent in 
 Solr before.
 
 Most of the work for adding in spatial in Solr consisted of improving things 
 in Solr to make it easy to leverage the one spatial feature we really added: 
 distance based functions and parsing support.  Everything else was generally 
 useful things: sorting by function, poly fields, etc.  I started on tier 
 support, but dropped it when I realized it was broken beyond repair.  The 
 Solr stuff uses, IMO, the stuff in Lucene that works and ignores the rest.  I 
 seem to recall Chris had said that once I got done w/ the Solr stuff he would 
 do the modules work, but it hasn't happened yet.
 
 I'd say in 3.2, since it sounds like Chris did at least deprecate 
 contrib/spatial, that we work to get all of this resolved:  spatial - 
 modules, function queries - modules.  Naturally we should do it on trunk, 
 too.
 
 Just note that I didn't not do it out of laziness.  Actually pushing stuff 
 into the module isn't easy since there isn't much that can be saved from 
 contrib, and Solr's spatial code are predominately bound to function queries, 
 which themselves are very coupled to Solr and that there wasn't anything like 
 a consensus that they should be moved.

Agreed, it's not a small task.
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 3.1.0 Proposed Release Announcement(s)

2011-03-26 Thread Robert Muir
a couple quick suggestions inline:

On Sat, Mar 26, 2011 at 10:07 PM, Grant Ingersoll gsing...@apache.org wrote:
 Proposed Release Announcement (edits welcome).  Also note we can have ASF 
 Marketing put out a press release if we want.

 snip
 March 2011, Lucene 3.1 available
 The Lucene PMC is pleased to announce the release of Apache Lucene 3.1 and 
 Apache Solr 3.1.

 This release contains numerous bug fixes, optimizations, and
 improvements, some of which are highlighted below.  The release
 is available for immediate download at 
 http://www.apache.org/dyn/closer.cgi/lucene/java
 and http://www.apache.org/dyn/closer.cgi/lucene/java.  See the respective 
 CHANGES.txt
 file included with the release for a full list of details.

 Lucene 3.1 Release Highlights
 * Improved Unicode support, including Unicode 4

 * ReusableAnalyzerBase make it easier to reuse TokenStreams correctly

 * Protected words in stemming via KeywordAttribute

I might combine these into 'analysis improvements': improved unicode
support, more friendly term handling (CharTermAttribute), easier
object reuse (ReusableAnalyzerBase), protected words in stemming
(KeywordAttribute)


 * ConstantScoreQuery now allows directly wrapping a Query

 * Support for custom ExecutorService in ParallelMultiSearcher

I think we should drop this from the release notes, especially given a
couple notes down we mention how PMS is deprecated (instead pass the
executorservice to indexsearcher).


 * IndexWriterConfig.setMaxThreadStates for controls of IndexWriter threads

 * Numerous performance improvements: faster exact PhraseQuery;
  natural segment merging favors segments with deletions; primary
  key lookup is faster; IndexWriter.addIndexes(Directory[]) uses
  file copy instead of merging; BufferedIndexInput does fewer bounds
  checks; compound file is dynamically turned off for large
  segments; fully deleted segments are dropped on commit; faster
  snowball analyzers (in contrib); ConcurrentMergeScheduler is more
  careful about setting priority of merge threads.

we had speedups to mmapdirectory too, but only for large indexes.
maybe drop the bufferedindexinput stuff and just say the Directories
are faster?

I also think we should list the performance improvements as #1 in the
list of features (it will encourage users to check out the new
release)


 * IndexWriter is now configured with a new separate builder API
  (IndexWriterConfig).

 * IndexWriter.getReader is replaced by
  IndexReader.open(IndexWriter).  In addition you can now specify
  whether deletes should be resolved when you open an NRT reader.

 * MultiSearcher is deprecated; ParallelMultiSearcher has been
  absorbed directly into IndexSearcher

I think we should re-order the statement somehow, to not emphasize the
deprecation first... IndexSearcher gets PMS's capabilities, but
without its bugs, and then secondly that PMS is deprecated.


 * CharTermAttribute replaces TermAttribute in the Analysis process

I moved this one into the 'analysis improvements' above.


 * On 64bit Windows and Solaris JVMs, MMapDirectory is now the
  default implementation (returned by FSDirectory.open).
  MMapDirectory also enables unmapping if the JVM supports it.

 * New TotalHitCountCollector just counts total number of hits

 * ReaderFinishedListener API enables external caches to evict
  entries once a segment is finished

 Solr 3.1 Release Highlights

 * Added spatial filtering, boosting and sorting capabilities

 * Added extend dismax (edismax) query parser which addresses some missing
 features in the dismax query parser along with some extensions

 * Several more components now support distributed mode: TermsComponent, 
 SpellCheckComponent

 * Added an Auto Suggest component

 * Ability to sort by functions

 * Support for adding documents using JSON format

 * Leverages Lucene 3.1 and it's inherent optimizations and bug fixes as well
 as new analysis capabilities

 * Numerous bug fixes and optimizations.

 /snip
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-trunk - Build # 1511 - Failure

2011-03-26 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1511/

1 tests failed.
FAILED:  org.apache.lucene.index.TestNRTThreads.testNRTThreads

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1215)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1147)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:519)




Build Log (for compile errors):
[...truncated 11939 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2979) Simplify configuration API of contrib Query Parser

2011-03-26 Thread Adriano Crestani (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011721#comment-13011721
 ] 

Adriano Crestani commented on LUCENE-2979:
--

Hi Phillip,

I like your idea; it is similar to the one I had, but I was planning to use 
enums. However, after spending some time thinking about it, I can't see how to 
use generics the way you described with enums alone. So go ahead with your 
idea and create a proposal ;)

Don't forget to describe how you plan to make the old and new API work together.
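A generics-based configuration along the lines being discussed might look roughly like the sketch below: each key carries its value type, so get/set need no casts at the call site. This is exactly what plain enums cannot express, since enum constants cannot each carry their own type parameter. All names here (ConfigKey, QueryParserConfig, the example keys) are invented for illustration, not taken from any actual proposal.

```java
import java.util.*;

/** A key that carries the type of the value stored under it. */
final class ConfigKey<T> {
    final String name;
    ConfigKey(String name) { this.name = name; }
}

/** Type-safe heterogeneous container: set/get are checked at compile time. */
class QueryParserConfig {
    private final Map<ConfigKey<?>, Object> values = new HashMap<>();

    <T> void set(ConfigKey<T> key, T value) { values.put(key, value); }

    @SuppressWarnings("unchecked") // safe: set() pairs each key with its own T
    <T> T get(ConfigKey<T> key) { return (T) values.get(key); }
}

public class ConfigSketch {
    // Example keys; purely hypothetical names.
    static final ConfigKey<Boolean> LOWERCASE_EXPANDED = new ConfigKey<>("lowercaseExpanded");
    static final ConfigKey<Integer> PHRASE_SLOP = new ConfigKey<>("phraseSlop");

    public static void main(String[] args) {
        QueryParserConfig cfg = new QueryParserConfig();
        cfg.set(PHRASE_SLOP, 2);
        cfg.set(LOWERCASE_EXPANDED, true);
        // cfg.set(PHRASE_SLOP, "two");  // would not compile: type mismatch
        System.out.println(cfg.get(PHRASE_SLOP) + " " + cfg.get(LOWERCASE_EXPANDED));
    }
}
```

The one unchecked cast lives inside get() and is provably safe because set() can only pair a key with a value of that key's type.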

 Simplify configuration API of contrib Query Parser
 --

 Key: LUCENE-2979
 URL: https://issues.apache.org/jira/browse/LUCENE-2979
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.9, 3.0
Reporter: Adriano Crestani
Assignee: Adriano Crestani
  Labels: api-change, gsoc, gsoc2011, lucene-gsoc-11, mentor
 Fix For: 3.2, 4.0


 The current configuration API is very complicated and inherits the concept 
 used by the Attribute API to store token information in token streams. However, 
 the requirements for the two (QP config and token streams) are not the same, so 
 they shouldn't be using the same mechanism.
 I propose to simplify the QP config and make it less scary for people intending 
 to use the contrib QP. The task is not difficult, but it will require a lot of 
 code change and figuring out the best way to do it. That's why it's a good 
 candidate for a GSoC project.
 I would like to hear good proposals about how to make the API more friendly 
 and less scary :)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread William Bell
Maybe I am too close to this issue and not looking at global
implications like you are.

SOLR-2155 seems fairly close to good to go. There are a couple open
issues that David Smiley has been asking for input on.
I would recommend we answer those questions, and commit it. Then we
can look at modules, etc.

If a rewrite was in the works, a committer should have said something
a LONG time ago, like back in October or something. Are we talking
redesign or refactoring? I think core spatial things should remain in
Lucene.

Even though spatial support in 3.1 is basic, it is stable, and VERY
fast. We ran regression tests and the performance was 10-100x faster
than the plugin solution of Solr Spatial from Patrick. Whatever fancy
support for polygons, etc. we add, it needs to be even faster than
what we have with 3.1.

What I like about this patch more than anything is the support for
multiple lat/longs per document. I have several clients who need this
feature. For example, one doctor with multiple offices. It would be
nice if a committer would work with David Smiley to get this done.

1. We would like to know if the pole issue can be solved and how.
2. We would like to know the best way to support multiple lat/longs
(without the copy happening) and get the values from multigeodist(). I
have pushed up a good example, and I would like someone to please
comment and maybe even show me some code to do that. There has been
some discussion on this issue - my solution uses VS and it is fast.
There might be faster and simpler ways to handle the N number of
points.

On another note, it is frustrating when David and I put in some time
on this, and it sits out there with us begging for a committer to
assist, and then when Grant starts discussions, it is summarily
discarded for a new design without any input from the original
contributors. Is this how we want to do things here?

We should have Grant or Yonik work with David to get this patch done.

Then we can discuss Spatial V2 and the design of it.

Bill




On Sat, Mar 26, 2011 at 7:19 PM, Chris Male gento...@gmail.com wrote:


 On Sun, Mar 27, 2011 at 7:30 AM, Yonik Seeley yo...@lucidimagination.com
 wrote:

 On Sat, Mar 26, 2011 at 2:17 PM, Ryan McKinley ryan...@gmail.com wrote:
 
  No, the question is: what justification is there for adding spatial
  support to solr-only, leaving lucene with a broken contrib module,
  versus adding it where it belongs and exposing it to solr?
 
  There need not be any linkage to lucene to improve a Solr feature.
  If you disagree, we should vote to clarify - this is too important
  (and too much of a negative for Solr).
 
 
  I don't think there is *requirement* to move the core spatial stuff to
  lucene, but I think there is huge benefit to both communities if
  things have as few dependencies as possible.  To be frank, the spatial
  support in solr is pretty hairy -- it works for some use cases, but is
  not extendable and quite basic.  Calling it 'distance' seems more
  appropriate than 'spatial'


 Having something basic that works (and has a clean enough high level
 HTTP interface) was clearly a win for Solr users.

 Of course a more fully featured spatial module would be a win for
 everyone, but that's ignoring the more generic issue at hand here:
 a patch that improves Solr's spatial
 should not be blocked on the grounds that it does not improve Lucene's
 spatial enough.

 I don't think we need to see it that way, we want to improve both Solr and
 Lucene's spatial support, not block either.  As you say, having a module is
 a win for everyone, Solr and Lucene alike, so it seems obvious that we
 should go down that path and the code in SOLR-2155 would make a great first
 addition.


 Likewise, the ridiculous notion that Queries don't belong in Solr
 needs to be put to rest.

 Issues in and around this seem to be coming up a lot these days (I'm
 thinking FunctionQuerys too).  Sounds like something that really does need
 to be openly discussed.


 -Yonik

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




 --
 Chris Male | Software Developer | JTeam BV.| www.jteam.nl


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
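The geohash-prefix technique the SOLR-2155 thread keeps referring to rests on the standard geohash encoding: lat/lon ranges are repeatedly bisected, the resulting bits interleaved (longitude first), and each group of five bits mapped to a base-32 character. Because of the bisection, every cell's hash is a prefix of the hashes of all points inside it, which is what makes prefix queries on an inverted index work as spatial filters. The sketch below is a from-scratch illustration of that encoding, not code from the patch.

```java
public class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encode lat/lon into a geohash of the given length by bisecting the
     *  ranges and interleaving bits, longitude first. */
    static String encode(double lat, double lon, int precision) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // even bit positions encode longitude
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            double[] range = evenBit ? lonRange : latRange;
            double val = evenBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            if (val >= mid) { ch = (ch << 1) | 1; range[0] = mid; }
            else            { ch =  ch << 1;      range[1] = mid; }
            evenBit = !evenBit;
            if (++bit == 5) { hash.append(BASE32.charAt(ch)); bit = 0; ch = 0; }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // Well-known example point near Skagen, Denmark.
        String h = encode(57.64911, 10.40744, 11);
        System.out.println(h);                    // u4pruydqqvj
        System.out.println(h.startsWith("u4pru")); // coarser cell is a prefix
    }
}
```

The prefix property also explains the multi-value case Bill describes: each of a document's points gets its own set of geohash terms, so one doctor with multiple offices simply indexes several hashes in the same field.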



[jira] [Commented] (SOLR-2382) DIH Cache Improvements

2011-03-26 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011730#comment-13011730
 ] 

James Dyer commented on SOLR-2382:
--

There is a multi-threaded unit test in TestDihCacheWriterAndProcessor.java.  
However, I have not used the threads param in a real-world setting.

 DIH Cache Improvements
 --

 Key: SOLR-2382
 URL: https://issues.apache.org/jira/browse/SOLR-2382
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Reporter: James Dyer
Priority: Minor
 Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, 
 SOLR-2382.patch


 Functionality:
  1. Provide a pluggable caching framework for DIH so that users can choose a 
 cache implementation that best suits their data and application.
  
  2. Provide a means to temporarily cache a child Entity's data without 
 needing to create a special cached implementation of the Entity Processor 
 (such as CachedSqlEntityProcessor).
  
  3. Provide a means to write the final (root entity) DIH output to a cache 
 rather than to Solr.  Then provide a way for a subsequent DIH call to use the 
 cache as an Entity input.  Also provide the ability to do delta updates on 
 such persistent caches.
  
  4. Provide the ability to partition data across multiple caches that can 
 then be fed back into DIH and indexed either to varying Solr Shards, or to 
 the same Core in parallel.
 Use Cases:
  1. We needed a flexible & scalable way to temporarily cache child-entity 
 data prior to joining to parent entities.
   - Using SqlEntityProcessor with Child Entities can cause an n+1 select 
 problem.
   - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching 
 mechanism and does not scale.
   - There is no way to cache non-SQL inputs (ex: flat files, xml, etc).
  
  2. We needed the ability to gather data from long-running entities by a 
 process that runs separate from our main indexing process.
   
  3. We wanted the ability to do a delta import of only the entities that 
 changed.
   - Lucene/Solr requires entire documents to be re-indexed, even if only a 
 few fields changed.
   - Our data comes from 50+ complex sql queries and/or flat files.
   - We do not want to incur overhead re-gathering all of this data if only 1 
 entity's data changed.
   - Persistent DIH caches solve this problem.
   
  4. We want the ability to index several documents in parallel (using 1.4.1, 
 which did not have the threads parameter).
  
  5. In the future, we may need to use Shards, creating a need to easily 
 partition our source data into Shards.
 Implementation Details:
  1. De-couple EntityProcessorBase from caching.  
   - Created a new interface, DIHCache & two implementations:  
 - SortedMapBackedCache - An in-memory cache, used as default with 
 CachedSqlEntityProcessor (now deprecated).
 - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested 
 with je-4.1.6.jar
- NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar.  
 I believe this may be incompatible due to generics usage.
- NOTE: I did not modify the ant script to automatically get this jar, 
 so to use or evaluate this patch, download bdb-je from 
 http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 
  
  2. Allow Entity Processors to take a cacheImpl parameter to cause the 
 entity data to be cached (see EntityProcessorBase & DIHCacheProperties).
  
  3. Partially De-couple SolrWriter from DocBuilder
   - Created a new interface DIHWriter, & two implementations:
- SolrWriter (refactored)
- DIHCacheWriter (allows DIH to write ultimately to a Cache).

  4. Create a new Entity Processor, DIHCacheProcessor, which reads a 
 persistent Cache as DIH Entity Input.
  
  5. Support a partition parameter with both DIHCacheWriter and 
 DIHCacheProcessor to allow for easy partitioning of source entity data.
  
  6. Change the semantics of entity.destroy()
   - Previously, it was being called on each iteration of 
 DocBuilder.buildDocument().
   - Now it does one-time cleanup tasks (like closing or deleting a 
 disk-backed cache) once the entity processor has completed.
   - The only out-of-the-box entity processor that previously implemented 
 destroy() was LineEntityProcessor, so this is not a very invasive change.
 General Notes:
 We are near completion in converting our search functionality from a legacy 
 search engine to Solr.  However, I found that DIH did not support caching to 
 the level of our prior product's data import utility.  In order to get our 
 data into Solr, I created these caching enhancements.  Because I believe this 
 has broad application, and because we would like this feature to be supported 
 by the Community, I have front-ported this, enhanced, to