[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2012-04-15 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254357#comment-13254357
 ] 

Bill Bell commented on SOLR-2155:
-

Could it be if they only have 1 entry in the store_geohash field?

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, 
 Solr2155-for-1.0.2-3.x-port.patch


 {panel:title=NOTICE} There is a .zip attached to the issue with a .jar you 
 can drop in to Solr 3.x.  Lucene 4's new spatial module is largely based on 
 this code.  The Solr 4 glue for it should come very soon but as of this 
 writing it's hosted temporarily at Spatial4j.com.  For more information on 
 using SOLR-2155 with Solr 3, see 
 http://wiki.apache.org/solr/SpatialSearch#SOLR-2155  This JIRA issue is 
 closed because it won't be committed in its current form.
 {panel}
 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2012-04-15 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254356#comment-13254356
 ] 

Bill Bell commented on SOLR-2155:
-

I tried SOLR 2155 1.0.3 and the same results.

I am able to simplify the issue with a simple query:

{code}
http://localhost:8983/solr/citystateprovider/select?fl=score,display_name,store_lat_lon,store_geohash,city_stateq.alt=*:*d=100pt=39.740112,-104.984856defType=dismaxrows=100echoParams=allsfield=store_geohashsort=geodisth%28%29%20asc;
{code}

AND the one that works:

{code}

http://localhost:8983/solr/citystateprovider/select?fl=score,display_name,store_lat_lon,store_geohash,city_stateq.alt=*:*pt=39.740112,-104.984856defType=dismaxrows=100echoParams=allsfield=store_lat_lonsort=geodist%28%29%20asc;

{code}



 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, 
 Solr2155-for-1.0.2-3.x-port.patch


 {panel:title=NOTICE} There is a .zip attached to the issue with a .jar you 
 can drop in to Solr 3.x.  Lucene 4's new spatial module is largely based on 
 this code.  The Solr 4 glue for it should come very soon but as of this 
 writing it's hosted temporarily at Spatial4j.com.  For more information on 
 using SOLR-2155 with Solr 3, see 
 http://wiki.apache.org/solr/SpatialSearch#SOLR-2155  This JIRA issue is 
 closed because it won't be committed in its current form.
 {panel}
 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2012-04-15 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254359#comment-13254359
 ] 

Bill Bell commented on SOLR-2155:
-

Ohh. This is interesting, if I switch to sort using the new geodisth() for the 
sfield=store_lat_lon fields I get the same results as what I use 
sfield=store_geohash (the wrong ones).



 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, 
 Solr2155-for-1.0.2-3.x-port.patch


 {panel:title=NOTICE} There is a .zip attached to the issue with a .jar you 
 can drop in to Solr 3.x.  Lucene 4's new spatial module is largely based on 
 this code.  The Solr 4 glue for it should come very soon but as of this 
 writing it's hosted temporarily at Spatial4j.com.  For more information on 
 using SOLR-2155 with Solr 3, see 
 http://wiki.apache.org/solr/SpatialSearch#SOLR-2155  This JIRA issue is 
 closed because it won't be committed in its current form.
 {panel}
 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2012-04-15 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254392#comment-13254392
 ] 

Bill Bell commented on SOLR-2155:
-

The distance is calculated properly, but the sort is not working. Probably due 
to the wrong function being called.

You need to add a test case to check the sort order coming back from Solr.







 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, 
 Solr2155-for-1.0.2-3.x-port.patch


 {panel:title=NOTICE} There is a .zip attached to the issue with a .jar you 
 can drop in to Solr 3.x.  Lucene 4's new spatial module is largely based on 
 this code.  The Solr 4 glue for it should come very soon but as of this 
 writing it's hosted temporarily at Spatial4j.com.  For more information on 
 using SOLR-2155 with Solr 3, see 
 http://wiki.apache.org/solr/SpatialSearch#SOLR-2155  This JIRA issue is 
 closed because it won't be committed in its current form.
 {panel}
 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2012-04-15 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254398#comment-13254398
 ] 

Bill Bell commented on SOLR-2155:
-

sort is definitely not working in all cases using geohash field.

I am going to compare with other sorts... Maybe we need to extend another class?


 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, 
 Solr2155-for-1.0.2-3.x-port.patch


 {panel:title=NOTICE} There is a .zip attached to the issue with a .jar you 
 can drop in to Solr 3.x.  Lucene 4's new spatial module is largely based on 
 this code.  The Solr 4 glue for it should come very soon but as of this 
 writing it's hosted temporarily at Spatial4j.com.  For more information on 
 using SOLR-2155 with Solr 3, see 
 http://wiki.apache.org/solr/SpatialSearch#SOLR-2155  This JIRA issue is 
 closed because it won't be committed in its current form.
 {panel}
 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3304) Add Solr support for the new Lucene spatial module

2012-04-02 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244937#comment-13244937
 ] 

Bill Bell commented on SOLR-3304:
-

Let me know if I can help.

 Add Solr support for the new Lucene spatial module
 --

 Key: SOLR-3304
 URL: https://issues.apache.org/jira/browse/SOLR-3304
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: David Smiley
  Labels: spatial

 Get the Solr spatial module integrated with the lucene spatial module.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3302) Upgrade SLF4J to 1.6.4

2012-04-02 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244938#comment-13244938
 ] 

Bill Bell commented on SOLR-3302:
-

Seems like a little or no risk upgrade... For 3.6?

 Upgrade SLF4J to 1.6.4
 --

 Key: SOLR-3302
 URL: https://issues.apache.org/jira/browse/SOLR-3302
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.5, 4.0
Reporter: Bill Bell

 Replace 1.6.1 with 1.6.4
 http://www.slf4j.org/dist/slf4j-1.6.4.tar.gz 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-04-02 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244961#comment-13244961
 ] 

Bill Bell commented on SOLR-2242:
-

Ready for 3x merge. Test with:

ant test -Dtestcase=NumFacetTermsFacetsTest


 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-3x.patch, SOLR-2242-3x_5_tests.patch, 
 SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price
 This currently only works on facet.field.
 {code}
 lst name=facet_fields
   lst name=price
 int name=numFacetTerms14/int
 int name=0.03/intint name=11.51/intint 
 name=19.951/intint name=74.991/intint name=92.01/intint 
 name=179.991/intint name=185.01/intint name=279.951/intint 
 name=329.951/intint name=350.01/intint name=399.01/intint 
 name=479.951/intint name=649.991/intint name=2199.01/int
   /lst
 /lst
 {code} 
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-03-28 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240225#comment-13240225
 ] 

Bill Bell commented on SOLR-2242:
-

I changed the sharing response to check the size and only return the shard name 
if there is a response.

{code}
lst name=facet_numTerms
lst name=localhost:8983/solr/
/lst

Changed to 

lst name=facet_numTerms/
{code}

Also, the code for field_facets was wrong. It needs to return the name of the 
field even if the size is 0 or null.

See latest patch for 3x.





 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-3x.patch, SOLR-2242-solr40-3.patch, 
 SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR-2242.solr35.patch, SOLR.2242.solr3.1.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price
 This currently only works on facet.field.
 {code}
 lst name=facet_fields
   lst name=price
 int name=numFacetTerms14/int
 int name=0.03/intint name=11.51/intint 
 name=19.951/intint name=74.991/intint name=92.01/intint 
 name=179.991/intint name=185.01/intint name=279.951/intint 
 name=329.951/intint name=350.01/intint name=399.01/intint 
 name=479.951/intint name=649.991/intint name=2199.01/int
   /lst
 /lst
 {code} 
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-03-28 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240240#comment-13240240
 ] 

Bill Bell commented on SOLR-2242:
-

Found a bug and attaching new patch.

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-3x.patch, SOLR-2242-3x_2.patch, 
 SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR-2242.solr35.patch, SOLR.2242.solr3.1.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price
 This currently only works on facet.field.
 {code}
 lst name=facet_fields
   lst name=price
 int name=numFacetTerms14/int
 int name=0.03/intint name=11.51/intint 
 name=19.951/intint name=74.991/intint name=92.01/intint 
 name=179.991/intint name=185.01/intint name=279.951/intint 
 name=329.951/intint name=350.01/intint name=399.01/intint 
 name=479.951/intint name=649.991/intint name=2199.01/int
   /lst
 /lst
 {code} 
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-03-28 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240270#comment-13240270
 ] 

Bill Bell commented on SOLR-2242:
-

All tests pass on branch_3x now. 



 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-3x.patch, SOLR-2242-3x_4.patch, 
 SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price
 This currently only works on facet.field.
 {code}
 lst name=facet_fields
   lst name=price
 int name=numFacetTerms14/int
 int name=0.03/intint name=11.51/intint 
 name=19.951/intint name=74.991/intint name=92.01/intint 
 name=179.991/intint name=185.01/intint name=279.951/intint 
 name=329.951/intint name=350.01/intint name=399.01/intint 
 name=479.951/intint name=649.991/intint name=2199.01/int
   /lst
 /lst
 {code} 
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2012-03-27 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239230#comment-13239230
 ] 

Bill Bell commented on SOLR-2155:
-

It seems that a max size of 10 would be too small. If we have an average of 3 
geohash fields per doc, and we have 2M rows how do we set these caches?

We will get back to you on when it occurs. We don't commit during the day. 
Maybe during a gc?

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, 
 Solr2155-for-1.0.2-3.x-port.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-03-27 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240158#comment-13240158
 ] 

Bill Bell commented on SOLR-2242:
-

Yonik agreed.  However what is the alternative. We are talking distinct terms, 
and unless I limit the number of terms there could be a performance issue on 
using this with sharding. Since I would need to sent the terms and combine them 
and look for uniques. I am willing to do that work (not that much coding - more 
worried about CPU and network performance). The one I submitted does change the 
format by ADDING a new section. It shouldn't break other facets (usually adding 
sections to the JSON/XML output should not be a hard break). The latest version 
does not change the facet_field section so it is compatible.

I am working on getting the tests to work. Most seem trivial fixes and not more 
serious. Since we changed the format...

However, several people would like to use this. If I fix the test cases that 
are breaking can we consider a commit?



 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-3x.patch, SOLR-2242-solr40-3.patch, 
 SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR-2242.solr35.patch, SOLR.2242.solr3.1.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price
 This currently only works on facet.field.
 {code}
 lst name=facet_fields
   lst name=price
 int name=numFacetTerms14/int
 int name=0.03/intint name=11.51/intint 
 name=19.951/intint name=74.991/intint name=92.01/intint 
 name=179.991/intint name=185.01/intint name=279.951/intint 
 name=329.951/intint name=350.01/intint name=399.01/intint 
 name=479.951/intint name=649.991/intint name=2199.01/int
   /lst
 /lst
 {code} 
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2012-03-26 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238921#comment-13238921
 ] 

Bill Bell commented on SOLR-2155:
-

David,

We are seeing weird slow performance on your new 1.0.4 release.

INFO: [providersearch] webapp=/solr path=/select 
params={d=160.9344facet=falsewt=jsonrows=6start=0pt=42.6450,-73.facet.field=5star_45f.5star_45.facet.mincount=1qt=providersearchspecdistfq=specialties_ids:(45+)qq=city_state_lower:albany,+nyf.5star_45.facet.limit=-1}
 hits=960 status=0 QTime=8222 

Hitting that with a slightly different lat long comes back almost instantly. 
I'm not sure why sometimes they take seconds instead of milliseconds.  There is 
also this log entry a few lines before the long query:

Mar 22, 2012 11:26:29 AM solr2155.solr.search.function.GeoHashValueSource init
INFO: field 'store_geohash' in RAM: loaded min/avg/max per doc #: 
(1,1.1089503,11) #2270017

Are we missing something? Shall we go back to 1.0.3 ?

Shall I increase the following?  What does this actually do?

cache name=fieldValueCache
class=solr.FastLRUCache size=10 initialSize=1
autowarmCount=1/ 



 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, 
 Solr2155-for-1.0.2-3.x-port.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-03-21 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235365#comment-13235365
 ] 

Bill Bell commented on SOLR-2242:
-

I added sharding as discussed by Antoine.

{code}
lst name=facet_numTerms
lst name=http://localhost:8983/solr;
int name=price14/int
int name=cat15/int
/lst
lst name=http://localhost:8081/solr;
int name=price23/int
int name=cat3/int
/lst
/lst
{code}

Example call

http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:8081/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numTerms=truefacet.limit=-1facet.field=pricefacet.field=cat



 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: NumFacetTermsFacetsTest.java, 
 SOLR-2242-notworkingtest.patch, SOLR-2242-solr40-2.patch, 
 SOLR-2242-solr40.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.shard.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price
 This currently only works on facet.field.
 {code}
 lst name=facet_fields
   lst name=price
 int name=numFacetTerms14/int
 int name=0.03/intint name=11.51/intint 
 name=19.951/intint name=74.991/intint name=92.01/intint 
 name=179.991/intint name=185.01/intint name=279.951/intint 
 name=329.951/intint name=350.01/intint name=399.01/intint 
 name=479.951/intint name=649.991/intint name=2199.01/int
   /lst
 /lst
 {code} 
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-03-21 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235366#comment-13235366
 ] 

Bill Bell commented on SOLR-2242:
-

https://issues.apache.org/jira/secure/attachment/12519406/SOLR-2242-solr40-2.patch
 is the latest patch.


 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: NumFacetTermsFacetsTest.java, 
 SOLR-2242-notworkingtest.patch, SOLR-2242-solr40-2.patch, 
 SOLR-2242-solr40.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.shard.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price
 This currently only works on facet.field.
 {code}
 lst name=facet_fields
   lst name=price
 int name=numFacetTerms14/int
 int name=0.03/intint name=11.51/intint 
 name=19.951/intint name=74.991/intint name=92.01/intint 
 name=179.991/intint name=185.01/intint name=279.951/intint 
 name=329.951/intint name=350.01/intint name=399.01/intint 
 name=479.951/intint name=649.991/intint name=2199.01/int
   /lst
 /lst
 {code} 
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter

2012-03-19 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233162#comment-13233162
 ] 

Bill Bell commented on SOLR-3230:
-

Yonik,

Can you take a peak at my patch?

Would love to get some feedback. We are seeing x2 performance improvements on 
filtering using this patch.

 Performance improvement for geofilt by doing a bbox approximation and then 
 Filter
 -

 Key: SOLR-3230
 URL: https://issues.apache.org/jira/browse/SOLR-3230
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell
Assignee: Grant Ingersoll
 Fix For: 4.0

 Attachments: SOLR-3230-3.patch, SOLR-3230.patch


 This changes {!geofilt} to use a bounding box and then does a accurate filter.
 See attachment

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2393) Clean up Double Metaphone code (PhoneticFilterFactory and DoubleMetaphoneFilter)

2012-03-19 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233165#comment-13233165
 ] 

Bill Bell commented on SOLR-2393:
-

Can someone please review my patch to see if it makes sense... ?

I am not an expert at these Phonetic filters. I think I nailed it, and it 
appears to work well.


 Clean up Double Metaphone code (PhoneticFilterFactory and 
 DoubleMetaphoneFilter)
 

 Key: SOLR-2393
 URL: https://issues.apache.org/jira/browse/SOLR-2393
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Bill Bell
 Attachments: SOLR-2393.patch


 From Ryan McKinley to Jan Høydahl:
 PhoneticFilterFactory could check
 DoubleMetaphone.equals( encoder ) and then create the specialized
 Filter.
 I don't feel strongly, but we could have:
 EncoderFilter -- just uses 'encode()'
 DoubleMetaphoneFilter - uses special double metaphone stuff
 EncoderFilterFactory -- always uses EncoderFilter, and is not
 semantically bound to 'phonetic'
 PhoneticFilterFactory -- picks the best phonetic filter impl (encoder
 or double metaphone)
 then deprecate:
 PhoneticFilter
 DoubleMetaphoneFilterFactory

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3159) Upgrade to Jetty 8

2012-03-19 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233174#comment-13233174
 ] 

Bill Bell commented on SOLR-3159:
-

Can you point to the JENKINS issue? If there is not one, we should post an 
issue on the JENKINS JIRA...

Are you saying that it might be isolated to JENKINS? I have been avoiding NIO - 
maybe for the wrong reasons. I heard that BIO ran better on Windows 2008 ?

(DOn't ask why we are running SOLR on Windows).



 Upgrade to Jetty 8
 --

 Key: SOLR-3159
 URL: https://issues.apache.org/jira/browse/SOLR-3159
 Project: Solr
  Issue Type: Task
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3159-maven.patch


 Solr is currently tested (and bundled) with a patched jetty-6 version.  
 Ideally we can release and test with a standard version.
 Jetty-6 (at codehaus) is just maintenance now.  New development and 
 improvements are now hosted at eclipse.  Assuming performance is equivalent, 
 I think we should switch to Jetty 8.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-03-19 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233181#comment-13233181
 ] 

Bill Bell commented on SOLR-2242:
-

Cody,

I love your suggestion. I am actually ready to work on it. 

{code}
lst name=facet_numTerms
   int name=text124/int
/lst
{code}

After we get it committed we should then fix the shard issues as per SOLR-3134.

We can also create a new JIRA ticket for that. 

Everyone agreed?

I will do it on SOLR 4.0 and back port to 3.5.




 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.0

 Attachments: NumFacetTermsFacetsTest.java, 
 SOLR-2242-notworkingtest.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price
 This currently only works on facet.field.
 {code}
 lst name=facet_fields
   lst name=price
 int name=numFacetTerms14/int
 int name=0.03/intint name=11.51/intint 
 name=19.951/intint name=74.991/intint name=92.01/intint 
 name=179.991/intint name=185.01/intint name=279.951/intint 
 name=329.951/intint name=350.01/intint name=399.01/intint 
 name=479.951/intint name=649.991/intint name=2199.01/int
   /lst
 /lst
 {code} 
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-03-19 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233224#comment-13233224
 ] 

Bill Bell commented on SOLR-2242:
-

How does it work?

{code}
http://localhost:8983/solr/select?q=*:*facet=truefacet.field=catfacet.field=pricef.price.facet.numTerms=truefacet.limit=-1f.cat.facet.numTerms=truef.price.facet.limit=1
{code}

Parameters:

facet.numTerms or f.field.facet.numTerms = true (default is false) - turn on 
distinct counting of terms
facet.field - the field to count the terms

It creates a new section in the facet section... For example:

{code}
lst name=facet_counts
  lst name=facet_queries/
  lst name=facet_fields
lst name=cat
  int name=camera1/int
  int name=connector2/int
  int name=copier1/int
  int name=currency4/int
  int name=electronics14/int
  int name=graphics card2/int
  int name=hard drive2/int
  int name=memory3/int
  int name=monitor2/int
  int name=multifunction printer1/int
  int name=music1/int
  int name=printer1/int
  int name=scanner1/int
  int name=search2/int
  int name=software2/int
/lst
lst name=price
  int name=0.03/int
/lst
  /lst
  lst name=facet_numTerms
int name=cat15/int
int name=price14/int
  /lst
  lst name=facet_dates/
  lst name=facet_ranges/
/lst
{code}






 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.0

 Attachments: NumFacetTermsFacetsTest.java, 
 SOLR-2242-notworkingtest.patch, SOLR-2242-solr40.patch, SOLR-2242.patch, 
 SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, 
 SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, 
 SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price
 This currently only works on facet.field.
 {code}
 lst name=facet_fields
   lst name=price
 int name=numFacetTerms14/int
 int name=0.03/intint name=11.51/intint 
 name=19.951/intint name=74.991/intint name=92.01/intint 
 name=179.991/intint name=185.01/intint name=279.951/intint 
 name=329.951/intint name=350.01/intint name=399.01/intint 
 name=479.951/intint name=649.991/intint name=2199.01/int
   /lst
 /lst
 {code} 
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2012-03-19 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233230#comment-13233230
 ] 

Bill Bell commented on SOLR-2155:
-

I already had updated http://wiki.apache.org/solr/SpatialSearchDev with the 
SOLR-2155 info. 



 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, 
 Solr2155-for-1.0.2-3.x-port.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-03-19 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233232#comment-13233232
 ] 

Bill Bell commented on SOLR-2242:
-

See 
https://issues.apache.org/jira/secure/attachment/12519024/SOLR-2242-solr40.patch
 for the patch.

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.0

 Attachments: NumFacetTermsFacetsTest.java, 
 SOLR-2242-notworkingtest.patch, SOLR-2242-solr40.patch, SOLR-2242.patch, 
 SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, 
 SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, 
 SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=name of field you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price
 This currently only works on facet.field.
 {code}
 lst name=facet_fields
   lst name=price
 int name=numFacetTerms14/int
 int name=0.03/intint name=11.51/intint 
 name=19.951/intint name=74.991/intint name=92.01/intint 
 name=179.991/intint name=185.01/intint name=279.951/intint 
 name=329.951/intint name=350.01/intint name=399.01/intint 
 name=479.951/intint name=649.991/intint name=2199.01/int
   /lst
 /lst
 {code} 
 Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3159) Upgrade to Jetty 8

2012-03-18 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232420#comment-13232420
 ] 

Bill Bell commented on SOLR-3159:
-

For sharding we probably want to add back the equivalent for MAX POST size.

{code}
!-- Increase the maximum POST size to 1 MB to be able to handle large 
shard requests --
Call class=java.lang.System name=setProperty
  Argorg.mortbay.jetty.Request.maxFormContentSize/Arg
  Arg100/Arg
/Call
{code}
To something like:
{code}
   Call class=java.lang.System name=setProperty
  Argorg.eclipse.jetty.Request.maxFormContentSize/Arg
  Arg50/Arg
/Call
{code}

See 
http://wiki.eclipse.org/Jetty/Howto/Configure_Form_Size#Changing_the_Maximum_Form_Size_for_All_Apps_on_a_Server





 Upgrade to Jetty 8
 --

 Key: SOLR-3159
 URL: https://issues.apache.org/jira/browse/SOLR-3159
 Project: Solr
  Issue Type: Task
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3159-maven.patch


 Solr is currently tested (and bundled) with a patched jetty-6 version.  
 Ideally we can release and test with a standard version.
 Jetty-6 (at codehaus) is just maintenance now.  New development and 
 improvements are now hosted at eclipse.  Assuming performance is equivalent, 
 I think we should switch to Jetty 8.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2012-03-17 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232133#comment-13232133
 ] 

Bill Bell commented on SOLR-2155:
-

I did figure out the {!gh_geofilt}


The parameters are point and radius, not pt and d.

Radius also is in meters not km.

The performance of this gh geofilt is also most the same as geofilt. So not 
sure why you would need it.



 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2012-03-17 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232156#comment-13232156
 ] 

Bill Bell commented on SOLR-2155:
-

D is in km and radius is in meters. So Ou would need radius=5000

Geofilt does to use point and radius as far as I can tell.

Also hg_geofilt does not use pt and d. 

I just don't want people confused.

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3159) Upgrade to Jetty 8

2012-03-17 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232171#comment-13232171
 ] 

Bill Bell commented on SOLR-3159:
-

Is this back ported to 3x? There is also a BIO issue in Jetty 6

https://jira.codehaus.org/browse/JETTY-1458

The ticket for 3159 only says Solr 4.

 Upgrade to Jetty 8
 --

 Key: SOLR-3159
 URL: https://issues.apache.org/jira/browse/SOLR-3159
 Project: Solr
  Issue Type: Task
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3159-maven.patch


 Solr is currently tested (and bundled) with a patched jetty-6 version.  
 Ideally we can release and test with a standard version.
 Jetty-6 (at codehaus) is just maintenance now.  New development and 
 improvements are now hosted at eclipse.  Assuming performance is equivalent, 
 I think we should switch to Jetty 8.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3243) eDismax and non-fielded range query

2012-03-17 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232173#comment-13232173
 ] 

Bill Bell commented on SOLR-3243:
-

If production sites are using edismax this seems like a very critical issue.

 eDismax and non-fielded range query
 ---

 Key: SOLR-3243
 URL: https://issues.apache.org/jira/browse/SOLR-3243
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.1, 3.2, 3.3, 3.4, 3.5
Reporter: Jan Høydahl
Priority: Critical
 Fix For: 3.6, 4.0


 Reported by Bill Bell in SOLR-3085:
 If you enter a non-fielded open-ended range in the search box, like [* TO *], 
 eDismax will expand it to all fields:
 {noformat}
 +DisjunctionMaxQuery((content:[* TO *]^2.0 | id:[* TO *]^50.0 | author:[* TO 
 *]^15.0 | meta:[* TO *]^10.0 | name:[* TO *]^20.0))
 {noformat}
 This does not make sense, and a side effect is that range queries for strings 
 are very expensive, open-ended even more, and you can totally crash the 
 search server by hammering something like ([* TO *] OR [* TO *] OR [* TO *]) 
 a few times...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3243) eDismax and non-fielded range query

2012-03-17 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232174#comment-13232174
 ] 

Bill Bell commented on SOLR-3243:
-

If production sites are using edismax this seems like a very critical issue.

 eDismax and non-fielded range query
 ---

 Key: SOLR-3243
 URL: https://issues.apache.org/jira/browse/SOLR-3243
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.1, 3.2, 3.3, 3.4, 3.5
Reporter: Jan Høydahl
Priority: Critical
 Fix For: 3.6, 4.0


 Reported by Bill Bell in SOLR-3085:
 If you enter a non-fielded open-ended range in the search box, like [* TO *], 
 eDismax will expand it to all fields:
 {noformat}
 +DisjunctionMaxQuery((content:[* TO *]^2.0 | id:[* TO *]^50.0 | author:[* TO 
 *]^15.0 | meta:[* TO *]^10.0 | name:[* TO *]^20.0))
 {noformat}
 This does not make sense, and a side effect is that range queries for strings 
 are very expensive, open-ended even more, and you can totally crash the 
 search server by hammering something like ([* TO *] OR [* TO *] OR [* TO *]) 
 a few times...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2012-03-15 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229945#comment-13229945
 ] 

Bill Bell commented on SOLR-2155:
-

David,

What is an example URL call for multiValued field? Does geofilt work?

/select?q=*:*fq={!geofilt}sort=geodist() ascsfield=store_hashd=10

Or do we need to use gh_geofilt? like this?

/select?q=*:*fq={!gh_geofilt}sort=geodist() ascsfield=store_hashd=10


 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter

2012-03-14 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229046#comment-13229046
 ] 

Bill Bell commented on SOLR-3230:
-

Yonik...

I am not that familiar with this code. I do notice 2 methods in 
LatLonType.java. Is this the right place?

public Query getFieldQuery(QParser parser, SchemaField field, String 
externalVal) {
 
public Query getRangeQuery(QParser parser, SchemaField field, String part1, 
String part2, boolean minInclusive, boolean maxInclusive) {
  
I did not see how these 2 functions are called. In class SpatialDistanceQuery I 
did not see where you said we are using range or fc... ? Maybe example code ?



 Performance improvement for geofilt by doing a bbox approximation and then 
 Filter
 -

 Key: SOLR-3230
 URL: https://issues.apache.org/jira/browse/SOLR-3230
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell
Assignee: Grant Ingersoll
 Fix For: 4.0

 Attachments: SOLR-3230.patch


 This changes {!geofilt} to use a bounding box and then does a accurate filter.
 See attachment

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3085) Fix the dismax/edismax stopwords mm issue

2012-03-13 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228990#comment-13228990
 ] 

Bill Bell commented on SOLR-3085:
-

We also found another loophole. If we send [* TO *] to edismax we also can 
bring down the server.

Some chars are not being escaped before being sent to SOLR.  Eg I can send 
queries like this to SOLR by searching on ([* TO *] OR [* TO *] OR [* TO *]) in 
the search box - it took 72 seconds to return:
 
webapp=/solr path=/select 
params={d=160.9344start=0q=([*+TO+*]+OR+[*+TO+*]+OR+[*+TO+*])pt=40.7146,-74.0071qt=providersearchdistwt=jsonqq=city_state_lower:new+york,+nyrows=20}
 hits=276442 status=0 QTime=72458

 Fix the dismax/edismax stopwords mm issue
 -

 Key: SOLR-3085
 URL: https://issues.apache.org/jira/browse/SOLR-3085
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Jan Høydahl
  Labels: MinimumShouldMatch, dismax, stopwords
 Fix For: 3.6, 4.0


 As discussed here http://search-lucene.com/m/Wr7iz1a95jx and here 
 http://search-lucene.com/m/Yne042qEyCq1 and here 
 http://search-lucene.com/m/RfAp82nSsla DisMax has an issue with stopwords if 
 not all fields used in QF have exactly same stopword lists.
 Typical solution is to not use stopwords or harmonize stopword lists across 
 all fields in your QF, or relax the MM to a lower percentag. Sometimes these 
 are not acceptable workarounds, and we should find a better solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter

2012-03-11 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227261#comment-13227261
 ] 

Bill Bell commented on SOLR-3230:
-

Yeah that actually works too.

There is no way to cache and force the order on 1st request? 

Parameter? style=range or fieldcache? Default to fieldcache as it is now?

I'll try to do it I guess. Frange is cached well but has too much overhead on 
initial request for us. We need it to be fast on the initial request. BBox does 
that for us.



 Performance improvement for geofilt by doing a bbox approximation and then 
 Filter
 -

 Key: SOLR-3230
 URL: https://issues.apache.org/jira/browse/SOLR-3230
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell
Assignee: Grant Ingersoll
 Fix For: 4.0

 Attachments: SOLR-3230.patch


 This changes {!geofilt} to use a bounding box and then does a accurate filter.
 See attachment

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter

2012-03-10 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227008#comment-13227008
 ] 

Bill Bell commented on SOLR-3230:
-

I am pretty new to Filters and all of this. But it appeared that the 
SpatialDistanceQuery could just be added to the bbox logic to sequence it. But 
I could use a code review.

{code}
if (options.bbox) {
  spatial.bboxQuery = result;
  return spatial;
} else {
result.add(spatial, BooleanClause.Occur.MUST);
return result;
}
  }
{code}

 Performance improvement for geofilt by doing a bbox approximation and then 
 Filter
 -

 Key: SOLR-3230
 URL: https://issues.apache.org/jira/browse/SOLR-3230
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell
Assignee: Grant Ingersoll
 Fix For: 4.0

 Attachments: SOLR-3230.patch


 This changes {!geofilt} to use a bounding box and then does a accurate filter.
 See attachment

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter

2012-03-10 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227009#comment-13227009
 ] 

Bill Bell commented on SOLR-3230:
-

I did not add a new filter, since it seems a no brainer to always do the bbox 
approximation first.

 Performance improvement for geofilt by doing a bbox approximation and then 
 Filter
 -

 Key: SOLR-3230
 URL: https://issues.apache.org/jira/browse/SOLR-3230
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell
Assignee: Grant Ingersoll
 Fix For: 4.0

 Attachments: SOLR-3230.patch


 This changes {!geofilt} to use a bounding box and then does a accurate filter.
 See attachment

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter

2012-03-10 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227011#comment-13227011
 ] 

Bill Bell commented on SOLR-3230:
-

Ohh. Using exampledate. Score is the distance haversine.

http://localhost:8983/solr/select?q={!func}geodist()fl=score,storesort=score 
ascsfield=stored=348fq={!geofilt}pt=45.19614,-93.90341
-- returns 7

http://localhost:8983/solr/select?q={!func}geodist()fl=score,storesort=score 
ascsfield=stored=347fq={!geofilt}pt=45.19614,-93.90341
-- returns 6

http://localhost:8983/solr/select?q={!func}geodist()fl=score,storesort=score 
ascsfield=stored=347fq={!bbox}pt=45.19614,-93.90341
-- returns 7 (since it is not accurate)

 Performance improvement for geofilt by doing a bbox approximation and then 
 Filter
 -

 Key: SOLR-3230
 URL: https://issues.apache.org/jira/browse/SOLR-3230
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell
Assignee: Grant Ingersoll
 Fix For: 4.0

 Attachments: SOLR-3230.patch


 This changes {!geofilt} to use a bounding box and then does a accurate filter.
 See attachment

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3795) Replace spatial contrib module with LSP's spatial-lucene module

2012-03-07 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225051#comment-13225051
 ] 

Bill Bell commented on LUCENE-3795:
---

Ryan... 

Why not just include spatial4j.jar into Lucene/Solr and later separate it. It 
looks like SIS is not even off the ground yet.

In my opinion an API is not really an API until 3 or more projects use it 
anyways.

Why not make JTS pluggable as a separate module into Lucene? Then people can 
download JTS and add it into the Spatial solution by modifying a config file 
like solrconfig.xml ?

I would love to get this committed and done done.

Bill


 Replace spatial contrib module with LSP's spatial-lucene module
 ---

 Key: LUCENE-3795
 URL: https://issues.apache.org/jira/browse/LUCENE-3795
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.0


 I propose that Lucene's spatial contrib module be replaced with the 
 spatial-lucene module within Lucene Spatial Playground (LSP).  LSP has been 
 in development for approximately 1 year by David Smiley, Ryan McKinley, and 
 Chris Male and we feel it is ready.  LSP is here: 
 http://code.google.com/p/lucene-spatial-playground/  and the spatial-lucene 
 module is intuitively in svn/trunk/spatial-lucene/.
 I'll add more comments to prevent the issue description from being too long.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2393) Clean up Double Metaphone code (PhoneticFilterFactory and DoubleMetaphoneFilter)

2012-02-27 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217909#comment-13217909
 ] 

Bill Bell commented on SOLR-2393:
-

Done.

 Clean up Double Metaphone code (PhoneticFilterFactory and 
 DoubleMetaphoneFilter)
 

 Key: SOLR-2393
 URL: https://issues.apache.org/jira/browse/SOLR-2393
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Jan Høydahl
 Attachments: SOLR-2393.patch


 From Ryan McKinley to Jan Høydahl:
 PhoneticFilterFactory could check
 DoubleMetaphone.equals( encoder ) and then create the specialized
 Filter.
 I don't feel strongly, but we could have:
 EncoderFilter -- just uses 'encode()'
 DoubleMetaphoneFilter - uses special double metaphone stuff
 EncoderFilterFactory -- always uses EncoderFilter, and is not
 semantically bound to 'phonetic'
 PhoneticFilterFactory -- picks the best phonetic filter impl (encoder
 or double metaphone)
 then deprecate:
 PhoneticFilter
 DoubleMetaphoneFilterFactory

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3162) Continue work on new admin UI

2012-02-26 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216787#comment-13216787
 ] 

Bill Bell commented on SOLR-3162:
-

We need to add edismax in the query section. dismax is there... Also, all 
the edismax parameters. ps,pf,qf, etc.

 Continue work on new admin UI
 -

 Key: SOLR-3162
 URL: https://issues.apache.org/jira/browse/SOLR-3162
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Erick Erickson

 There have been more improvements to how the new UI works, but the current 
 open bugs are getting hard to keep straight. This is the new catch-all JIRA 
 for continued improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3162) Continue work on new admin UI

2012-02-26 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216789#comment-13216789
 ] 

Bill Bell commented on SOLR-3162:
-

Should we put on the front-page HTTP caching is OFF  ?

 Continue work on new admin UI
 -

 Key: SOLR-3162
 URL: https://issues.apache.org/jira/browse/SOLR-3162
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Erick Erickson

 There have been more improvements to how the new UI works, but the current 
 open bugs are getting hard to keep straight. This is the new catch-all JIRA 
 for continued improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3360) Move FieldCache to IndexReader

2012-02-26 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216794#comment-13216794
 ] 

Bill Bell commented on LUCENE-3360:
---

+1 for FieldCache being an impl of DocValues.

 Move FieldCache to IndexReader
 --

 Key: LUCENE-3360
 URL: https://issues.apache.org/jira/browse/LUCENE-3360
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Martijn van Groningen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3360-3x.patch, LUCENE-3360-3x.patch, 
 LUCENE-3360-3x.patch, LUCENE-3360.patch, LUCENE-3360.patch, 
 LUCENE-3360.patch, LUCENE-3360.patch, LUCENE-3360.patch, LUCENE-3360.patch, 
 LUCENE-3360.patch


 Move the static FieldCache.DEFAULT field instance to atomic IndexReaders, so 
 that FieldCache insanity caused by the WeakHashMap no longer occurs.
 * Add a new method to IndexReader that by default throws an UOE:
 {code}public FieldCache getFieldCache(){code}
 * The SegmentReader implements this method and returns its own internal 
 FieldCache implementation. This implementation just uses a 
 HashMapEntryT,Object to store entries.
 * The SlowMultiReaderWrapper implements this method as well and basically 
 behaves the same as the current FieldCacheImpl.
 This issue won't solve the insanity that comes from inconsistent usage of a 
 single field (for example retrieve both int[] and DocTermIndex for the same 
 field). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2393) Clean up Double Metaphone code (PhoneticFilterFactory and DoubleMetaphoneFilter)

2012-02-26 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216796#comment-13216796
 ] 

Bill Bell commented on SOLR-2393:
-

Jan...

So we would switch from -

{code}
filter class=solr.PhoneticFilterFactory encoder=DoubleMetaphone 
inject=false/ 
or
filter class=solr.DoubleMetaphoneFilterFactory inject=false/ 
{code}

to the following...

So we would have these 2 cases? Is that how you understand it?
{code}

fieldtype name=phonetic stored=false indexed=true class=solr.TextField 
 
  analyzer 
tokenizer class=solr.StandardTokenizerFactory/ 
filter class=solr.PhoneticFilterFactory encoder=Phonetic 
inject=false/ 
  /analyzer 
/fieldtype 


fieldtype name=phonetic stored=false indexed=true class=solr.TextField 
 
  analyzer 
tokenizer class=solr.StandardTokenizerFactory/ 
filter class=solr.PhoneticFilterFactory encoder=DoubleMetaphone 
inject=false/ 
  /analyzer 
/fieldtype 
{code}






 Clean up Double Metaphone code (PhoneticFilterFactory and 
 DoubleMetaphoneFilter)
 

 Key: SOLR-2393
 URL: https://issues.apache.org/jira/browse/SOLR-2393
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Bill Bell

 From Ryan McKinley to Jan Høydahl:
 PhoneticFilterFactory could check
 DoubleMetaphone.equals( encoder ) and then create the specialized
 Filter.
 I don't feel strongly, but we could have:
 EncoderFilter -- just uses 'encode()'
 DoubleMetaphoneFilter - uses special double metaphone stuff
 EncoderFilterFactory -- always uses EncoderFilter, and is not
 semantically bound to 'phonetic'
 PhoneticFilterFactory -- picks the best phonetic filter impl (encoder
 or double metaphone)
 then deprecate:
 PhoneticFilter
 DoubleMetaphoneFilterFactory

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2566) + - operators allow any amount of whitespace

2012-02-25 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216548#comment-13216548
 ] 

Bill Bell commented on LUCENE-2566:
---

+1 to backport to 3x.

 + - operators allow any amount of whitespace
 

 Key: LUCENE-2566
 URL: https://issues.apache.org/jira/browse/LUCENE-2566
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/queryparser
Reporter: Yonik Seeley
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-2566.patch


 As an example, (foo - bar) is treated like (foo -bar).
 It seems like for +- to be treated as unary operators, they should be 
 immediately followed by the operand.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3795) Replace spatial contrib module with LSP's spatial-lucene module

2012-02-17 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210814#comment-13210814
 ] 

Bill Bell commented on LUCENE-3795:
---

+1 getting polygon support ootb is huge as is geohashing and multivalued lat 
longs. Next step Solr...

 Replace spatial contrib module with LSP's spatial-lucene module
 ---

 Key: LUCENE-3795
 URL: https://issues.apache.org/jira/browse/LUCENE-3795
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.0


 I propose that Lucene's spatial contrib module be replaced with the 
 spatial-lucene module within Lucene Spatial Playground (LSP).  LSP has been 
 in development for approximately 1 year by David Smiley, Ryan McKinley, and 
 Chris Male and we feel it is ready.  LSP is here: 
 http://code.google.com/p/lucene-spatial-playground/  and the spatial-lucene 
 module is intuitively in svn/trunk/spatial-lucene/.
 I'll add more comments to prevent the issue description from being too long.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3058) Change default to XSL 2.0 for XSLT TR

2012-01-24 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192773#comment-13192773
 ] 

Bill Bell commented on SOLR-3058:
-

Thoughts?

 Change default to XSL 2.0 for XSLT TR 
 --

 Key: SOLR-3058
 URL: https://issues.apache.org/jira/browse/SOLR-3058
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell

 XSLT 2.0 has been out for a long time. We should change the default SOLR 
 transformer to an XSLT 2.0/XPath 2.0 library.
 http://www.w3.org/TR/xslt20/
 Is there an Apache project that supports this instead of using Saxon ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1889) FastVectorHighlighter: support for additional queries

2012-01-22 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190872#comment-13190872
 ] 

Bill Bell commented on LUCENE-1889:
---

It looks like this will automatically be used in SOLR ? OR do we need to add 
support for this ?

 FastVectorHighlighter: support for additional queries
 -

 Key: LUCENE-1889
 URL: https://issues.apache.org/jira/browse/LUCENE-1889
 Project: Lucene - Java
  Issue Type: Wish
  Components: modules/highlighter
Reporter: Robert Muir
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.5, 4.0

 Attachments: LUCENE-1889-solr.patch, LUCENE-1889.patch, 
 LUCENE-1889.patch, LUCENE-1889.patch, LUCENE-1889_reader.patch


 I am using fastvectorhighlighter for some strange languages and it is working 
 well! 
 One thing i noticed immediately is that many query types are not highlighted 
 (multitermquery, multiphrasequery, etc)
 Here is one thing Michael M posted in the original ticket:
 {quote}
 I think a nice [eventual] model would be if we could simply re-run the
 scorer on the single document (using InstantiatedIndex maybe, or
 simply some sort of wrapper on the term vectors which are already a
 mini-inverted-index for a single doc), but extend the scorer API to
 tell us the exact term occurrences that participated in a match (which
 I don't think is exposed today).
 {quote}
 Due to strange requirements I am using something similar to this (but 
 specialized to our case).
 I am doing strange things like forcing multitermqueries to rewrite into 
 boolean queries so they will be highlighted,
 and flattening multiphrasequeries into boolean or'ed phrasequeries.
 I do not think these things would be 'fast', but i had a few ideas that might 
 help:
 * looking at contrib/highlighter, you can support FilteredQuery in flatten() 
 by calling getQuery() right?
 * maybe as a last resort, try Query.extractTerms() ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery

2012-01-21 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190550#comment-13190550
 ] 

Bill Bell commented on LUCENE-3426:
---

Is this automatic in SOLR? Or do we need to add a feature to support his in 
SOLR?

 optimizer for n-gram PhraseQuery
 

 Key: LUCENE-3426
 URL: https://issues.apache.org/jira/browse/LUCENE-3426
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3, 3.4, 4.0
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Trivial
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, 
 LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, PerfTest.java, 
 PerfTest.java


 If 2-gram is used and the length of query string is 4, for example q=ABCD, 
 QueryParser generates (when autoGeneratePhraseQueries is true) 
 PhraseQuery(AB BC CD) with slop 0. But it can be optimized PhraseQuery(AB 
 CD) with appropriate positions.
 The idea came from the Japanese paper N.M-gram: Implementation of Inverted 
 Index Using N-gram with Hash Values by Mikio Hirabayashi, et al. (The main 
 theme of the paper is different from the idea that I'm using here, though)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2012-01-14 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186295#comment-13186295
 ] 

Bill Bell commented on SOLR-2155:
-

David,

We really need to get this into the trunk. What is left to do? Would really 
love to use this, but until it is moved into the trunk most clients won't. 

Maybe we can scale it back a bit, add the feature and iterate?

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2804) Logging error causes entire DIH process to fail

2011-12-29 Thread Bill Bell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177517#comment-13177517
 ] 

Bill Bell commented on SOLR-2804:
-

We are having this issue as well

NFO: Wrote last indexed time to 
C:\solr\jetty\example\solr\provider\conf\dataimport.properties
Dec 29, 2011 3:42:58 PM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:java.lang.ClassCastException: java.util.ArrayList 
cannot be cast to java.lang.String
    at 
org.apache.solr.common.util.NamedList.getName(NamedList.java:127)
    at 
org.apache.solr.common.util.NamedList.toString(NamedList.java:253)
    at java.lang.String.valueOf(Unknown Source)
    at java.lang.StringBuilder.append(Unknown Source)
    at 
org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:188)
    at 
org.apache.solr.handler.dataimport.SolrWriter.finish(SolrWriter.java:133)
    at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:213)
    at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
    at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
    at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
 
All environments are using the same db and config files are the same – so it 
appears to be problem with SOLR.  This happens right at the end of indexing.  
The fix on Test the other day was to just re-run the index. 

 Logging error causes entire DIH process to fail
 ---

 Key: SOLR-2804
 URL: https://issues.apache.org/jira/browse/SOLR-2804
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0
 Environment: java version 1.6.0_26
 Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3425)
 Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode)
 Model Name:   MacBook Pro
   Model Identifier:   MacBookPro8,2
   Processor Name: Intel Core i7
   Processor Speed:2.2 GHz
   Number of Processors:   1
   Total Number of Cores:  4
   L2 Cache (per Core):256 KB
   L3 Cache:   6 MB
   Memory: 4 GB
 System Software Overview:
   System Version: Mac OS X 10.6.8 (10K549)
   Kernel Version: Darwin 10.8.0
Reporter: Pulkit Singhal
  Labels: dih
   Original Estimate: 48h
  Remaining Estimate: 48h

 SEVERE: Full Import failed:java.lang.ClassCastException:
 java.util.ArrayList cannot be cast to java.lang.String
at org.apache.solr.common.util.NamedList.getName(NamedList.java:127)
at org.apache.solr.common.util.NamedList.toString(NamedList.java:263)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at 
 org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:188)
at 
 org.apache.solr.handler.dataimport.SolrWriter.close(SolrWriter.java:57)
at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:265)
at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:372)
at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:440)
at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:421)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org