[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254357#comment-13254357 ] Bill Bell commented on SOLR-2155: - Could it be if they only have 1 entry in the store_geohash field? Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, Solr2155-for-1.0.2-3.x-port.patch {panel:title=NOTICE} There is a .zip attached to the issue with a .jar you can drop in to Solr 3.x. Lucene 4's new spatial module is largely based on this code. The Solr 4 glue for it should come very soon but as of this writing it's hosted temporarily at Spatial4j.com. For more information on using SOLR-2155 with Solr 3, see http://wiki.apache.org/solr/SpatialSearch#SOLR-2155 This JIRA issue is closed because it won't be committed in its current form. {panel} There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazateer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254356#comment-13254356 ] Bill Bell commented on SOLR-2155: - I tried SOLR 2155 1.0.3 and the same results. I am able to simplify the issue with a simple query: {code} http://localhost:8983/solr/citystateprovider/select?fl=score,display_name,store_lat_lon,store_geohash,city_stateq.alt=*:*d=100pt=39.740112,-104.984856defType=dismaxrows=100echoParams=allsfield=store_geohashsort=geodisth%28%29%20asc; {code} AND the one that works: {code} http://localhost:8983/solr/citystateprovider/select?fl=score,display_name,store_lat_lon,store_geohash,city_stateq.alt=*:*pt=39.740112,-104.984856defType=dismaxrows=100echoParams=allsfield=store_lat_lonsort=geodist%28%29%20asc; {code} Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, Solr2155-for-1.0.2-3.x-port.patch {panel:title=NOTICE} There is a .zip attached to the issue with a .jar you can drop in to Solr 3.x. Lucene 4's new spatial module is largely based on this code. The Solr 4 glue for it should come very soon but as of this writing it's hosted temporarily at Spatial4j.com. For more information on using SOLR-2155 with Solr 3, see http://wiki.apache.org/solr/SpatialSearch#SOLR-2155 This JIRA issue is closed because it won't be committed in its current form. {panel} There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazateer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254359#comment-13254359 ] Bill Bell commented on SOLR-2155: - Ohh. This is interesting, if I switch to sort using the new geodisth() for the sfield=store_lat_lon fields I get the same results as what I use sfield=store_geohash (the wrong ones). Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, Solr2155-for-1.0.2-3.x-port.patch {panel:title=NOTICE} There is a .zip attached to the issue with a .jar you can drop in to Solr 3.x. Lucene 4's new spatial module is largely based on this code. The Solr 4 glue for it should come very soon but as of this writing it's hosted temporarily at Spatial4j.com. For more information on using SOLR-2155 with Solr 3, see http://wiki.apache.org/solr/SpatialSearch#SOLR-2155 This JIRA issue is closed because it won't be committed in its current form. {panel} There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazateer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254392#comment-13254392 ] Bill Bell commented on SOLR-2155: - The distance is calculated properly, but the sort is not working. Probably due to the wrong function being called. You need to add a test case to check the sort order coming back from Solr. Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, Solr2155-for-1.0.2-3.x-port.patch {panel:title=NOTICE} There is a .zip attached to the issue with a .jar you can drop in to Solr 3.x. Lucene 4's new spatial module is largely based on this code. The Solr 4 glue for it should come very soon but as of this writing it's hosted temporarily at Spatial4j.com. For more information on using SOLR-2155 with Solr 3, see http://wiki.apache.org/solr/SpatialSearch#SOLR-2155 This JIRA issue is closed because it won't be committed in its current form. {panel} There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazateer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254398#comment-13254398 ] Bill Bell commented on SOLR-2155: - sort is definitely not working in all cases using geohash field. I am going to compare with other sorts... Maybe we need to extend another class? Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, Solr2155-for-1.0.2-3.x-port.patch {panel:title=NOTICE} There is a .zip attached to the issue with a .jar you can drop in to Solr 3.x. Lucene 4's new spatial module is largely based on this code. The Solr 4 glue for it should come very soon but as of this writing it's hosted temporarily at Spatial4j.com. For more information on using SOLR-2155 with Solr 3, see http://wiki.apache.org/solr/SpatialSearch#SOLR-2155 This JIRA issue is closed because it won't be committed in its current form. {panel} There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazateer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3304) Add Solr support for the new Lucene spatial module
[ https://issues.apache.org/jira/browse/SOLR-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244937#comment-13244937 ] Bill Bell commented on SOLR-3304: - Let me know if I can help. Add Solr support for the new Lucene spatial module -- Key: SOLR-3304 URL: https://issues.apache.org/jira/browse/SOLR-3304 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Bill Bell Assignee: David Smiley Labels: spatial Get the Solr spatial module integrated with the lucene spatial module. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3302) Upgrade SLF4J to 1.6.4
[ https://issues.apache.org/jira/browse/SOLR-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244938#comment-13244938 ] Bill Bell commented on SOLR-3302: - Seems like a little or no risk upgrade... For 3.6? Upgrade SLF4J to 1.6.4 -- Key: SOLR-3302 URL: https://issues.apache.org/jira/browse/SOLR-3302 Project: Solr Issue Type: Improvement Affects Versions: 3.5, 4.0 Reporter: Bill Bell Replace 1.6.1 with 1.6.4 http://www.slf4j.org/dist/slf4j-1.6.4.tar.gz -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244961#comment-13244961 ] Bill Bell commented on SOLR-2242: - Ready for 3x merge. Test with: ant test -Dtestcase=NumFacetTermsFacetsTest Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: SOLR-2242-3x.patch, SOLR-2242-3x_5_tests.patch, SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price This currently only works on facet.field. {code} lst name=facet_fields lst name=price int name=numFacetTerms14/int int name=0.03/intint name=11.51/intint name=19.951/intint name=74.991/intint name=92.01/intint name=179.991/intint name=185.01/intint name=279.951/intint name=329.951/intint name=350.01/intint name=399.01/intint name=479.951/intint name=649.991/intint name=2199.01/int /lst /lst {code} Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240225#comment-13240225 ] Bill Bell commented on SOLR-2242: - I changed the sharing response to check the size and only return the shard name if there is a response. {code} lst name=facet_numTerms lst name=localhost:8983/solr/ /lst Changed to lst name=facet_numTerms/ {code} Also, the code for field_facets was wrong. It needs to return the name of the field even if the size is 0 or null. See latest patch for 3x. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: SOLR-2242-3x.patch, SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR-2242.solr35.patch, SOLR.2242.solr3.1.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price This currently only works on facet.field. {code} lst name=facet_fields lst name=price int name=numFacetTerms14/int int name=0.03/intint name=11.51/intint name=19.951/intint name=74.991/intint name=92.01/intint name=179.991/intint name=185.01/intint name=279.951/intint name=329.951/intint name=350.01/intint name=399.01/intint name=479.951/intint name=649.991/intint name=2199.01/int /lst /lst {code} Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240240#comment-13240240 ] Bill Bell commented on SOLR-2242: - Found a bug and attaching new patch. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: SOLR-2242-3x.patch, SOLR-2242-3x_2.patch, SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR-2242.solr35.patch, SOLR.2242.solr3.1.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price This currently only works on facet.field. {code} lst name=facet_fields lst name=price int name=numFacetTerms14/int int name=0.03/intint name=11.51/intint name=19.951/intint name=74.991/intint name=92.01/intint name=179.991/intint name=185.01/intint name=279.951/intint name=329.951/intint name=350.01/intint name=399.01/intint name=479.951/intint name=649.991/intint name=2199.01/int /lst /lst {code} Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240270#comment-13240270 ] Bill Bell commented on SOLR-2242: - All tests pass on branch_3x now. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: SOLR-2242-3x.patch, SOLR-2242-3x_4.patch, SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price This currently only works on facet.field. {code} lst name=facet_fields lst name=price int name=numFacetTerms14/int int name=0.03/intint name=11.51/intint name=19.951/intint name=74.991/intint name=92.01/intint name=179.991/intint name=185.01/intint name=279.951/intint name=329.951/intint name=350.01/intint name=399.01/intint name=479.951/intint name=649.991/intint name=2199.01/int /lst /lst {code} Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239230#comment-13239230 ] Bill Bell commented on SOLR-2155: - It seems that a max size of 10 would be too small. If we have an average of 3 geohash fields per doc, and we have 2M rows how do we set these caches? We will get back to you on when it occurs. We don't commit during the day. Maybe during a gc? Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, Solr2155-for-1.0.2-3.x-port.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazateer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240158#comment-13240158 ] Bill Bell commented on SOLR-2242: - Yonik agreed. However what is the alternative. We are talking distinct terms, and unless I limit the number of terms there could be a performance issue on using this with sharding. Since I would need to sent the terms and combine them and look for uniques. I am willing to do that work (not that much coding - more worried about CPU and network performance). The one I submitted does change the format by ADDING a new section. It shouldn't break other facets (usually adding sections to the JSON/XML output should not be a hard break). The latest version does not change the facet_field section so it is compatible. I am working on getting the tests to work. Most seem trivial fixes and not more serious. Since we changed the format... However, several people would like to use this. If I fix the test cases that are breaking can we consider a commit? Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: SOLR-2242-3x.patch, SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR-2242.solr35.patch, SOLR.2242.solr3.1.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price This currently only works on facet.field. {code} lst name=facet_fields lst name=price int name=numFacetTerms14/int int name=0.03/intint name=11.51/intint name=19.951/intint name=74.991/intint name=92.01/intint name=179.991/intint name=185.01/intint name=279.951/intint name=329.951/intint name=350.01/intint name=399.01/intint name=479.951/intint name=649.991/intint name=2199.01/int /lst /lst {code} Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238921#comment-13238921 ] Bill Bell commented on SOLR-2155: - David, We are seeing weird slow performance on your new 1.0.4 release. INFO: [providersearch] webapp=/solr path=/select params={d=160.9344facet=falsewt=jsonrows=6start=0pt=42.6450,-73.facet.field=5star_45f.5star_45.facet.mincount=1qt=providersearchspecdistfq=specialties_ids:(45+)qq=city_state_lower:albany,+nyf.5star_45.facet.limit=-1} hits=960 status=0 QTime=8222 Hitting that with a slightly different lat long comes back almost instantly. I'm not sure why sometimes they take seconds instead of milliseconds. There is also this log entry a few lines before the long query: Mar 22, 2012 11:26:29 AM solr2155.solr.search.function.GeoHashValueSource init INFO: field 'store_geohash' in RAM: loaded min/avg/max per doc #: (1,1.1089503,11) #2270017 Are we missing something? Shall we go back to 1.0.3 ? Shall I increase the following? What does this actually do? cache name=fieldValueCache class=solr.FastLRUCache size=10 initialSize=1 autowarmCount=1/ Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, Solr2155-for-1.0.2-3.x-port.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazateer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235365#comment-13235365 ] Bill Bell commented on SOLR-2242: - I added sharding as discussed by Antoine. {code} lst name=facet_numTerms lst name=http://localhost:8983/solr; int name=price14/int int name=cat15/int /lst lst name=http://localhost:8081/solr; int name=price23/int int name=cat3/int /lst /lst {code} Example call http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:8081/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numTerms=truefacet.limit=-1facet.field=pricefacet.field=cat Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: NumFacetTermsFacetsTest.java, SOLR-2242-notworkingtest.patch, SOLR-2242-solr40-2.patch, SOLR-2242-solr40.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price This currently only works on facet.field. {code} lst name=facet_fields lst name=price int name=numFacetTerms14/int int name=0.03/intint name=11.51/intint name=19.951/intint name=74.991/intint name=92.01/intint name=179.991/intint name=185.01/intint name=279.951/intint name=329.951/intint name=350.01/intint name=399.01/intint name=479.951/intint name=649.991/intint name=2199.01/int /lst /lst {code} Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235366#comment-13235366 ] Bill Bell commented on SOLR-2242: - https://issues.apache.org/jira/secure/attachment/12519406/SOLR-2242-solr40-2.patch is the latest patch. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: NumFacetTermsFacetsTest.java, SOLR-2242-notworkingtest.patch, SOLR-2242-solr40-2.patch, SOLR-2242-solr40.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price This currently only works on facet.field. {code} lst name=facet_fields lst name=price int name=numFacetTerms14/int int name=0.03/intint name=11.51/intint name=19.951/intint name=74.991/intint name=92.01/intint name=179.991/intint name=185.01/intint name=279.951/intint name=329.951/intint name=350.01/intint name=399.01/intint name=479.951/intint name=649.991/intint name=2199.01/int /lst /lst {code} Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter
[ https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233162#comment-13233162 ] Bill Bell commented on SOLR-3230: - Yonik, Can you take a peak at my patch? Would love to get some feedback. We are seeing x2 performance improvements on filtering using this patch. Performance improvement for geofilt by doing a bbox approximation and then Filter - Key: SOLR-3230 URL: https://issues.apache.org/jira/browse/SOLR-3230 Project: Solr Issue Type: Improvement Reporter: Bill Bell Assignee: Grant Ingersoll Fix For: 4.0 Attachments: SOLR-3230-3.patch, SOLR-3230.patch This changes {!geofilt} to use a bounding box and then does a accurate filter. See attachment -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2393) Clean up Double Metaphone code (PhoneticFilterFactory and DoubleMetaphoneFilter)
[ https://issues.apache.org/jira/browse/SOLR-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233165#comment-13233165 ] Bill Bell commented on SOLR-2393: - Can someone please review my patch to see if it makes sense... ? I am not an expert at these Phonetic filters. I think I nailed it, and it appears to work well. Clean up Double Metaphone code (PhoneticFilterFactory and DoubleMetaphoneFilter) Key: SOLR-2393 URL: https://issues.apache.org/jira/browse/SOLR-2393 Project: Solr Issue Type: Improvement Affects Versions: 4.0 Reporter: Bill Bell Attachments: SOLR-2393.patch From Ryan McKinley to Jan Høydahl: PhoneticFilterFactory could check DoubleMetaphone.equals( encoder ) and then create the specialized Filter. I don't feel strongly, but we could have: EncoderFilter -- just uses 'encode()' DoubleMetaphoneFilter - uses special double metaphone stuff EncoderFilterFactory -- always uses EncoderFilter, and is not semantically bound to 'phonetic' PhoneticFilterFactory -- picks the best phonetic filter impl (encoder or double metaphone) then deprecate: PhoneticFilter DoubleMetaphoneFilterFactory -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3159) Upgrade to Jetty 8
[ https://issues.apache.org/jira/browse/SOLR-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233174#comment-13233174 ] Bill Bell commented on SOLR-3159: - Can you point to the JENKINS issue? If there is not one, we should post an issue on the JENKINS JIRA... Are you saying that it might be isolated to JENKINS? I have been avoiding NIO - maybe for the wrong reasons. I heard that BIO ran better on Windows 2008 ? (DOn't ask why we are running SOLR on Windows). Upgrade to Jetty 8 -- Key: SOLR-3159 URL: https://issues.apache.org/jira/browse/SOLR-3159 Project: Solr Issue Type: Task Reporter: Ryan McKinley Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-3159-maven.patch Solr is currently tested (and bundled) with a patched jetty-6 version. Ideally we can release and test with a standard version. Jetty-6 (at codehaus) is just maintenance now. New development and improvements are now hosted at eclipse. Assuming performance is equivalent, I think we should switch to Jetty 8. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233181#comment-13233181 ] Bill Bell commented on SOLR-2242: - Cody, I love your suggestion. I am actually ready to work on it. {code} lst name=facet_numTerms int name=text124/int /lst {code} After we get it committed we should then fix the shard issues as per SOLR-3134. We can also create a new JIRA ticket for that. Everyone agreed? I will do it on SOLR 4.0 and back port to 3.5. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Erick Erickson Priority: Minor Fix For: 4.0 Attachments: NumFacetTermsFacetsTest.java, SOLR-2242-notworkingtest.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price This currently only works on facet.field. {code} lst name=facet_fields lst name=price int name=numFacetTerms14/int int name=0.03/intint name=11.51/intint name=19.951/intint name=74.991/intint name=92.01/intint name=179.991/intint name=185.01/intint name=279.951/intint name=329.951/intint name=350.01/intint name=399.01/intint name=479.951/intint name=649.991/intint name=2199.01/int /lst /lst {code} Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233224#comment-13233224 ] Bill Bell commented on SOLR-2242: - How does it work? {code} http://localhost:8983/solr/select?q=*:*facet=truefacet.field=catfacet.field=pricef.price.facet.numTerms=truefacet.limit=-1f.cat.facet.numTerms=truef.price.facet.limit=1 {code} Parameters: facet.numTerms or f.field.facet.numTerms = true (default is false) - turn on distinct counting of terms facet.field - the field to count the terms It creates a new section in the facet section... For example: {code} lst name=facet_counts lst name=facet_queries/ lst name=facet_fields lst name=cat int name=camera1/int int name=connector2/int int name=copier1/int int name=currency4/int int name=electronics14/int int name=graphics card2/int int name=hard drive2/int int name=memory3/int int name=monitor2/int int name=multifunction printer1/int int name=music1/int int name=printer1/int int name=scanner1/int int name=search2/int int name=software2/int /lst lst name=price int name=0.03/int /lst /lst lst name=facet_numTerms int name=cat15/int int name=price14/int /lst lst name=facet_dates/ lst name=facet_ranges/ /lst {code} Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Erick Erickson Priority: Minor Fix For: 4.0 Attachments: NumFacetTermsFacetsTest.java, SOLR-2242-notworkingtest.patch, SOLR-2242-solr40.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price This currently only works on facet.field. {code} lst name=facet_fields lst name=price int name=numFacetTerms14/int int name=0.03/intint name=11.51/intint name=19.951/intint name=74.991/intint name=92.01/intint name=179.991/intint name=185.01/intint name=279.951/intint name=329.951/intint name=350.01/intint name=399.01/intint name=479.951/intint name=649.991/intint name=2199.01/int /lst /lst {code} Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233230#comment-13233230 ] Bill Bell commented on SOLR-2155: - I already had updated http://wiki.apache.org/solr/SpatialSearchDev with the SOLR-2155 info. Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, Solr2155-for-1.0.2-3.x-port.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazateer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233232#comment-13233232 ] Bill Bell commented on SOLR-2242: - See https://issues.apache.org/jira/secure/attachment/12519024/SOLR-2242-solr40.patch for the patch. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Assignee: Erick Erickson Priority: Minor Fix For: 4.0 Attachments: NumFacetTermsFacetsTest.java, SOLR-2242-notworkingtest.patch, SOLR-2242-solr40.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=2facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=0facet.limit=-1facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solrindent=trueq=*:*facet=truefacet.mincount=1facet.numFacetTerms=1facet.limit=-1facet.field=price This currently only works on facet.field. {code} lst name=facet_fields lst name=price int name=numFacetTerms14/int int name=0.03/intint name=11.51/intint name=19.951/intint name=74.991/intint name=92.01/intint name=179.991/intint name=185.01/intint name=279.951/intint name=329.951/intint name=350.01/intint name=399.01/intint name=479.951/intint name=649.991/intint name=2199.01/int /lst /lst {code} Several people use this to get the group.field count (the # of groups). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3159) Upgrade to Jetty 8
[ https://issues.apache.org/jira/browse/SOLR-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232420#comment-13232420 ] Bill Bell commented on SOLR-3159: - For sharding we probably want to add back the equivalent for MAX POST size. {code} !-- Increase the maximum POST size to 1 MB to be able to handle large shard requests -- Call class=java.lang.System name=setProperty Argorg.mortbay.jetty.Request.maxFormContentSize/Arg Arg100/Arg /Call {code} To something like: {code} Call class=java.lang.System name=setProperty Argorg.eclipse.jetty.Request.maxFormContentSize/Arg Arg50/Arg /Call {code} See http://wiki.eclipse.org/Jetty/Howto/Configure_Form_Size#Changing_the_Maximum_Form_Size_for_All_Apps_on_a_Server Upgrade to Jetty 8 -- Key: SOLR-3159 URL: https://issues.apache.org/jira/browse/SOLR-3159 Project: Solr Issue Type: Task Reporter: Ryan McKinley Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-3159-maven.patch Solr is currently tested (and bundled) with a patched jetty-6 version. Ideally we can release and test with a standard version. Jetty-6 (at codehaus) is just maintenance now. New development and improvements are now hosted at eclipse. Assuming performance is equivalent, I think we should switch to Jetty 8. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232133#comment-13232133 ] Bill Bell commented on SOLR-2155: - I did figure out the {!gh_geofilt} The parameters are point and radius, not pt and d. Radius also is in meters not km. The performance of this gh geofilt is also most the same as geofilt. So not sure why you would need it. Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazateer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232156#comment-13232156 ] Bill Bell commented on SOLR-2155: - D is in km and radius is in meters. So Ou would need radius=5000 Geofilt does to use point and radius as far as I can tell. Also hg_geofilt does not use pt and d. I just don't want people confused. Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazateer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3159) Upgrade to Jetty 8
[ https://issues.apache.org/jira/browse/SOLR-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232171#comment-13232171 ] Bill Bell commented on SOLR-3159: - Is this back ported to 3x? There is also a BIO issue in Jetty 6 https://jira.codehaus.org/browse/JETTY-1458 The ticket for 3159 only says Solr 4. Upgrade to Jetty 8 -- Key: SOLR-3159 URL: https://issues.apache.org/jira/browse/SOLR-3159 Project: Solr Issue Type: Task Reporter: Ryan McKinley Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-3159-maven.patch Solr is currently tested (and bundled) with a patched jetty-6 version. Ideally we can release and test with a standard version. Jetty-6 (at codehaus) is just maintenance now. New development and improvements are now hosted at eclipse. Assuming performance is equivalent, I think we should switch to Jetty 8. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3243) eDismax and non-fielded range query
[ https://issues.apache.org/jira/browse/SOLR-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232173#comment-13232173 ] Bill Bell commented on SOLR-3243: - If production sites are using edismax this seems like a very critical issue. eDismax and non-fielded range query --- Key: SOLR-3243 URL: https://issues.apache.org/jira/browse/SOLR-3243 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 3.1, 3.2, 3.3, 3.4, 3.5 Reporter: Jan Høydahl Priority: Critical Fix For: 3.6, 4.0 Reported by Bill Bell in SOLR-3085: If you enter a non-fielded open-ended range in the search box, like [* TO *], eDismax will expand it to all fields: {noformat} +DisjunctionMaxQuery((content:[* TO *]^2.0 | id:[* TO *]^50.0 | author:[* TO *]^15.0 | meta:[* TO *]^10.0 | name:[* TO *]^20.0)) {noformat} This does not make sense, and a side effect is that range queries for strings are very expensive, open-ended even more, and you can totally crash the search server by hammering something like ([* TO *] OR [* TO *] OR [* TO *]) a few times... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3243) eDismax and non-fielded range query
[ https://issues.apache.org/jira/browse/SOLR-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232174#comment-13232174 ] Bill Bell commented on SOLR-3243: - If production sites are using edismax this seems like a very critical issue. eDismax and non-fielded range query --- Key: SOLR-3243 URL: https://issues.apache.org/jira/browse/SOLR-3243 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 3.1, 3.2, 3.3, 3.4, 3.5 Reporter: Jan Høydahl Priority: Critical Fix For: 3.6, 4.0 Reported by Bill Bell in SOLR-3085: If you enter a non-fielded open-ended range in the search box, like [* TO *], eDismax will expand it to all fields: {noformat} +DisjunctionMaxQuery((content:[* TO *]^2.0 | id:[* TO *]^50.0 | author:[* TO *]^15.0 | meta:[* TO *]^10.0 | name:[* TO *]^20.0)) {noformat} This does not make sense, and a side effect is that range queries for strings are very expensive, open-ended even more, and you can totally crash the search server by hammering something like ([* TO *] OR [* TO *] OR [* TO *]) a few times... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229945#comment-13229945 ] Bill Bell commented on SOLR-2155: - David, What is an example URL call for multiValued field? Does geofilt work? /select?q=*:*fq={!geofilt}sort=geodist() ascsfield=store_hashd=10 Or do we need to use gh_geofilt? like this? /select?q=*:*fq={!gh_geofilt}sort=geodist() ascsfield=store_hashd=10 Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazateer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter
[ https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229046#comment-13229046 ] Bill Bell commented on SOLR-3230: - Yonik... I am not that familiar with this code. I do notice 2 methods in LatLonType.java. Is this the right place? public Query getFieldQuery(QParser parser, SchemaField field, String externalVal) { public Query getRangeQuery(QParser parser, SchemaField field, String part1, String part2, boolean minInclusive, boolean maxInclusive) { I did not see how these 2 functions are called. In class SpatialDistanceQuery I did not see where you said we are using range or fc... ? Maybe example code ? Performance improvement for geofilt by doing a bbox approximation and then Filter - Key: SOLR-3230 URL: https://issues.apache.org/jira/browse/SOLR-3230 Project: Solr Issue Type: Improvement Reporter: Bill Bell Assignee: Grant Ingersoll Fix For: 4.0 Attachments: SOLR-3230.patch This changes {!geofilt} to use a bounding box and then does a accurate filter. See attachment -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3085) Fix the dismax/edismax stopwords mm issue
[ https://issues.apache.org/jira/browse/SOLR-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228990#comment-13228990 ] Bill Bell commented on SOLR-3085: - We also found another loophole. If we send [* TO *] to edismax we also can bring down the server. Some chars are not being escaped before being sent to SOLR. Eg I can send queries like this to SOLR by searching on ([* TO *] OR [* TO *] OR [* TO *]) in the search box - it took 72 seconds to return: webapp=/solr path=/select params={d=160.9344start=0q=([*+TO+*]+OR+[*+TO+*]+OR+[*+TO+*])pt=40.7146,-74.0071qt=providersearchdistwt=jsonqq=city_state_lower:new+york,+nyrows=20} hits=276442 status=0 QTime=72458 Fix the dismax/edismax stopwords mm issue - Key: SOLR-3085 URL: https://issues.apache.org/jira/browse/SOLR-3085 Project: Solr Issue Type: Bug Components: search Reporter: Jan Høydahl Labels: MinimumShouldMatch, dismax, stopwords Fix For: 3.6, 4.0 As discussed here http://search-lucene.com/m/Wr7iz1a95jx and here http://search-lucene.com/m/Yne042qEyCq1 and here http://search-lucene.com/m/RfAp82nSsla DisMax has an issue with stopwords if not all fields used in QF have exactly same stopword lists. Typical solution is to not use stopwords or harmonize stopword lists across all fields in your QF, or relax the MM to a lower percentag. Sometimes these are not acceptable workarounds, and we should find a better solution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter
[ https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227261#comment-13227261 ] Bill Bell commented on SOLR-3230: - Yeah that actually works too. There is no way to cache and force the order on 1st request? Parameter? style=range or fieldcache? Default to fieldcache as it is now? I'll try to do it I guess. Frange is cached well but has too much overhead on initial request for us. We need it to be fast on the initial request. BBox does that for us. Performance improvement for geofilt by doing a bbox approximation and then Filter - Key: SOLR-3230 URL: https://issues.apache.org/jira/browse/SOLR-3230 Project: Solr Issue Type: Improvement Reporter: Bill Bell Assignee: Grant Ingersoll Fix For: 4.0 Attachments: SOLR-3230.patch This changes {!geofilt} to use a bounding box and then does a accurate filter. See attachment -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter
[ https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227008#comment-13227008 ] Bill Bell commented on SOLR-3230: - I am pretty new to Filters and all of this. But it appeared that the SpatialDistanceQuery could just be added to the bbox logic to sequence it. But I could use a code review. {code} if (options.bbox) { spatial.bboxQuery = result; return spatial; } else { result.add(spatial, BooleanClause.Occur.MUST); return result; } } {code} Performance improvement for geofilt by doing a bbox approximation and then Filter - Key: SOLR-3230 URL: https://issues.apache.org/jira/browse/SOLR-3230 Project: Solr Issue Type: Improvement Reporter: Bill Bell Assignee: Grant Ingersoll Fix For: 4.0 Attachments: SOLR-3230.patch This changes {!geofilt} to use a bounding box and then does a accurate filter. See attachment -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter
[ https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227009#comment-13227009 ] Bill Bell commented on SOLR-3230: - I did not add a new filter, since it seems a no brainer to always do the bbox approximation first. Performance improvement for geofilt by doing a bbox approximation and then Filter - Key: SOLR-3230 URL: https://issues.apache.org/jira/browse/SOLR-3230 Project: Solr Issue Type: Improvement Reporter: Bill Bell Assignee: Grant Ingersoll Fix For: 4.0 Attachments: SOLR-3230.patch This changes {!geofilt} to use a bounding box and then does a accurate filter. See attachment -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3230) Performance improvement for geofilt by doing a bbox approximation and then Filter
[ https://issues.apache.org/jira/browse/SOLR-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227011#comment-13227011 ] Bill Bell commented on SOLR-3230: - Ohh. Using exampledate. Score is the distance haversine. http://localhost:8983/solr/select?q={!func}geodist()fl=score,storesort=score ascsfield=stored=348fq={!geofilt}pt=45.19614,-93.90341 -- returns 7 http://localhost:8983/solr/select?q={!func}geodist()fl=score,storesort=score ascsfield=stored=347fq={!geofilt}pt=45.19614,-93.90341 -- returns 6 http://localhost:8983/solr/select?q={!func}geodist()fl=score,storesort=score ascsfield=stored=347fq={!bbox}pt=45.19614,-93.90341 -- returns 7 (since it is not accurate) Performance improvement for geofilt by doing a bbox approximation and then Filter - Key: SOLR-3230 URL: https://issues.apache.org/jira/browse/SOLR-3230 Project: Solr Issue Type: Improvement Reporter: Bill Bell Assignee: Grant Ingersoll Fix For: 4.0 Attachments: SOLR-3230.patch This changes {!geofilt} to use a bounding box and then does a accurate filter. See attachment -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3795) Replace spatial contrib module with LSP's spatial-lucene module
[ https://issues.apache.org/jira/browse/LUCENE-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225051#comment-13225051 ] Bill Bell commented on LUCENE-3795: --- Ryan... Why not just include spatial4j.jar into Lucene/Solr and later separate it. It looks like SIS is not even off the ground yet. In my opinion an API is not really an API until 3 or more projects use it anyways. Why not make JTS pluggable as a separate module into Lucene? Then people can download JTS and add it into the Spatial solution by modifying a config file like solrconfig.xml ? I would love to get this committed and done done. Bill Replace spatial contrib module with LSP's spatial-lucene module --- Key: LUCENE-3795 URL: https://issues.apache.org/jira/browse/LUCENE-3795 Project: Lucene - Java Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Fix For: 4.0 I propose that Lucene's spatial contrib module be replaced with the spatial-lucene module within Lucene Spatial Playground (LSP). LSP has been in development for approximately 1 year by David Smiley, Ryan McKinley, and Chris Male and we feel it is ready. LSP is here: http://code.google.com/p/lucene-spatial-playground/ and the spatial-lucene module is intuitively in svn/trunk/spatial-lucene/. I'll add more comments to prevent the issue description from being too long. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2393) Clean up Double Metaphone code (PhoneticFilterFactory and DoubleMetaphoneFilter)
[ https://issues.apache.org/jira/browse/SOLR-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217909#comment-13217909 ] Bill Bell commented on SOLR-2393: - Done. Clean up Double Metaphone code (PhoneticFilterFactory and DoubleMetaphoneFilter) Key: SOLR-2393 URL: https://issues.apache.org/jira/browse/SOLR-2393 Project: Solr Issue Type: Improvement Affects Versions: 4.0 Reporter: Bill Bell Assignee: Jan Høydahl Attachments: SOLR-2393.patch From Ryan McKinley to Jan Høydahl: PhoneticFilterFactory could check DoubleMetaphone.equals( encoder ) and then create the specialized Filter. I don't feel strongly, but we could have: EncoderFilter -- just uses 'encode()' DoubleMetaphoneFilter - uses special double metaphone stuff EncoderFilterFactory -- always uses EncoderFilter, and is not semantically bound to 'phonetic' PhoneticFilterFactory -- picks the best phonetic filter impl (encoder or double metaphone) then deprecate: PhoneticFilter DoubleMetaphoneFilterFactory -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3162) Continue work on new admin UI
[ https://issues.apache.org/jira/browse/SOLR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216787#comment-13216787 ] Bill Bell commented on SOLR-3162: - We need to add edismax in the query section. dismax is there... Also, all the edismax parameters. ps,pf,qf, etc. Continue work on new admin UI - Key: SOLR-3162 URL: https://issues.apache.org/jira/browse/SOLR-3162 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Erick Erickson There have been more improvements to how the new UI works, but the current open bugs are getting hard to keep straight. This is the new catch-all JIRA for continued improvements. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3162) Continue work on new admin UI
[ https://issues.apache.org/jira/browse/SOLR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216789#comment-13216789 ] Bill Bell commented on SOLR-3162: - Should we put on the front-page HTTP caching is OFF ? Continue work on new admin UI - Key: SOLR-3162 URL: https://issues.apache.org/jira/browse/SOLR-3162 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Erick Erickson There have been more improvements to how the new UI works, but the current open bugs are getting hard to keep straight. This is the new catch-all JIRA for continued improvements. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3360) Move FieldCache to IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216794#comment-13216794 ] Bill Bell commented on LUCENE-3360: --- +1 for FieldCache being an impl of DocValues. Move FieldCache to IndexReader -- Key: LUCENE-3360 URL: https://issues.apache.org/jira/browse/LUCENE-3360 Project: Lucene - Java Issue Type: Improvement Reporter: Martijn van Groningen Fix For: 3.6, 4.0 Attachments: LUCENE-3360-3x.patch, LUCENE-3360-3x.patch, LUCENE-3360-3x.patch, LUCENE-3360.patch, LUCENE-3360.patch, LUCENE-3360.patch, LUCENE-3360.patch, LUCENE-3360.patch, LUCENE-3360.patch, LUCENE-3360.patch Move the static FieldCache.DEFAULT field instance to atomic IndexReaders, so that FieldCache insanity caused by the WeakHashMap no longer occurs. * Add a new method to IndexReader that by default throws an UOE: {code}public FieldCache getFieldCache(){code} * The SegmentReader implements this method and returns its own internal FieldCache implementation. This implementation just uses a HashMapEntryT,Object to store entries. * The SlowMultiReaderWrapper implements this method as well and basically behaves the same as the current FieldCacheImpl. This issue won't solve the insanity that comes from inconsistent usage of a single field (for example retrieve both int[] and DocTermIndex for the same field). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2393) Clean up Double Metaphone code (PhoneticFilterFactory and DoubleMetaphoneFilter)
[ https://issues.apache.org/jira/browse/SOLR-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216796#comment-13216796 ] Bill Bell commented on SOLR-2393: - Jan... So we would switch from - {code} filter class=solr.PhoneticFilterFactory encoder=DoubleMetaphone inject=false/ or filter class=solr.DoubleMetaphoneFilterFactory inject=false/ {code} to the following... So we would have these 2 cases? Is that how you understand it? {code} fieldtype name=phonetic stored=false indexed=true class=solr.TextField analyzer tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.PhoneticFilterFactory encoder=Phonetic inject=false/ /analyzer /fieldtype fieldtype name=phonetic stored=false indexed=true class=solr.TextField analyzer tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.PhoneticFilterFactory encoder=DoubleMetaphone inject=false/ /analyzer /fieldtype {code} Clean up Double Metaphone code (PhoneticFilterFactory and DoubleMetaphoneFilter) Key: SOLR-2393 URL: https://issues.apache.org/jira/browse/SOLR-2393 Project: Solr Issue Type: Improvement Affects Versions: 4.0 Reporter: Bill Bell From Ryan McKinley to Jan Høydahl: PhoneticFilterFactory could check DoubleMetaphone.equals( encoder ) and then create the specialized Filter. I don't feel strongly, but we could have: EncoderFilter -- just uses 'encode()' DoubleMetaphoneFilter - uses special double metaphone stuff EncoderFilterFactory -- always uses EncoderFilter, and is not semantically bound to 'phonetic' PhoneticFilterFactory -- picks the best phonetic filter impl (encoder or double metaphone) then deprecate: PhoneticFilter DoubleMetaphoneFilterFactory -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2566) + - operators allow any amount of whitespace
[ https://issues.apache.org/jira/browse/LUCENE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216548#comment-13216548 ] Bill Bell commented on LUCENE-2566: --- +1 to backport to 3x. + - operators allow any amount of whitespace Key: LUCENE-2566 URL: https://issues.apache.org/jira/browse/LUCENE-2566 Project: Lucene - Java Issue Type: Bug Components: core/queryparser Reporter: Yonik Seeley Priority: Minor Fix For: 4.0 Attachments: LUCENE-2566.patch As an example, (foo - bar) is treated like (foo -bar). It seems like for +- to be treated as unary operators, they should be immediately followed by the operand. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3795) Replace spatial contrib module with LSP's spatial-lucene module
[ https://issues.apache.org/jira/browse/LUCENE-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210814#comment-13210814 ] Bill Bell commented on LUCENE-3795: --- +1 getting polygon support ootb is huge as is geohashing and multivalued lat longs. Next step Solr... Replace spatial contrib module with LSP's spatial-lucene module --- Key: LUCENE-3795 URL: https://issues.apache.org/jira/browse/LUCENE-3795 Project: Lucene - Java Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Fix For: 4.0 I propose that Lucene's spatial contrib module be replaced with the spatial-lucene module within Lucene Spatial Playground (LSP). LSP has been in development for approximately 1 year by David Smiley, Ryan McKinley, and Chris Male and we feel it is ready. LSP is here: http://code.google.com/p/lucene-spatial-playground/ and the spatial-lucene module is intuitively in svn/trunk/spatial-lucene/. I'll add more comments to prevent the issue description from being too long. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3058) Change default to XSL 2.0 for XSLT TR
[ https://issues.apache.org/jira/browse/SOLR-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192773#comment-13192773 ] Bill Bell commented on SOLR-3058: - Thoughts? Change default to XSL 2.0 for XSLT TR -- Key: SOLR-3058 URL: https://issues.apache.org/jira/browse/SOLR-3058 Project: Solr Issue Type: Improvement Reporter: Bill Bell XSLT 2.0 has been out for a long time. We should change the default SOLR transformer to an XSLT 2.0/XPath 2.0 library. http://www.w3.org/TR/xslt20/ Is there an Apache project that supports this instead of using Saxon ? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-1889) FastVectorHighlighter: support for additional queries
[ https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190872#comment-13190872 ] Bill Bell commented on LUCENE-1889: --- It looks like this will automatically be used in SOLR ? OR do we need to add support for this ? FastVectorHighlighter: support for additional queries - Key: LUCENE-1889 URL: https://issues.apache.org/jira/browse/LUCENE-1889 Project: Lucene - Java Issue Type: Wish Components: modules/highlighter Reporter: Robert Muir Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.5, 4.0 Attachments: LUCENE-1889-solr.patch, LUCENE-1889.patch, LUCENE-1889.patch, LUCENE-1889.patch, LUCENE-1889_reader.patch I am using fastvectorhighlighter for some strange languages and it is working well! One thing i noticed immediately is that many query types are not highlighted (multitermquery, multiphrasequery, etc) Here is one thing Michael M posted in the original ticket: {quote} I think a nice [eventual] model would be if we could simply re-run the scorer on the single document (using InstantiatedIndex maybe, or simply some sort of wrapper on the term vectors which are already a mini-inverted-index for a single doc), but extend the scorer API to tell us the exact term occurrences that participated in a match (which I don't think is exposed today). {quote} Due to strange requirements I am using something similar to this (but specialized to our case). I am doing strange things like forcing multitermqueries to rewrite into boolean queries so they will be highlighted, and flattening multiphrasequeries into boolean or'ed phrasequeries. I do not think these things would be 'fast', but i had a few ideas that might help: * looking at contrib/highlighter, you can support FilteredQuery in flatten() by calling getQuery() right? * maybe as a last resort, try Query.extractTerms() ? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery
[ https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190550#comment-13190550 ] Bill Bell commented on LUCENE-3426: --- Is this automatic in SOLR? Or do we need to add a feature to support his in SOLR? optimizer for n-gram PhraseQuery Key: LUCENE-3426 URL: https://issues.apache.org/jira/browse/LUCENE-3426 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3, 3.4, 4.0 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Trivial Fix For: 3.5, 4.0 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, PerfTest.java, PerfTest.java If 2-gram is used and the length of query string is 4, for example q=ABCD, QueryParser generates (when autoGeneratePhraseQueries is true) PhraseQuery(AB BC CD) with slop 0. But it can be optimized PhraseQuery(AB CD) with appropriate positions. The idea came from the Japanese paper N.M-gram: Implementation of Inverted Index Using N-gram with Hash Values by Mikio Hirabayashi, et al. (The main theme of the paper is different from the idea that I'm using here, though) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186295#comment-13186295 ] Bill Bell commented on SOLR-2155: - David, We really need to get this into the trunk. What is left to do? Would really love to use this, but until it is moved into the trunk most clients won't. Maybe we can scale it back a bit, add the feature and iterate? Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazateer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2804) Logging error causes entire DIH process to fail
[ https://issues.apache.org/jira/browse/SOLR-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177517#comment-13177517 ] Bill Bell commented on SOLR-2804: - We are having this issue as well NFO: Wrote last indexed time to C:\solr\jetty\example\solr\provider\conf\dataimport.properties Dec 29, 2011 3:42:58 PM org.apache.solr.common.SolrException log SEVERE: Full Import failed:java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String at org.apache.solr.common.util.NamedList.getName(NamedList.java:127) at org.apache.solr.common.util.NamedList.toString(NamedList.java:253) at java.lang.String.valueOf(Unknown Source) at java.lang.StringBuilder.append(Unknown Source) at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:188) at org.apache.solr.handler.dataimport.SolrWriter.finish(SolrWriter.java:133) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:213) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408) All environments are using the same db and config files are the same – so it appears to be problem with SOLR. This happens right at the end of indexing. The fix on Test the other day was to just re-run the index. Logging error causes entire DIH process to fail --- Key: SOLR-2804 URL: https://issues.apache.org/jira/browse/SOLR-2804 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0 Environment: java version 1.6.0_26 Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3425) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode) Model Name: MacBook Pro Model Identifier: MacBookPro8,2 Processor Name: Intel Core i7 Processor Speed:2.2 GHz Number of Processors: 1 Total Number of Cores: 4 L2 Cache (per Core):256 KB L3 Cache: 6 MB Memory: 4 GB System Software Overview: System Version: Mac OS X 10.6.8 (10K549) Kernel Version: Darwin 10.8.0 Reporter: Pulkit Singhal Labels: dih Original Estimate: 48h Remaining Estimate: 48h SEVERE: Full Import failed:java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String at org.apache.solr.common.util.NamedList.getName(NamedList.java:127) at org.apache.solr.common.util.NamedList.toString(NamedList.java:263) at java.lang.String.valueOf(String.java:2826) at java.lang.StringBuilder.append(StringBuilder.java:115) at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:188) at org.apache.solr.handler.dataimport.SolrWriter.close(SolrWriter.java:57) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:265) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:372) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:440) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:421) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org