[Lucene.Net] Re: Creating a ASF fork of Sharpen under a dOCL license
I sent an email to the db4o team to see what they think. When I get a response back from them we should have more answers. At that point it will either be a no on their end or we will have specific items to discuss.

Scott

On Friday, March 25, 2011, Stefan Bodewig bode...@apache.org wrote:

On 2011-03-25, Prescott Nasser wrote:

> Stefan, how do you read their licensing:
> http://www.db4o.com/about/company/legalpolicies/docl.aspx
> By your reading is it possible to include this in our repo to keep
> everything together? Or would this have to be outside the ASF?

The usual IANAL disclaimer applies, and we could ask for legal clearance if we absolutely think we need it. From a cursory glance I don't think the policy applies to our use-case at all:

| 1. Subject
|
| Software means the current version of the db4o database engine
| software and all patches, bug fixes, error corrections and future
| versions.

AFAIU Sharpen is not part of the database engine, and in addition I'm not sure that a fork of the codebase is in line with what they'd consider a derivative work.

Even if it did apply, the license to the original code base is non-transferable, and you'd only get the right to sublicense the original code base under the rules of the GPL (section 2b; in addition, there is no software at all prior to accepting the agreement). I don't see how this could work.

If you really feel that forking Sharpen is the best way to move forward - not my call to make - then forking the GPLed sources into a project that uses the GPL itself seems to be the only legally sane choice.

Stefan
Re: add an afterFilter to IndexSearcher.search()
On Fri, Mar 25, 2011 at 2:33 PM, Yonik Seeley yo...@lucidimagination.com wrote:

>>> Currently, supplying a filter to IndexSearcher.search() assumes that
>>> it's cheaper to run than the main query.
>>
>> Wait, where do we assume that? After a match, we always skip on the filter first.
>
> Well, we next() on the filter, and advance() on the scorer.

>> Also, why stop at 2 filters? Ie I may have 3 filters plus a query to
>> AND, and I want to control their order.
>
> Multiple filters could be combined into a single one via ChainedFilter, etc.

True.

>> What's the use case behind this...?
>
> Optimizing cases where filters might be more expensive than the main query ;-)

That much I understood ;) But you must have a real use case that inspired this idea? Where are apps/Solr typically using such expensive filters?

Mike

http://blog.mikemccandless.com

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
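[Editor's note] The next()/advance() discussion above is about leapfrogging two sorted doc-ID streams. The sketch below is a toy symmetric leapfrog over int arrays, not Lucene's actual Scorer/DocIdSetIterator classes; the class and method names here are illustrative only. The key cost observation from the thread is visible in it: whichever stream is advance()d to the other's position does the skipping work.

```java
// Toy leapfrog intersection of two sorted doc-ID streams (not Lucene code).
import java.util.ArrayList;
import java.util.List;

public class Leapfrog {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Minimal doc-ID iterator over a sorted int array.
    static class Docs {
        final int[] docs;
        int pos = -1;
        Docs(int... docs) { this.docs = docs; }
        int next() { pos++; return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        int advance(int target) {           // skip to first doc >= target
            int d;
            do { d = next(); } while (d < target);
            return d;
        }
    }

    // AND of filter and scorer: whichever stream is behind gets advance()d
    // to the other's current doc; equal docs are hits.
    static List<Integer> and(Docs filter, Docs scorer) {
        List<Integer> hits = new ArrayList<>();
        int f = filter.next();
        int s = scorer.next();
        while (f != NO_MORE_DOCS && s != NO_MORE_DOCS) {
            if (f == s) {                   // both land on the same doc: a hit
                hits.add(f);
                f = filter.next();
                s = scorer.next();
            } else if (f < s) {
                f = filter.advance(s);
            } else {
                s = scorer.advance(f);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Docs filter = new Docs(1, 3, 5, 7, 9);
        Docs scorer = new Docs(2, 3, 4, 9, 11);
        System.out.println(and(filter, scorer)); // prints [3, 9]
    }
}
```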
[jira] [Commented] (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011603#comment-13011603 ] Grant Ingersoll commented on SOLR-236: -- Keep in mind an alternative approach that scales, but loses some attributes of this patch (total groups for instance) is committed on trunk and will likely be backported to 3.2. Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: Next Attachments: DocSetScoreCollector.java, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-branch_3x.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, solr-236.patch

This patch includes a new feature called field collapsing, used in order to collapse a group of results with a similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also duplicate detection: http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field to choose the field used to group results
- collapse.type: normal (default value) or adjacent
- collapse.max to select how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for the current development version
- field_collapsing_1.1.0.patch for Solr 1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
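[Editor's note] The collapse.type and collapse.max semantics described above can be sketched in a few lines. This is a toy illustration over a list of field values, not the patch's collector-based implementation; method names are invented for the example.

```java
// Toy illustration of "adjacent" vs "normal" field collapsing (not the patch's code).
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldCollapse {
    // "adjacent" collapsing: only runs of consecutive equal values are
    // collapsed; up to max entries of each run are kept (collapse.max).
    static List<String> collapseAdjacent(List<String> values, int max) {
        List<String> out = new ArrayList<>();
        String prev = null;
        int run = 0;
        for (String v : values) {
            run = v.equals(prev) ? run + 1 : 1;
            if (run <= max) out.add(v);
            prev = v;
        }
        return out;
    }

    // "normal" collapsing: at most max entries per value over the whole
    // result list, regardless of adjacency.
    static List<String> collapseNormal(List<String> values, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (seen.merge(v, 1, Integer::sum) <= max) out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sites = List.of("a.com", "a.com", "a.com", "b.com", "a.com");
        System.out.println(collapseAdjacent(sites, 2)); // prints [a.com, a.com, b.com, a.com]
        System.out.println(collapseNormal(sites, 2));   // prints [a.com, a.com, b.com]
    }
}
```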
RC Status
I ran into a few kinks w/ signing artifacts (it wasn't finding the maven artifacts) in Solr and am fixing them. Once that goes through, I will upload an RC.
[jira] [Resolved] (LUCENE-2990) Improve ArrayUtil/CollectionUtil.*Sort() methods to early-return on empty or one-element lists/arrays
[ https://issues.apache.org/jira/browse/LUCENE-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-2990.
---
Resolution: Fixed
Lucene Fields: [New, Patch Available] (was: [New])

Renamed local variable from l to size.
Committed trunk revision: 1085689
Committed 3.x revision: 1085691

Improve ArrayUtil/CollectionUtil.*Sort() methods to early-return on empty or one-element lists/arrays
--
Key: LUCENE-2990
URL: https://issues.apache.org/jira/browse/LUCENE-2990
Project: Lucene - Java
Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Trivial
Fix For: 3.2, 4.0
Attachments: LUCENE-2990.patch, LUCENE-2990.patch, LUCENE-2990.patch

It might be a good idea to make CollectionUtil or ArrayUtil return early if the passed-in list or array's length <= 1, because sorting is unneeded then. This may help automaton and other places, as no SorterTemplate is created for empty or one-element lists.
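[Editor's note] The early-return guard the issue describes is just a length check before any sorter machinery is set up. A minimal sketch (this mirrors the idea only, not ArrayUtil's actual code):

```java
// Early-return guard for sorting: 0- and 1-element inputs are already
// sorted, so no sorter object or comparator calls are needed.
import java.util.Arrays;

public class EarlySort {
    static void sort(int[] a) {
        if (a == null || a.length <= 1) return; // sorted by definition
        Arrays.sort(a);
    }

    public static void main(String[] args) {
        int[] many = {3, 1, 2};
        sort(many);
        sort(new int[]{});   // no-op, no allocation
        sort(null);          // no-op, no NullPointerException
        System.out.println(Arrays.toString(many)); // prints [1, 2, 3]
    }
}
```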
[jira] [Created] (LUCENE-2994) When 3.1 is released, update backwards tests in 3.x branch
When 3.1 is released, update backwards tests in 3.x branch
--
Key: LUCENE-2994
URL: https://issues.apache.org/jira/browse/LUCENE-2994
Project: Lucene - Java
Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler

When we have released the official artifacts of Lucene 3.1 (the final ones!!!), we need to do the following:
- svn rm backwards/src/test
- svn cp https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/lucene/src/test backwards/src/test
- Copy the lucene-core-3.1.0.jar from the last release tarball to lucene/backwards/lib and delete the old one.
- Check that everything is correct: the backwards folder should contain a src/ folder that now contains test. The files should be the ones from the branch.
- Run ant test-backwards

Uwe will take care of this!
[jira] [Updated] (LUCENE-2994) When 3.1 is released, update backwards tests in 3.x branch
[ https://issues.apache.org/jira/browse/LUCENE-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2994:
--
Affects Version/s: 3.2
Fix Version/s: 3.2
[jira] [Commented] (LUCENE-2994) When 3.1 is released, update backwards tests in 3.x branch
[ https://issues.apache.org/jira/browse/LUCENE-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011610#comment-13011610 ]

Uwe Schindler commented on LUCENE-2994:
---
We also have to clone the 3.1 test-framework, so it's a little bit more work, but it should be easy to do.
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011613#comment-13011613 ]

Robert Muir commented on SOLR-2155:
---
I don't really think things like this (queries etc) should go into just Solr, while we leave the lucene-contrib spatial package broken. Let's put things in the right places?

Geospatial search using geohash prefixes
--
Key: SOLR-2155
URL: https://issues.apache.org/jira/browse/SOLR-2155
Project: Solr
Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch

There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazetteer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document, and users want to limit their search results to those occurring in a user-specified area.

I've implemented this by furthering the GeoHash-based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4, depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details.

This work was presented at LuceneRevolution in Boston on October 8th.
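[Editor's note] The property the filter exploits is that standard geohash encoding makes a shorter geohash of a point a prefix of every longer geohash of the same point, so a term prefix corresponds to a containing grid cell. A minimal sketch of the standard encoding (not the patch's GeoHashUtils code):

```java
// Standard geohash encoding: interleave longitude/latitude range halvings,
// emitting one base-32 character per 5 bits. Shorter hashes are prefixes
// of longer ones for the same point.
public class GeohashSketch {
    static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int precision) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder sb = new StringBuilder();
        boolean lonTurn = true;                  // bits alternate, longitude first
        int bits = 0, ch = 0;
        while (sb.length() < precision) {
            if (lonTurn) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonLo = mid; }
                else            { ch <<= 1;           lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latLo = mid; }
                else            { ch <<= 1;           latHi = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) { sb.append(BASE32.charAt(ch)); bits = 0; ch = 0; }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String full = encode(57.64911, 10.40744, 11);
        // The prefix property that makes TermsEnum.seek() on prefixes work:
        System.out.println(full.startsWith(encode(57.64911, 10.40744, 4))); // prints true
    }
}
```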
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011615#comment-13011615 ]

Grant Ingersoll commented on SOLR-2155:
---
Yeah, I agree. I haven't looked at the patch yet. It was my understanding that Chris Male was going to move lucene/contrib/spatial to modules and gut the broken stuff in it. I think there is a separate issue open for that one. Presumably, once spatial and function queries are moved to modules, then we will have a properly working spatial package. I obviously can move it, but I don't have time to do the gutting (we really should have deprecated the tier stuff for this release).
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011616#comment-13011616 ]

Robert Muir commented on SOLR-2155:
---
well what would the deprecation have suggested as an alternative?
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011619#comment-13011619 ]

Chris Male commented on SOLR-2155:
--
In LUCENE-2599 I deprecated the spatial contrib. The problem is, as Robert raises, deprecating the code without providing an alternative isn't that user friendly.

I think as part of this issue we should start up the spatial module and work towards moving what we can there. Moving function queries is going to take some time since they are very coupled to Solr. But that shouldn't preclude us from putting into the module what we can. Once we have a module that provides a reasonable set of functionality, then we can deprecate/gut/remove the spatial contrib.
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
Not really related to this issue, so moving to dev@...

On Mar 26, 2011, at 7:52 AM, Robert Muir (JIRA) wrote:

> [ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011616#comment-13011616 ]
>
> Robert Muir commented on SOLR-2155:
> ---
> well what would the deprecation have suggested as an alternative?

It's a good question. The tier stuff, IMO (and confirmed by others), is broken for most of the world. I sunk a good week into fixing it and was so entangled in the spaghetti that I gave up.

What we laid out on another issue (I forget the number, but I think C Male owns it and says he has a rewrite) is to move to modules, keep what we can (geohash and some of the utils) and gut the rest. That, combined w/ moving function queries to modules, would make all of spatial a good solution for the large majority of users.

The only thing that would remain to get back to our current state (at least in terms of features) would be to implement a tier approach. I've proposed the Military Grid System (there is an open JIRA issue for it) as something that looks to be a good candidate. It's well documented on the web, uses a metric for all distances, and has the benefit that all of NATO uses it, albeit for different purposes. It also addresses the poles and the meridians as first-class citizens. It just needs an implementer.

Having said that, I'm not 100% certain. I also don't know that the tier stuff is absolutely necessary. The combination of what we have in function queries plus trie fields makes for a very fast spatial lookup at this point. I'm totally open to other suggestions, however.

Longer term, I've got a lot of ideas for spatial, but that's a different thread.

-Grant
[jira] [Assigned] (SOLR-1298) FunctionQuery results as pseudo-fields
[ https://issues.apache.org/jira/browse/SOLR-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley reassigned SOLR-1298:
--
Assignee: Yonik Seeley (was: Grant Ingersoll)

FunctionQuery results as pseudo-fields
--
Key: SOLR-1298
URL: https://issues.apache.org/jira/browse/SOLR-1298
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
Priority: Minor
Fix For: Next
Attachments: SOLR-1298-FieldValues.patch, SOLR-1298.patch

It would be helpful if the results of FunctionQueries could be added as fields to a document. A couple of options here:
1. Run the FunctionQuery as part of the relevance score and add that piece to the document
2. Run the function (not really a query) during Document/Field retrieval
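[Editor's note] Option 2 above amounts to evaluating a per-document function at retrieval time and attaching the value to the returned stored fields. A hypothetical sketch of that idea (all names here are invented for illustration; this is not Solr's API):

```java
// Hypothetical illustration of "pseudo-fields": evaluate a per-document
// value source during retrieval and attach it to the stored fields.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

public class PseudoFields {
    static Map<String, Object> withPseudoField(Map<String, Object> storedFields,
                                               int docId,
                                               String name,
                                               IntToDoubleFunction valueSource) {
        // Copy the stored fields and add one computed entry.
        Map<String, Object> out = new LinkedHashMap<>(storedFields);
        out.put(name, valueSource.applyAsDouble(docId));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("id", "doc7", "title", "hello");
        // Toy value source standing in for e.g. a distance function.
        Map<String, Object> enriched = withPseudoField(doc, 7, "dist", d -> d * 0.5);
        System.out.println(enriched.get("dist")); // prints 3.5
    }
}
```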
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:

> I don't really think things like this (queries etc) should go into just Solr

I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for Lucene need not be exported to Solr immediately. Everything developed in/for Solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Mar 26, 2011, at 8:24 AM, Robert Muir wrote:

> On Sat, Mar 26, 2011 at 8:06 AM, Grant Ingersoll gsing...@apache.org wrote:
>> Not really related to this issue, so moving to dev@...
>
> I guess the reason I asked my question is more high-level: on one hand there are suggestions that Lucene's spatial package should have been deprecated in 3.1, but on the other hand the very first feature on Solr 3.1's new feature list is 'improved geospatial support'.

It really should say "Added Geospatial Support", as it was non-existent in Solr before. Most of the work for adding spatial to Solr consisted of improving things in Solr to make it easy to leverage the one spatial feature we really added: distance-based functions and parsing support. Everything else was generally useful things: sorting by function, poly fields, etc.

I started on tier support, but dropped it when I realized it was broken beyond repair. The Solr stuff uses, IMO, the stuff in Lucene that works and ignores the rest.

I seem to recall Chris had said that once I got done w/ the Solr stuff he would do the modules work, but it hasn't happened yet. I'd say in 3.2, since it sounds like Chris did at least deprecate contrib/spatial, that we work to get all of this resolved: spatial -> modules, function queries -> modules. Naturally we should do it on trunk, too.
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Mar 26, 2011, at 9:48 AM, Yonik Seeley wrote:

> On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
>> I don't really think things like this (queries etc) should go into just Solr
>
> I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for lucene need not be exported to Solr immediately. Everything developed in/for solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period.

I agree it's enough for the contributor to do that, but as committers we need to look at the bigger picture in this particular case, which is the move of spatial to modules.
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 9:57 AM, Grant Ingersoll gsing...@apache.org wrote:

> On Mar 26, 2011, at 9:48 AM, Yonik Seeley wrote:
>> On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
>>> I don't really think things like this (queries etc) should go into just Solr
>>
>> I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for lucene need not be exported to Solr immediately. Everything developed in/for solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period.
>
> I agree it's enough for the contributor to do that, but as committers we need to look at the bigger picture in this particular case, which is the move of spatial to modules.

That's a separate asynchronous issue. Progress should not be blocked in Solr in the meantime.

-Yonik
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 9:48 AM, Yonik Seeley yo...@lucidimagination.com wrote:

> On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
>> I don't really think things like this (queries etc) should go into just Solr
>
> I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for lucene need not be exported to Solr immediately. Everything developed in/for solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period.

It's not enough for me: you can expect me to start raising questions and objections when things are committed to the wrong place in the codebase; it's totally appropriate. We merged development, all committers can commit to the correct places, there are no excuses.
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
I started on tier support, but dropped it when I realized it was broken beyond repair. I did not know one could break code beyond repair. Nicolas
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 10:05 AM, Robert Muir rcm...@gmail.com wrote: On Sat, Mar 26, 2011 at 9:48 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote: I don't really think things like this (queries etc) should go into just Solr I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for lucene need not be exported to Solr immediately. Everything developed in/for solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period. It's not enough for me: you can expect me to start raising questions and objections when things are committed to the wrong place in the codebase; it's totally appropriate. We merged development, all committers can commit to the correct places, there are no excuses. If you're saying Queries don't belong in Solr, I'm a huge -1 on that. There's no correct place for queries in general - it's all in the context. If there's a better place for the query that can be achieved with a mv, then fine. But there's often much more work involved, dependencies on other solr features, or fleshing out a real Java API rather than treating something as a simple implementation. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-2396) add [ICU]CollationField
[ https://issues.apache.org/jira/browse/SOLR-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned SOLR-2396: - Assignee: Robert Muir add [ICU]CollationField --- Key: SOLR-2396 URL: https://issues.apache.org/jira/browse/SOLR-2396 Project: Solr Issue Type: Improvement Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.0 Attachments: SOLR-2396.patch, SOLR-2396.patch, SOLR-2396.patch, SOLR-2396.patch In LUCENE-2551 collation support was changed to use byte[] keys. Previously it encoded sort keys with IndexableBinaryString into char[], but this is wasteful with regards to RAM and disk when terms can be byte[]. A better solution would be [ICU]CollationFieldTypes, as this would also allow locale-sensitive range queries. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2396) add [ICU]CollationField
[ https://issues.apache.org/jira/browse/SOLR-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011646#comment-13011646 ] Robert Muir commented on SOLR-2396: --- I'd like to commit this in a few days if no one objects. The existing encoding is wasteful and I would like to cut solr over to this more efficient one (and enable locale-sensitive range queries). We could open future issues for any additional features such as specifying the icu locale as BCP47, etc, etc. (this just implements the lucene 3.1 functionality more efficiently) add [ICU]CollationField --- Key: SOLR-2396 URL: https://issues.apache.org/jira/browse/SOLR-2396 Project: Solr Issue Type: Improvement Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.0 Attachments: SOLR-2396.patch, SOLR-2396.patch, SOLR-2396.patch, SOLR-2396.patch In LUCENE-2551 collation support was changed to use byte[] keys. Previously it encoded sort keys with IndexableBinaryString into char[], but this is wasteful with regards to RAM and disk when terms can be byte. A better solution would be [ICU]CollationFieldTypes, as this would also allow locale-sensitive range queries. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
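The win from byte[] collation keys can be illustrated without ICU at all. Below is a standalone sketch using the JDK's own java.text.Collator as a stand-in for the ICU collator the issue actually uses; the helper names (CollationKeyDemo, compareUnsigned, key) are made up for this example, not from the patch:

```java
import java.text.Collator;
import java.util.Locale;

public class CollationKeyDemo {
    // Compare two keys as unsigned bytes, the way index terms are ordered.
    // CollationKey.toByteArray() is documented to preserve compareTo() order
    // under exactly this comparison.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    static byte[] key(Collator c, String s) {
        return c.getCollationKey(s).toByteArray();
    }

    public static void main(String[] args) {
        Collator german = Collator.getInstance(Locale.GERMAN);
        // Binary (code point) order puts "Äpfel" after "Zebra"...
        assert "Äpfel".compareTo("Zebra") > 0;
        // ...but the German collation key orders it before "Zebra", so a
        // range query over the keyed terms sees the locale-correct order.
        assert compareUnsigned(key(german, "Äpfel"), key(german, "Zebra")) < 0;
    }
}
```

Since the keys are opaque byte[] values that sort correctly under plain unsigned comparison, they can be indexed directly as terms, which is what makes locale-sensitive range queries fall out for free.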
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011653#comment-13011653 ] David Smiley commented on SOLR-2155: I plan to finish a couple improvements to this patch within 2 weeks' time: distance function queries to work with multi-value, and polygon queries that span the date line. I've been delayed by some life events (new baby). Furthermore, I'll try and ensure that the work here is applicable to pure Lucene users (i.e. sans Solr). One thing I'm unsure of is how to integrate (or not integrate) existing Lucene/Solr spatial code with this patch. In this patch I chose to re-use some basic shape classes in Lucene's spatial contrib simply because they were already there, but I could just as easily have not. My preference going forward would be to outright replace Lucene's spatial contrib with this patch. I also think LatLonType and PointType could become deprecated since this patch is not only more capable (multiValue support) but faster too. Well, for filtering; sorting is TBD. I'm also inclined to name the field type LatLonGeohashType to reinforce the fact that it works with lat lon; geohash is an implementation detail. In the future it might even not be geohash, strictly speaking, once we optimize the encoding. Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Assignee: Grant Ingersoll Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazetteer) occurring on free text.
None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
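The grid-subdivision scheme the issue describes is easy to sketch in isolation: each geohash character packs five alternating longitude/latitude bisections, so a longer hash is a strict refinement of the box named by any of its prefixes, and that prefix property is what lets a filter skip through the term dictionary. This is a from-scratch illustration of standard geohash encoding, not the patch's GeoHashUtils:

```java
public class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode a lat/lon point to a geohash of the given length. Each output
    // character encodes five bisections, alternating longitude and latitude,
    // so chopping characters off the end only widens the box.
    public static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder(precision);
        boolean bisectLon = true; // geohash starts with a longitude bit
        int bits = 0, ch = 0;
        while (hash.length() < precision) {
            if (bisectLon) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            bisectLon = !bisectLon;
            if (++bits == 5) { hash.append(BASE32.charAt(ch)); bits = 0; ch = 0; }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // A longer hash always starts with the shorter hash of the same
        // point; this is the property GeoHashPrefixFilter exploits when it
        // seeks through grid squares in term order.
        assert Geohash.encode(57.64911, 10.40744, 8)
                .startsWith(Geohash.encode(57.64911, 10.40744, 3));
        assert Geohash.encode(57.64911, 10.40744, 3).equals("u4p");
        // Distant points (here roughly Boston vs Sydney) diverge at the
        // very first character.
        assert Geohash.encode(42.36, -71.06, 6).charAt(0)
                != Geohash.encode(-33.87, 151.21, 6).charAt(0);
    }
}
```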
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
FYI, I'm working on revamping lucene spatial in general https://lucene-spatial-playground.googlecode.com/svn/trunk/ http://code.google.com/p/lucene-spatial-playground/ These are just sketch APIs for now, but i hope to get them cleaned up and contributed soon. The proposal will be for 3 packages in /modules:
1. spatial stuff w/o lucene dependencies -- shapes, distances, etc
2. lucene support for these types
3. solr support for the lucene stuff
(4) demo, probably keep this as an external project since UI and demo stuff is much easier on the outside.
I hope to migrate the existing spatial stuff to this structure and remove the not-really-working stuff. I'll post more when things are closer to committable. ryan On Sat, Mar 26, 2011 at 11:12 AM, Robert Muir rcm...@gmail.com wrote: On Sat, Mar 26, 2011 at 11:03 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Mar 26, 2011 at 9:48 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote: I don't really think things like this (queries etc) should go into just Solr I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for lucene need not be exported to Solr immediately. Everything developed in/for solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period. This is an important enough point that I'm going to follow it up with a quote from Mike: The combined dev community would have no requirement/expectation that if someone adds something cool to Lucene they must also expose it in Solr. There will still be devs that wear mostly Solr vs most Lucene hats. There will also be devs that comfortably wear both. There will be devs that focus on analyzers and do amazing things ;) We merged to *enable* moving code around easier, not to mandate it.
It is wrong to object to a patch because someone hasn't done extra work with their solr hat on to enable its use in solr. It is wrong to object to a patch because someone hasn't done extra work with their lucene hat on to enable its use in lucene. With that out of the way, let's get more specific: what Query in this patch should be moved, and to where? No, the question is: what justification is there for adding spatial support to solr-only, leaving lucene with a broken contrib module, versus adding it where it belongs and exposing it to solr? - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011661#comment-13011661 ] Ryan McKinley commented on SOLR-2155: - Congratulations on the new baby! Thinking about spatial support in general, I think we should settle on some basic APIs and approaches that can be used across many indexing strategies. In http://code.google.com/p/lucene-spatial-playground/ I'm messing with how we can use a standard API to index Shapes with various strategies. As always, each strategy has its tradeoffs, but if we can keep the high level APIs similar, that makes choosing the right approach easier. In this project I'm looking at indexing shapes as:
* bounding box -- 4 fields xmin/xmax/ymin/ymax
* prefix grids -- like geohash or [csquares|http://www.marine.csiro.au/csquares/about-csquares.htm]
* in memory spatial index (rtree/quadtree)
* raw WKB geometry tokens
* points -- x,y fields
* etc
To keep things coherent, I'm proposing a high level interface like: https://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-lucene/src/main/java/org/apache/lucene/spatial/search/SpatialQueryBuilder.java And then each implementation fills it in: https://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-lucene/src/main/java/org/apache/lucene/spatial/search/prefix/PrefixGridQueryBuilder.java The solr layer just handles setup and configuration: http://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-solr/src/main/java/org/apache/solr/spatial/prefix/SpatialPrefixGridFieldType.java In my view geohash is a subset of 'spatial prefix grid' (is there a real name for this?)
-- the interface i'm proposing is: http://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-base/src/main/java/org/apache/lucene/spatial/base/prefix/SpatialPrefixGrid.java essentially: {code} public List<CharSequence> readCells(Shape geo); {code} Geohash for a point would just be a list of one token -- for a polygon, it would be a collection of tokens that fill the space like csquares. I aim to get this basic structure in a lucene branch and maybe into trunk in the next few weeks Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Assignee: Grant Ingersoll Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazetteer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches.
I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
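The readCells contract Ryan proposes generalizes beyond geohash to any prefix grid. A toy quad grid makes it concrete: a point maps to a single full-depth token, a shape maps to the set of cell tokens that cover it, and every prefix of a token names an enclosing cell. All names here (ToyPrefixGrid, pointCell) are invented for illustration; this is not the playground code:

```java
import java.util.*;

// Toy "spatial prefix grid": the unit square [0,1)x[0,1), recursively split
// into 2x2 quads labeled a(SW) b(SE) c(NW) d(NE). A cell token is the path
// from the root, so every prefix of a token names an enclosing cell.
public class ToyPrefixGrid {
    final int maxLevels;
    public ToyPrefixGrid(int maxLevels) { this.maxLevels = maxLevels; }

    // A point maps to exactly one token at full resolution.
    public String pointCell(double x, double y) {
        StringBuilder sb = new StringBuilder();
        double x0 = 0, y0 = 0, size = 1;
        for (int i = 0; i < maxLevels; i++) {
            size /= 2;
            boolean east = x >= x0 + size, north = y >= y0 + size;
            sb.append(east ? (north ? 'd' : 'b') : (north ? 'c' : 'a'));
            if (east) x0 += size;
            if (north) y0 += size;
        }
        return sb.toString();
    }

    // readCells for a box: all cells at a fixed level that intersect it.
    public List<CharSequence> readCells(double xmin, double ymin,
                                        double xmax, double ymax, int level) {
        List<CharSequence> out = new ArrayList<>();
        collect("", 0, 0, 1, level, xmin, ymin, xmax, ymax, out);
        return out;
    }

    private void collect(String prefix, double x0, double y0, double size, int level,
                         double xmin, double ymin, double xmax, double ymax,
                         List<CharSequence> out) {
        // prune subtrees whose cell does not overlap the query box
        if (x0 >= xmax || y0 >= ymax || x0 + size <= xmin || y0 + size <= ymin) return;
        if (level == 0) { out.add(prefix); return; }
        double h = size / 2;
        collect(prefix + 'a', x0,     y0,     h, level - 1, xmin, ymin, xmax, ymax, out);
        collect(prefix + 'b', x0 + h, y0,     h, level - 1, xmin, ymin, xmax, ymax, out);
        collect(prefix + 'c', x0,     y0 + h, h, level - 1, xmin, ymin, xmax, ymax, out);
        collect(prefix + 'd', x0 + h, y0 + h, h, level - 1, xmin, ymin, xmax, ymax, out);
    }

    public static void main(String[] args) {
        ToyPrefixGrid grid = new ToyPrefixGrid(3);
        assert grid.pointCell(0.1, 0.1).equals("aaa");    // point -> one token
        assert grid.readCells(0, 0, 1, 1, 1).size() == 4; // whole space -> all level-1 cells
        List<CharSequence> sw = grid.readCells(0, 0, 0.5, 0.5, 1);
        assert sw.size() == 1 && sw.get(0).equals("a");   // SW quadrant -> single token
    }
}
```

Swap the quad labels for geohash's base32 subdivision and the same contract describes the proposed SpatialPrefixGrid: one token for a point, a space-filling token set for a polygon.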
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 11:12 AM, Robert Muir rcm...@gmail.com wrote: On Sat, Mar 26, 2011 at 11:03 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Mar 26, 2011 at 9:48 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote: I don't really think things like this (queries etc) should go into just Solr I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for lucene need not be exported to Solr immediately. Everything developed in/for solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period. This is an important enough point that I'm going to follow it up with a quote from Mike: The combined dev community would have no requirement/expectation that if someone adds something cool to Lucene they must also expose it in Solr. There will still be devs that wear mostly Solr vs most Lucene hats. There will also be devs that comfortably wear both. There will be devs that focus on analyzers and do amazing things ;) We merged to *enable* moving code around easier, not to mandate it. It is wrong to object to a patch because someone hasn't done extra work with their solr hat on to enable it's use in solr. It is wrong to object to a patch because someone hasn't done extra work with their lucene hat on enable it's use in lucene. With that out of the way, let's get more specific: what Query in this patch should be moved, and to where? No, the question is: what justification is there for adding spatial support to solr-only, leaving lucene with a broken contrib module, versus adding it where it belongs and exposing it to solr? There need not be any linkage to lucene to improve a Solr feature. If you disagree, we should vote to clarify - this is too important (and too much of a negative for Solr). 
-Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
No, the question is: what justification is there for adding spatial support to solr-only, leaving lucene with a broken contrib module, versus adding it where it belongs and exposing it to solr? There need not be any linkage to lucene to improve a Solr feature. If you disagree, we should vote to clarify - this is too important (and too much of a negative for Solr). I don't think there is a *requirement* to move the core spatial stuff to lucene, but I think there is huge benefit to both communities if things have as few dependencies as possible. To be frank, the spatial support in solr is pretty hairy -- it works for some use cases, but is not extendable and quite basic. Calling it 'distance' seems more appropriate than 'spatial'. For good spatial support, I think we want to organize things with as few dependencies/assumptions as possible. This will let:
* only basic math/geometry -- anything complex should use existing well tested solid frameworks (JTS/proj4/geotools/etc); we should not be reinventing/retesting this stuff. We need basic APIs that will work well with these external tools
* lucene focus on fields and queries
* solr focus on configuration and external interface
This structure and constraints would be a big win for everyone. As always this stuff is hard to talk about in the abstract w/o a real proposal -- of course fixing/improving solr features does not *require* working in lucene-core. But I think we get better solutions when we aim for modular designs with minimum dependencies. ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 2:17 PM, Ryan McKinley ryan...@gmail.com wrote: No, the question is: what justification is there for adding spatial support to solr-only, leaving lucene with a broken contrib module, versus adding it where it belongs and exposing it to solr? There need not be any linkage to lucene to improve a Solr feature. If you disagree, we should vote to clarify - this is too important (and too much of a negative for Solr). I don't think there is *requirement* to move the core spatial stuff to lucene, but I think there is huge benefit to both communities if things have as few dependencies as possible. To be frank, the spatial support in solr is pretty hairy -- it works for some use cases, but is not extendable and quite basic. Calling it 'distance' seems more appropriate then 'spatial' Having something basic that works (and has a clean enough high level HTTP interface) was clearly a win for Solr users. The For good spatial support, I think we want to organize things with as few dependencies/assumptions as possible. This will let: * only basic math/geometry -- anything complex should use existing well tested solid frameworks (JTS/proj4/geotools/etc) we should not be reinventing/retesting this stuff. We need basic APIs that will work well with these external tools * lucene focus on fields and queries * solr focus on configuration and external interface This structure and constraints would be a big win for everyone. As always this stuff is hard to talk about in the abstract w/o a real proposal -- of course fixing/improving solr features does not *require* working in lucene-core. But I think we get better solutions when we aim for modular designs with minimum dependencies. ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 2:17 PM, Ryan McKinley ryan...@gmail.com wrote: No, the question is: what justification is there for adding spatial support to solr-only, leaving lucene with a broken contrib module, versus adding it where it belongs and exposing it to solr? There need not be any linkage to lucene to improve a Solr feature. If you disagree, we should vote to clarify - this is too important (and too much of a negative for Solr). I don't think there is *requirement* to move the core spatial stuff to lucene, but I think there is huge benefit to both communities if things have as few dependencies as possible. To be frank, the spatial support in solr is pretty hairy -- it works for some use cases, but is not extendable and quite basic. Calling it 'distance' seems more appropriate then 'spatial' Having something basic that works (and has a clean enough high level HTTP interface) was clearly a win for Solr users. Of course a more fully featured spatial module would be a win for everyone, but that's ignoring the more generic issue at hand here: a patch that improves Solr's spatial should not be blocked on the grounds that it does not improve Lucene's spatial enough. Likewise, the ridiculous notion that Queries don't belong in Solr needs to be put to rest. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [solr] DataSource for HBase Tables?
Yes, if you are going to use the Data Import Handler, I would say that is the route to go. You might also look at using an abstraction like Gora instead of having a dependency directly on HBase. On Mar 25, 2011, at 4:32 PM, Sterk, Paul (Contractor) wrote: Hi, I have a requirement to use Solr to import data from an HBase table and index the contents – similar to importing data from a RDBMS. It looks like I will need to create an org.apache.solr.handler.dataimport.DataSource<T> implementation for HBase to be used by the Data Import Handler. Is this the correct approach? If it is, has someone created a DataSource implementation for HBase? Paul This message, including any attachments, is the property of Sears Holdings Corporation and/or one of its subsidiaries. It is confidential and may contain proprietary or legally privileged information. If you are not the intended recipient, please delete it without reading the contents. Thank you. -- Grant Ingersoll http://www.lucidimagination.com
Re: Interested in GSOC
Thanks for the tips. I'm going through the code and javadocs right now, I will let you know when I have any doubts. I'm not sure to which part of Lucene I'm intending to write a proposal yet, but search/query and query parsing sound interesting. On Fri, Mar 25, 2011 at 7:15 PM, Adriano Crestani adrianocrest...@gmail.com wrote: Hi Vinicius, Welcome to Lucene! I think a good place to look for internal design documentation is the javadoc package summary. Here is an example: [1], each package usually has its own detailed summary. I hope it helps ;) [1] - http://lucene.apache.org/java/3_0_3/api/contrib-queryparser/org/apache/lucene/queryParser/core/package-summary.html On Fri, Mar 25, 2011 at 4:21 AM, Simon Willnauer simon.willna...@googlemail.com wrote: Hey there, welcome to Lucene :), good to hear you are interested in Lucene and GSoC! On Fri, Mar 25, 2011 at 4:49 AM, Vinicius Paes de barros viniciuspaesdebar...@yahoo.com.br wrote: Hi there, I heard about GSOC from a friend of mine at college and I decided I want to participate this year. I already used Lucene before, so Lucene sounds like a good place to start. I went through the JIRA projects, but I couldn't find something I feel like writing a proposal to, maybe I don't have enough knowledge yet about how Lucene is implemented internally. So I started looking at the wiki, but I'm not sure whether it contains all the info I need. Is there any other place I should be looking at to learn more about Lucene's internal design? We don't have a lot of design documents and if there are any they might be most likely outdated. I think the best documentation is the code and the people who have written it. If you wanna dive into lucene you should ask as many questions you need to ask and get all the info out of us.
We are usually around every day depending on the timezones though so you either go and write emails or you join our IRC channel #lucene on freenode (http://lucene.apache.org/java/docs/irc.html) Is there anything particular that you are interested in like indexing, search, analysis etc? simon Thanks in advance, Vinicius Barros - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-2995) factor out a shared spellchecking module
factor out a shared spellchecking module Key: LUCENE-2995 URL: https://issues.apache.org/jira/browse/LUCENE-2995 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 In lucene's contrib we have spellchecking support (index-based spellchecker, directspellchecker, etc). we also have some things like pluggable comparators. In solr we have auto-suggest support (with two implementations it looks like), some good utilities like HighFrequencyDictionary, etc. I think spellchecking is really important... google has upped the ante to what users expect. So I propose we combine all this stuff into a shared modules/spellchecker, which will make it easier to refactor and improve the quality. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2995) factor out a shared spellchecking module
[ https://issues.apache.org/jira/browse/LUCENE-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2995: Attachment: LUCENE-2995.patch Just a quick shot at this (all tests pass). Really any serious 'refactoring' e.g. perf improvements should be on followup issues I think. before applying the patch, run this: {noformat} svn move lucene/contrib/spellchecker modules svn move solr/src/java/org/apache/solr/util/HighFrequencyDictionary.java modules/spellchecker/src/java/org/apache/lucene/search/spell svn move solr/src/java/org/apache/solr/util/TermFreqIterator.java modules/spellchecker/src/java/org/apache/lucene/search/spell svn move solr/src/java/org/apache/solr/util/SortedIterator.java modules/spellchecker/src/java/org/apache/lucene/search/spell svn move solr/src/java/org/apache/solr/spelling/suggest/Suggester.java solr/src/java/org/apache/solr/spelling svn move solr/src/java/org/apache/solr/spelling/suggest modules/spellchecker/src/java/org/apache/lucene/search/spell {noformat} factor out a shared spellchecking module Key: LUCENE-2995 URL: https://issues.apache.org/jira/browse/LUCENE-2995 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-2995.patch In lucene's contrib we have spellchecking support (index-based spellchecker, directspellchecker, etc). we also have some things like pluggable comparators. In solr we have auto-suggest support (with two implementations it looks like), some good utilities like HighFrequencyDictionary, etc. I think spellchecking is really important... google has upped the ante to what users expect. So I propose we combine all this stuff into a shared modules/spellchecker, which will make it easier to refactor and improve the quality. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011700#comment-13011700 ] Lance Norskog commented on SOLR-2382: - Have you tested this under threading? DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. 
- Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache, with two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface, DIHWriter, with two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument(). - Now it does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed.
- The only out-of-the-box entity processor that previously implemented destroy() was LineEntityProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility. In order to get our data into Solr, I created these caching enhancements. Because I believe this has broad application, and because we would like this feature to be supported by the Community, I have front-ported this, enhanced, to Trunk. I have also added unit tests and verified that all existing test cases pass. I believe this
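The pluggable cache framework described in point 1 of the implementation details can be sketched as a small interface plus an in-memory implementation. This is an illustrative sketch only — the interface name and method signatures below are simplified assumptions, not the actual DIHCache API from the patch:

```java
import java.util.*;

// Hypothetical, simplified take on a pluggable DIH cache: callers add
// rows keyed by an entity's primary key and look up matches later.
interface SimpleDihCache {
    void add(Object key, Map<String, Object> row);
    List<Map<String, Object>> lookup(Object key);
    void destroy();   // one-time cleanup, per the new entity.destroy() semantics
}

// In-memory implementation, analogous in spirit to SortedMapBackedCache.
class SortedMapCache implements SimpleDihCache {
    private final SortedMap<Object, List<Map<String, Object>>> data = new TreeMap<>();

    public void add(Object key, Map<String, Object> row) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }

    public List<Map<String, Object>> lookup(Object key) {
        return data.getOrDefault(key, Collections.emptyList());
    }

    public void destroy() {
        data.clear();   // a disk-backed cache would close/delete its files here
    }
}
```

A disk-backed implementation (the role BerkleyBackedCache plays in the patch) would implement the same interface against bdb-je, letting DIH swap cache implementations via the cacheImpl parameter.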
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
Hi, It really should say: Added Geospatial Support, as it was non-existent in Solr before. Most of the work for adding spatial to Solr consisted of improving things in Solr to make it easy to leverage the one spatial feature we really added: distance based functions and parsing support. Everything else was generally useful stuff: sorting by function, poly fields, etc. I started on tier support, but dropped it when I realized it was broken beyond repair. The Solr stuff uses, IMO, the stuff in Lucene that works and ignores the rest. I seem to recall Chris had said that once I got done w/ the Solr stuff he would do the modules work, but it hasn't happened yet. I'd say in 3.2, since it sounds like Chris did at least deprecate contrib/spatial, that we work to get all of this resolved: spatial - modules, function queries - modules. Naturally we should do it on trunk, too. Just note that I didn't skip it out of laziness. Actually pushing stuff into the module isn't easy since there isn't much that can be saved from contrib, and Solr's spatial code is predominantly bound to function queries, which themselves are very coupled to Solr, and there wasn't anything like a consensus that they should be moved. -- Chris Male | Software Developer | JTeam BV.| www.jteam.nl
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sun, Mar 27, 2011 at 7:30 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Mar 26, 2011 at 2:17 PM, Ryan McKinley ryan...@gmail.com wrote: No, the question is: what justification is there for adding spatial support to solr-only, leaving lucene with a broken contrib module, versus adding it where it belongs and exposing it to solr? There need not be any linkage to lucene to improve a Solr feature. If you disagree, we should vote to clarify - this is too important (and too much of a negative for Solr). I don't think there is a *requirement* to move the core spatial stuff to lucene, but I think there is huge benefit to both communities if things have as few dependencies as possible. To be frank, the spatial support in solr is pretty hairy -- it works for some use cases, but is not extensible and quite basic. Calling it 'distance' seems more appropriate than 'spatial'. Having something basic that works (and has a clean enough high level HTTP interface) was clearly a win for Solr users. Of course a more fully featured spatial module would be a win for everyone, but that's ignoring the more generic issue at hand here: a patch that improves Solr's spatial should not be blocked on the grounds that it does not improve Lucene's spatial enough. I don't think we need to see it that way; we want to improve both Solr and Lucene's spatial support, not block either. As you say, having a module is a win for everyone, Solr and Lucene alike, so it seems obvious that we should go down that path and the code in SOLR-2155 would make a great first addition. Likewise, the ridiculous notion that Queries don't belong in Solr needs to be put to rest. Issues in and around this seem to be coming up a lot these days (I'm thinking FunctionQuerys too). Sounds like something that really does need to be openly discussed. 
-Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Chris Male | Software Developer | JTeam BV.| www.jteam.nl
[jira] [Commented] (LUCENE-2995) factor out a shared spellchecking module
[ https://issues.apache.org/jira/browse/LUCENE-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011711#comment-13011711 ] Chris Male commented on LUCENE-2995: +1 factor out a shared spellchecking module Key: LUCENE-2995 URL: https://issues.apache.org/jira/browse/LUCENE-2995 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-2995.patch In lucene's contrib we have spellchecking support (index-based spellchecker, directspellchecker, etc). We also have some things like pluggable comparators. In solr we have auto-suggest support (with two implementations it looks like), and some good utilities like HighFrequencyDictionary. I think spellchecking is really important... Google has upped the ante on what users expect. So I propose we combine all this stuff into a shared modules/spellchecker, which will make it easier to refactor and improve the quality. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
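One of the Solr utilities mentioned, HighFrequencyDictionary, feeds the spellchecker only terms whose document frequency crosses a threshold, so rare (likely misspelled) index terms never become suggestions. A self-contained sketch of that idea — class name, signature, and threshold handling are illustrative assumptions, not the actual Lucene/Solr API:

```java
import java.util.*;

// Illustrative only: keep terms whose docFreq / numDocs >= threshold,
// mimicking the idea behind Solr's HighFrequencyDictionary.
class HighFreqDictionaryDemo {
    static List<String> highFreqTerms(Map<String, Integer> docFreqs,
                                      int numDocs, float threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if ((float) e.getValue() / numDocs >= threshold) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);   // deterministic order for the dictionary
        return result;
    }
}
```

With a real index the document frequencies would come from the terms dictionary rather than a Map, but the filtering logic is the essence of it.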
[VOTE] Lucene 3.1.0 RC3
Artifacts are at http://people.apache.org/~gsingers/staging_area/rc3/. Please vote as you see appropriate. Vote closes on March 29th. I've also updated the Release To Do for both Lucene and Solr and it is hopefully a lot easier now to produce the artifacts as more of it is automated (including uploading to staging area). - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
3.1.0 Proposed Release Announcement(s)
Proposed Release Announcement (edits welcome). Also note we can have ASF Marketing put out a press release if we want. snip March 2011, Lucene 3.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 3.1 and Apache Solr 3.1. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The releases are available for immediate download at http://www.apache.org/dyn/closer.cgi/lucene/java and http://www.apache.org/dyn/closer.cgi/lucene/solr. See the respective CHANGES.txt file included with each release for a full list of details. Lucene 3.1 Release Highlights * Improved Unicode support, including Unicode 4 * ReusableAnalyzerBase makes it easier to reuse TokenStreams correctly * Protected words in stemming via KeywordAttribute * ConstantScoreQuery now allows directly wrapping a Query * Support for custom ExecutorService in ParallelMultiSearcher * IndexWriterConfig.setMaxThreadStates to control the number of IndexWriter thread states * Numerous performance improvements: faster exact PhraseQuery; natural segment merging favors segments with deletions; primary key lookup is faster; IndexWriter.addIndexes(Directory[]) uses file copy instead of merging; BufferedIndexInput does fewer bounds checks; compound file is dynamically turned off for large segments; fully deleted segments are dropped on commit; faster snowball analyzers (in contrib); ConcurrentMergeScheduler is more careful about setting priority of merge threads. * IndexWriter is now configured with a new separate builder API (IndexWriterConfig). * IndexWriter.getReader is replaced by IndexReader.open(IndexWriter). In addition you can now specify whether deletes should be resolved when you open an NRT reader. 
* MultiSearcher is deprecated; ParallelMultiSearcher has been absorbed directly into IndexSearcher * CharTermAttribute replaces TermAttribute in the analysis process * On 64-bit Windows and Solaris JVMs, MMapDirectory is now the default implementation (returned by FSDirectory.open). MMapDirectory also enables unmapping if the JVM supports it. * New TotalHitCountCollector just counts total number of hits * ReaderFinishedListener API enables external caches to evict entries once a segment is finished Solr 3.1 Release Highlights * Added spatial filtering, boosting and sorting capabilities * Added the extended dismax (edismax) query parser, which addresses some missing features in the dismax query parser along with some extensions * Several more components now support distributed mode: TermsComponent, SpellCheckComponent * Added an Auto Suggest component * Ability to sort by functions * Support for adding documents in JSON format * Leverages Lucene 3.1 and its inherent optimizations and bug fixes as well as new analysis capabilities * Numerous bug fixes and optimizations. /snip - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
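The separate builder-style configuration highlighted in the announcement (IndexWriterConfig) decouples a writer's settings from its construction: all options are collected on a config object, which is then handed to the constructor in one shot. A generic, self-contained sketch of the pattern — the class and setter names below are illustrative stand-ins, not the actual Lucene 3.1 classes:

```java
// Illustrative builder-config pattern in the style of IndexWriterConfig:
// chainable setters collect settings, and the component takes the
// finished config at construction time.
class WriterConfig {
    private int maxThreadStates = 8;
    private double ramBufferMB = 16.0;

    WriterConfig setMaxThreadStates(int n) { this.maxThreadStates = n; return this; }
    WriterConfig setRamBufferMB(double mb) { this.ramBufferMB = mb; return this; }
    int getMaxThreadStates() { return maxThreadStates; }
    double getRamBufferMB() { return ramBufferMB; }
}

class Writer {
    private final WriterConfig config;
    Writer(WriterConfig config) { this.config = config; }
    WriterConfig getConfig() { return config; }
}
```

With the real API the shape is similar: build an IndexWriterConfig, chain setters such as setMaxThreadStates, and pass the config to the IndexWriter constructor instead of the old pile of constructor overloads.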
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Mar 26, 2011, at 9:03 PM, Chris Male wrote: [full message quoted above] Agreed, it's not a small task. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 3.1.0 Proposed Release Announcement(s)
a couple quick suggestions inline: On Sat, Mar 26, 2011 at 10:07 PM, Grant Ingersoll gsing...@apache.org wrote: Proposed Release Announcement (edits welcome). Also note we can have ASF Marketing put out a press release if we want. snip March 2011, Lucene 3.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 3.1 and Apache Solr 3.1. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at http://www.apache.org/dyn/closer.cgi/lucene/java and http://www.apache.org/dyn/closer.cgi/lucene/java. See the respective CHANGES.txt file included with the release for a full list of details. Lucene 3.1 Release Highlights * Improved Unicode support, including Unicode 4 * ReusableAnalyzerBase make it easier to reuse TokenStreams correctly * Protected words in stemming via KeywordAttribute I might combine these into 'analysis improvements': improved unicode support, more friendly term handling (CharTermAttribute), easier object reuse (ReusableAnalyzerBase), protected words in stemming (KeywordAttribute) * ConstantScoreQuery now allows directly wrapping a Query * Support for custom ExecutorService in ParallelMultiSearcher I think we should drop this from the release notes, especially given a couple notes down we mention how PMS is deprecated (instead pass the executorservice to indexsearcher). * IndexWriterConfig.setMaxThreadStates for controls of IndexWriter threads * Numerous performance improvements: faster exact PhraseQuery; natural segment merging favors segments with deletions; primary key lookup is faster; IndexWriter.addIndexes(Directory[]) uses file copy instead of merging; BufferedIndexInput does fewer bounds checks; compound file is dynamically turned off for large segments; fully deleted segments are dropped on commit; faster snowball analyzers (in contrib); ConcurrentMergeScheduler is more careful about setting priority of merge threads. 
we had speedups to mmapdirectory too, but only for large indexes. maybe drop the bufferedindexinput stuff and just say the Directories are faster? I also think we should list the performance improvements as #1 in the list of features (it will encourage users to check out the new release) * IndexWriter is now configured with a new separate builder API (IndexWriterConfig). * IndexWriter.getReader is replaced by IndexReader.open(IndexWriter). In addition you can now specify whether deletes should be resolved when you open an NRT reader. * MultiSearcher is deprecated; ParallelMultiSearcher has been absorbed directly into IndexSearcher I think we should re-order the statement somehow, to not emphasize the deprecation first... IndexSearcher gets PMS's capabilities, but without its bugs, and then secondly that PMS is deprecated. * CharTermAttribute replaces TermAttribute in the Analysis process I moved this one into the 'analysis improvements' above. * On 64bit Windows and Solaris JVMs, MMapDirectory is now the default implementation (returned by FSDirectory.open). MMapDirectory also enables unmapping if the JVM supports it. * New TotalHitCountCollector just counts total number of hits * ReaderFinishedListener API enables external caches to evict entries once a segment is finished Solr 3.1 Release Highlights * Added spatial filtering, boosting and sorting capabilities * Added extend dismax (edismax) query parser which addresses some missing features in the dismax query parser along with some extensions * Several more components now support distributed mode: TermsComponent, SpellCheckComponent * Added an Auto Suggest component * Ability to sort by functions * Support for adding documents using JSON format * Leverages Lucene 3.1 and it's inherent optimizations and bug fixes as well as new analysis capabilities * Numerous bug fixes and optimizations. 
/snip - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-trunk - Build # 1511 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1511/ 1 tests failed. FAILED: org.apache.lucene.index.TestNRTThreads.testNRTThreads Error Message: Some threads threw uncaught exceptions! Stack Trace: junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1215) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1147) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:519) Build Log (for compile errors): [...truncated 11939 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2979) Simplify configuration API of contrib Query Parser
[ https://issues.apache.org/jira/browse/LUCENE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011721#comment-13011721 ] Adriano Crestani commented on LUCENE-2979: -- Hi Phillip, I like your idea, similar to the one I had. I was planning to use enums; however, after spending some time thinking, I can't see how I can use generics the way you described using only enums. So go ahead with your idea and create a proposal ;) Don't forget to describe how you plan to make the old and new APIs work together. Simplify configuration API of contrib Query Parser -- Key: LUCENE-2979 URL: https://issues.apache.org/jira/browse/LUCENE-2979 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Affects Versions: 2.9, 3.0 Reporter: Adriano Crestani Assignee: Adriano Crestani Labels: api-change, gsoc, gsoc2011, lucene-gsoc-11, mentor Fix For: 3.2, 4.0 The current configuration API is very complicated and inherits the concept used by the Attribute API to store token information in token streams. However, the requirements for the two (QP config and token stream) are not the same, so they shouldn't be using the same thing. I propose to simplify the QP config and make it less scary for people intending to use the contrib QP. The task is not difficult, but it will require a lot of code changes and figuring out the best way to do it. That's why it's a good candidate for a GSoC project. I would like to hear good proposals about how to make the API more friendly and less scary :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
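The generics-versus-enums tension in the comment above can be illustrated with typed configuration keys: each key carries its value type, so gets and sets are checked at compile time. All names here are hypothetical, not the contrib query parser's actual classes:

```java
import java.util.*;

// Hypothetical typed-key configuration: the key's type parameter makes
// set() and get() type-safe without casts at the call site.
final class ConfigKey<T> {
    private final String name;
    ConfigKey(String name) { this.name = name; }
    public String toString() { return name; }
}

class QueryParserConfig {
    private final Map<ConfigKey<?>, Object> values = new HashMap<>();

    // Only a T can be stored under a ConfigKey<T>...
    <T> void set(ConfigKey<T> key, T value) { values.put(key, value); }

    // ...so this cast is safe by construction.
    @SuppressWarnings("unchecked")
    <T> T get(ConfigKey<T> key) { return (T) values.get(key); }
}
```

This also shows why a pure-enum design hits a wall: Java enums cannot declare a per-constant type parameter (enums are not generic), whereas a final class with a type parameter per key gets the compile-time checking for free.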
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
Maybe I am too close to this issue and not looking at global implications like you are. SOLR-2155 seems fairly close to good to go. There are a couple of open issues that David Smiley has been asking for input on. I would recommend we answer those questions and commit it. Then we can look at modules, etc. If a rewrite was in the works, a committer should have said something a LONG time ago. Like back in October or something. Are we talking redesign or refactoring? I think core spatial things should remain in Lucene. Even though Spatial support in 3.1 is basic, it is stable, and VERY fast. We ran regression tests and the performance was 10-100x faster than the plugin solution of Solr Spatial from Patrick. Whatever fancy support for polygons, etc. we add, it needs to be even faster than what we have with 3.1. What I like about this patch more than anything is the support for multiple lat/longs per document. I have several clients who need this feature. For example, one doctor with multiple offices. It would be nice if a committer would work with David Smiley to get this done. 1. We would like to know if the pole issue can be solved and how. 2. We would like to know the best way to support the multi lat/long (without the copy happening) and get the values from multigeodist(). I have pushed up a good example, and I would like someone to please comment and maybe even show me some code to do that. There has been some discussion on this issue - my solution uses VS and it is fast. There might be faster and simpler ways to handle the N number of points. On another note, it is frustrating when David and I put in some time on this, and it sits out there with us begging for a committer to assist, and then when Grant starts discussions, it is summarily discarded with a new design without any input from the original contributors. Is this how we want to do things here? 
We should have Grant or Yonik work with David to get this patch done. Then we can discuss Spatial V2 and the design of it. Bill On Sat, Mar 26, 2011 at 7:19 PM, Chris Male gento...@gmail.com wrote: [full message quoted above] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011730#comment-13011730 ] James Dyer commented on SOLR-2382: -- There is a multi-threaded unit test in TestDihCacheWriterAndProcessor.java. However, I have not used the threads param in a real-world setting. DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch