[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr
[ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789743#action_12789743 ] Grant Ingersoll commented on SOLR-773: -- Just an update: # SOLR-1131: aka poly fields is almost ready to go. Please review. # SOLR-1297: sort by function query just needs review and then can be committed. After that, we can add in the Cartesian Tier indexing and the Cartesian Tier QParserPlugin (after a little re-write). Then we need pseudo-fields and we likely want to hook in a per request function cache (maybe) Incorporate Local Lucene/Solr - Key: SOLR-773 URL: https://issues.apache.org/jira/browse/SOLR-773 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, solrGeoQuery.tar, spatial-solr.tar.gz Local Lucene has been donated to the Lucene project. It has some Solr components, but we should evaluate how best to incorporate it into Solr. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1650) Consider being able to cache function results per request
Consider being able to cache function results per request - Key: SOLR-1650 URL: https://issues.apache.org/jira/browse/SOLR-1650 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Fix For: 1.5 Once we can sort, filter and boost by functions, it may be the case that the same function is executed for the same value over and over again. Consider ways to cache this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1131: -- Attachment: SOLR-1131.patch Missing an in DistanceUtils.parsePoint Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (Lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single Lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Commented: (SOLR-1650) Consider being able to cache function results per request
[ https://issues.apache.org/jira/browse/SOLR-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789804#action_12789804 ] Grant Ingersoll commented on SOLR-1650: --- I was thinking of a cache whose scope was the length of the request. The basic use case is: 1. Filter by distance 2. Boost/Sort by distance 3. Facet by distance Of course, this could feed the pseudo fields, too. Consider being able to cache function results per request - Key: SOLR-1650 URL: https://issues.apache.org/jira/browse/SOLR-1650 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Fix For: 1.5
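A minimal sketch of the request-scoped cache idea from the comment above, assuming one cache object is created per request and discarded when the request ends. The class name RequestFunctionCache, its methods, and the use of the function string as the key are assumptions made for illustration; none of this is taken from Solr or from any patch on this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

// Hypothetical request-scoped cache; the class and method names are
// illustrative and are not part of Solr.
final class RequestFunctionCache {
    // function string (for example "dist(2,x,y,0,0)") -> docId -> value
    private final Map<String, Map<Integer, Double>> values = new HashMap<>();
    private int computations = 0;

    // Returns the cached value for (functionKey, docId), computing it at
    // most once for the lifetime of this cache object.
    double get(String functionKey, int docId, IntToDoubleFunction compute) {
        Map<Integer, Double> perDoc =
            values.computeIfAbsent(functionKey, k -> new HashMap<>());
        Double cached = perDoc.get(docId);
        if (cached == null) {
            cached = compute.applyAsDouble(docId);
            computations++;
            perDoc.put(docId, cached);
        }
        return cached;
    }

    // Number of times a function was actually evaluated.
    int computations() {
        return computations;
    }
}
```

With something like this, the filter, sort and facet steps could all ask for the same distance on the same document and only the first call would pay for the computation. Whether the real solution should key on the function string or on something cheaper is left open here.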
[jira] Resolved: (SOLR-1297) Enable sorting by Function Query
[ https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved SOLR-1297. --- Resolution: Fixed Committed revision 889997. Enable sorting by Function Query Key: SOLR-1297 URL: https://issues.apache.org/jira/browse/SOLR-1297 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: SOLR-1297.patch It would be nice if one could sort by FunctionQuery. See also SOLR-773, where this was first mentioned by Yonik as part of the generic solution to geo-search -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789240#action_12789240 ] Grant Ingersoll commented on SOLR-1131: --- {quote} Quick comment based on a spot check of the changes to IndexSchema: rather than make polyField special somehow w.r.t. IndexSchema, and add a FieldType.getPolyFieldNames, etc., I had been thinking more along the lines of having an IndexSchema.registerDynamicFieldDefinition - just like the existing registerDynamicCopyField. This would (optionally) allow any field type to add other definitions to the IndexSchema. I continue to think it would be good to stay away from special logic for polyfields in the IndexSchema. {quote} So, then the FieldType would register its Dynamic Fields in its own init() method by calling this method? That can work. bq. Why use Dynamic Field array The array is sorted and array access is much faster and we often have to loop over it to look it up. {quote} CoordinateFieldType: why process 1 sub field types and then throw an exception at the end? I cleaned this up to throw the Exception when it occurs. {quote} OK. Actually, this should just be in the derived class, as it may be the case some other CoordinateFieldType has multiple sub types. {quote} # parsePoint in DistanceUtils, why use ',' as the separator - use ' ' (at least conforms to georss point then). I guess because you are supporting N-dimensional points, right? # parsePoint - instead of complicated isolation loops, why not just use trim()? I've taken that approach in the patch I've attached. {quote} I think comma makes sense. As for the optimization stuff, I agree w/ Yonik, this is code that will be called a lot. {quote} * when checking for isDuplicateDynField, if it is, nothing is done. Shouldn't this be where an exception is thrown or a message is logged? In the patch I'm attaching I took the log approach. {quote} It is logged, but for the poly fields, if the dyn field is already defined, that's just fine. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
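To make the parsePoint discussion concrete, here is a minimal sketch of the simpler trim()-based approach with ',' as the separator, which is the readable variant argued for in the quoted review, not the hand-optimized loop from the patch. The class PointParseSketch and the signature parsePoint(String, int) are assumptions for illustration and are not the actual DistanceUtils API.

```java
// Hypothetical helper; not the DistanceUtils.parsePoint from the patch.
final class PointParseSketch {
    // Parses "x,y,..." into exactly `dimension` doubles.
    static double[] parsePoint(String externalVal, int dimension) {
        // limit -1 keeps trailing empty strings so "1,2," is rejected
        String[] parts = externalVal.split(",", -1);
        if (parts.length != dimension) {
            throw new IllegalArgumentException(
                "expected " + dimension + " values but got " + parts.length
                + " in: " + externalVal);
        }
        double[] out = new double[dimension];
        for (int i = 0; i < dimension; i++) {
            // trim() tolerates spaces around each coordinate
            out[i] = Double.parseDouble(parts[i].trim());
        }
        return out;
    }
}
```

This version allocates a String array and one String per coordinate on every call, which is the kind of overhead the speed and memory argument in this thread is about when the method is called once per indexed point.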
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789241#action_12789241 ] Grant Ingersoll commented on SOLR-1131: --- I've got a patch almost ready that brings in the ValueSource stuff. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789339#action_12789339 ] Grant Ingersoll commented on SOLR-1131: --- {quote} I'm wondering what there is to agree with since optimization was never defined. Are you talking speed? Are you talking memory efficiency? Code readability? Maintainability? Some combination of all of those? {quote} Speed and memory. As for logging, that code is all going away in the next patch, I think Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789351#action_12789351 ] Grant Ingersoll commented on SOLR-1131: --- bq. Unfortunately, it's at the cost of readability and maintainability. Maybe. It took me all of 30 seconds to figure out what it was doing. I'll put some comments on it. While readability is important, Solr's goal is not to make a product that a CS101 grad can read; it's to build a blazing fast search server. That call could be hit millions of times when indexing points. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1131: -- Attachment: SOLR-1131.patch OK, this is getting a lot closer to ready to commit. Changes:
# Introduced a MultiValueSource - ValueSource that abstractly represents ValueSources for poly fields, and other things.
# Introduced PointValueSource - point(x,y,z) - a MultiValueSource that wraps other value sources (could be called something else, I suppose)
# Implemented PointTypeValueSource to represent ValueSource for the PointType class.
# Hooked in multivalue callbacks to DocValues. In addition to making functions work with Points (et al.) it should be possible to write functions that work on multivalued fields, but I did not undertake this work.
# Added a SchemaAware callback mechanism so that Field Types and other schema stuff can register dynamic fields, etc. after the schema has been created
# Updated the example to have spatial information in the docs, etc. See http://wiki.apache.org/solr/SpatialSearch
# Modified the distance functions to work with MultiValueSources
# Cleaned up the tests
# Incorporated various comments from Chris and Yonik.
Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
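The idea of a value source that yields several values per document by wrapping single-valued sources, as in the point(x,y,z) item of the change list, can be sketched roughly as follows. The interfaces ScalarSource and MultiSource and the class PointSource are simplified stand-ins invented for this illustration; they are not the ValueSource, MultiValueSource or PointValueSource classes from the patch, whose real signatures are not shown in this thread.

```java
// Simplified stand-in for a single-valued value source (hypothetical).
interface ScalarSource {
    double value(int docId);
}

// Simplified stand-in for a source with several values per document
// (hypothetical).
interface MultiSource {
    int dimension();

    // Fills out[0..dimension-1] with the values for docId.
    void values(int docId, double[] out);
}

// Wraps N scalar sources so that together they behave like one point.
final class PointSource implements MultiSource {
    private final ScalarSource[] parts;

    PointSource(ScalarSource... parts) {
        this.parts = parts.clone();
    }

    @Override
    public int dimension() {
        return parts.length;
    }

    @Override
    public void values(int docId, double[] out) {
        for (int i = 0; i < parts.length; i++) {
            out[i] = parts[i].value(docId);
        }
    }
}
```

Filling a caller-supplied array instead of returning a new one is one way to avoid allocating per document, in the spirit of the DocValues.getPoint(double[] point) suggestion quoted elsewhere in this thread.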
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789526#action_12789526 ] Grant Ingersoll commented on SOLR-1131: --- I think this is ready to commit. I'd like to do so on Monday or Tuesday of next week, so that should give plenty of time for further review Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
[jira] Assigned: (SOLR-1297) Enable sorting by Function Query
[ https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned SOLR-1297: - Assignee: Grant Ingersoll Enable sorting by Function Query Key: SOLR-1297 URL: https://issues.apache.org/jira/browse/SOLR-1297 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5
[jira] Work started: (SOLR-1297) Enable sorting by Function Query
[ https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on SOLR-1297 started by Grant Ingersoll. Enable sorting by Function Query Key: SOLR-1297 URL: https://issues.apache.org/jira/browse/SOLR-1297 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5
[jira] Commented: (SOLR-1297) Enable sorting by Function Query
[ https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789551#action_12789551 ] Grant Ingersoll commented on SOLR-1297: --- For this, I think we want to be able to do things like: Just functions {code} sort=dist(2,x,y, point(0,0)) desc {code} Multiple sort params, some functions, some fields {code} sort=weight asc,dist(2,x,y, point(0,0)) asc {code} If and when a function result cache exists, we should be able to take advantage of that too, but that is an implementation detail. Enable sorting by Function Query Key: SOLR-1297 URL: https://issues.apache.org/jira/browse/SOLR-1297 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5
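As a rough illustration of what sorting by a distance function amounts to, the sketch below computes a p-norm distance and orders document ids by it in descending order, mirroring sort=dist(2,x,y, point(0,0)) desc. It assumes the first argument of dist is the norm power (2 giving Euclidean distance), which the example suggests but this thread does not spell out. The class DistSortSketch and both method names are hypothetical, and the in-memory array stands in for whatever the index would supply.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch; not Solr's sorting code.
final class DistSortSketch {
    // p-norm distance between two points of equal dimension; power must
    // be greater than zero in this sketch.
    static double dist(double power, double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.pow(Math.abs(a[i] - b[i]), power);
        }
        return Math.pow(sum, 1.0 / power);
    }

    // Returns document ids (indexes into points) ordered by Euclidean
    // distance from origin, farthest first.
    static Integer[] sortByDistanceDesc(double[][] points, double[] origin) {
        Integer[] ids = new Integer[points.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, Comparator
            .comparingDouble((Integer id) -> dist(2, points[id], origin))
            .reversed());
        return ids;
    }
}
```

The comparator recomputes the distance every time two documents are compared, which is exactly the repeated work a per-request function result cache would remove.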
[jira] Commented: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR
[ https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788743#action_12788743 ] Grant Ingersoll commented on LUCENE-1377: - bq. Really? we don't add any analysis capabilities to lucene that solr uses too? Yes, but Solr has a dependency on Lucene, not the other way around. Solr is, by definition, at a higher level than Lucene. In order for Lucene to use something of Solr's, it has to, essentially, fork the code. It's happened several times where stuff was pulled out of Solr and put in Lucene, but then Solr wasn't updated to use it, or it was updated only after Solr undertook a fair amount of work to use the exact same feature it already had in its own code base and that Lucene then added. Since Solr has the dep. on Lucene, it's natural Solr takes advantage of what Lucene has to offer, just like any other project that uses Lucene. Like I said, though, it may make sense for analysis to be separate. I was just pointing out it is a slippery slope. Add HTMLStripReader and WordDelimiterFilter from SOLR - Key: LUCENE-1377 URL: https://issues.apache.org/jira/browse/LUCENE-1377 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: 2.3.2 Reporter: Jason Rutherglen Priority: Minor Original Estimate: 24h Remaining Estimate: 24h SOLR has two classes HTMLStripReader and WordDelimiterFilter which are very useful for a wide variety of use cases. It would be good to place them into core Lucene. -- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR
[ https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788745#action_12788745 ] Grant Ingersoll commented on LUCENE-1377: - bq. i think we want to remove, not create duplication. I think we all agree on that. Alas, though, the devil is in the details and that's where it always seems to get hung up. Not saying it can't work (I've often been an advocate for it), just saying we've gone around on this a number of times and I think it gets hung up on the fact that the two communities are fairly independent with the exception of a few core committers. Add HTMLStripReader and WordDelimiterFilter from SOLR - Key: LUCENE-1377 URL: https://issues.apache.org/jira/browse/LUCENE-1377 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: 2.3.2 Reporter: Jason Rutherglen Priority: Minor Original Estimate: 24h Remaining Estimate: 24h
[jira] Commented: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR
[ https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788758#action_12788758 ] Grant Ingersoll commented on LUCENE-1377: - bq. one thing we could do, is to just have a general rule that we should not copy stuff like this and instead it should be moved, with tests and back compat and all that working on both projects. Yeah, and we can probably take this on a more case by case basis, but it still is creating extra work for Solr committers w/ very little benefit to the project. Not a big deal for the analyzers stuff, since Solr has that process mostly automated anyway, but may be a bigger issue for other stuff. So, if we go with Mike's proposal and make Lucene core have a dep on a new Analyzers module, then this could work, but even that I'm not sure about, as Solr is not on Lucene 3.x yet (and doesn't have immediate plans for it either). At any rate, let's get concrete w/ a patch. Add HTMLStripReader and WordDelimiterFilter from SOLR - Key: LUCENE-1377 URL: https://issues.apache.org/jira/browse/LUCENE-1377 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: 2.3.2 Reporter: Jason Rutherglen Priority: Minor Original Estimate: 24h Remaining Estimate: 24h
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788751#action_12788751 ] Grant Ingersoll commented on SOLR-1131: --- {quote} Seems like SolrQueryParser should use getFieldQuery for everything (except TextField... but it could even be used for that if we make it such that we could call back to getBooleanQuery, etc). I had this in my patch. {quote} Yonik, could you elaborate on this? It seems kind of weird to have that instanceof check in SolrQueryParser.getFieldQuery() to see if we have a TextField or not. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1131: -- Attachment: SOLR-1131.patch This implements Option B as laid out at: http://search.lucidimagination.com/search/document/83a5442ab155686/solr_1131_multiple_fields_per_field_type#a600de441418a798 Next up: Implement ValueSource support for PointType. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788805#action_12788805 ] Grant Ingersoll commented on SOLR-1131: --- bq. DocValues.getPoint(double[] point) OK, let me see how that plays out. See also http://www.lucidimagination.com/search/document/fd804bcd78d7bec1/solr_1131_poly_fields_and_valuesource Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788121#action_12788121 ] Grant Ingersoll commented on SOLR-1131: --- bq. IMO, unit tests can be too low level. They can also be too fragile. I guess it all comes down to what you call a unit. bq. It would be nice, for example, if testPointFieldType indexed a few documents (with various combinations of stored / indexed) and then queried the index, This is done in testIndexing() bq. TopDocs topDocs = core.getSearcher().get().search(bq, 1); Yeah, see my comment there even! I wanted a way to validate that the correct query is created, but I don't even really need to run a search for that. {quote} Redundant null checks, trivial strings, etc: + assertNotNull("topDocs is null", topDocs); + assertTrue(topDocs.totalHits + " does not equal: " + 1, topDocs.totalHits == 1) {quote} I need to update my IntelliJ Live Templates, as I have them set up to spit out a pattern like the one above. bq. Please see the DocumentBuilder changes I had added... Will do. {quote} Seems like SolrQueryParser should use getFieldQuery for everything (except TextField... but it could even be used for that if we make it such that we could call back to getBooleanQuery, etc). I had this in my patch. {quote} I thought I captured that, but will look again. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788123#action_12788123 ] Grant Ingersoll commented on SOLR-1131: --- {quote} Regarding polyfields... it's not clear why they are special enough to have to change the IndexSchema? (IndexSchema.isPolyField, getPolyField, getPolyFieldType, getPolyFieldTypeNoEx, etc). Can't we just store them as normal field types? {quote} My thinking was that a Query Parser or other things might need to look up this information, but you are right, I don't have a specific use case for them at the moment. At the same time, poly fields _feel_ like a hybrid between regular fields and dynamic fields and thus fit at the same level they do. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788151#action_12788151 ]
Grant Ingersoll commented on SOLR-1131:
bq. What if, instead, dynamic fields are directly used for subfields?
That then requires those dynamic fields to be present, which I'd rather not have to do. Part of the goal of this issue is to hide the implementation. Having said that, I still don't know whether that means I need to keep the IndexSchema changes. Let me do another iteration.
bq. Another thing to keep in mind - not all subfields will always be of the same type.
Agreed, but I don't think this is baked in to the generic capabilities, just the Point stuff, where I think it is fine to have the same sub-type.

Allow a single field type to index multiple fields
Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways:
* geohash (single lucene field)
* lat field, lon field (two double fields)
* cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788163#action_12788163 ]
Grant Ingersoll commented on SOLR-1131:
{quote} Also, you have subFieldType=double in the schema... and that requires that the double field type be defined. Why not have subFieldSuffix=_d and require the _d dynamic field be defined? Seems like the same complexity level {quote}
I think it makes more sense for the sub-fields to be tied to a type (subFieldType) than to a Field (subFieldSuffix), as it seems weird to have a field type have a dependency on a Field, whereas it seems fine for a field type to have a dependency on another field type.
{quote} For a specific point implementation, that's fine. But if you use a point type that can do cartesian grid stuff, then you already have different field types. But I guess subFieldType=double need only apply to some of the subfields (the ones that index the points). {quote}
I'm not sure I see this. If and when we implement CartesianPointType, it will still need to have a type for the sub fields (depending on the tiers specified) but I don't see why the subFieldType wouldn't be the same for all of them. AIUI, they all have the same precision requirements.
I think part of what's missing is that for some of these attributes, it would be better for them to be field properties and not fieldType properties. For instance for the Cartesian case, you will need to declare what levels to support. If that is specified on the FieldType, then you have a proliferation of Field Type declarations, whereas if it is on the Field, that is a lot cleaner and less verbose. I'm just not sure how that gets implemented just yet, as having to specify startTier and endTier doesn't seem like the same level as multiValued or stored.

Allow a single field type to index multiple fields
Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways:
* geohash (single lucene field)
* lat field, lon field (two double fields)
* cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Issue Comment Edited: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788170#action_12788170 ]
Grant Ingersoll edited comment on SOLR-1131 at 12/9/09 5:18 PM:
{quote}
<fieldType name="xy" class="solr.PointType" dimension="2" subFieldType="double"/>
<field name="home" type="xy" indexed="true" stored="true"/>
{quote}
Two indexed fields: home___0, home___1
One stored field: home

was (Author: gsingers):
{quote}
<fieldType name="xy" class="solr.PointType" dimension="2" subFieldType="double"/>
<field name="home" type="xy" indexed="true" stored="true"/>
{quote}
home___0 home___1

Allow a single field type to index multiple fields
Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways:
* geohash (single lucene field)
* lat field, lon field (two double fields)
* cartesian tiers (a series of fields with tokens to say if it exists within that region)
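The naming in the comment above (one stored field `home`, two indexed sub-fields `home___0` and `home___1`) can be illustrated with a small sketch. This is not the code from the SOLR-1131 patch: the class `PolyFieldSketch`, its method, and the comma-separated input format are assumptions made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: this is NOT Solr's PointType, only the sub-field
// naming scheme (name + "___" + dimension index) seen in the comment.
final class PolyFieldSketch {
    static final String SUFFIX = "___"; // separator seen in home___0 / home___1

    // Splits a value such as "1.5,2.5" into one indexed sub-field per dimension.
    // The comma-separated input format is an assumption of this sketch.
    static Map<String, Double> indexedSubFields(String fieldName, String value, int dimension) {
        String[] parts = value.split(",");
        if (parts.length != dimension) {
            throw new IllegalArgumentException(
                "expected " + dimension + " values, got " + parts.length);
        }
        Map<String, Double> fields = new LinkedHashMap<>();
        for (int i = 0; i < dimension; i++) {
            fields.put(fieldName + SUFFIX + i, Double.parseDouble(parts[i].trim()));
        }
        return fields;
    }
}
```

Calling `PolyFieldSketch.indexedSubFields("home", "1.5, 2.5", 2)` yields the keys `home___0` and `home___1`; the original string would be kept separately as the single stored value.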
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788298#action_12788298 ]
Grant Ingersoll commented on SOLR-1131:
bq. OK... so the real issue is that this introduces a new mechanism to look up field types... not necessarily a horrible thing, but we should definitely think twice before doing so.
Agreed. I'm not wedded to this approach, just want to see the discussion through. I do feel strongly that the goal is such that an app designer should be able to use a FieldType just as they always have, either dynamic or static. How we get to that I don't care so much as long as it works and performs.
bq. But... that scheme seems to limit us to a single subField type (in addition to the other downsides of requiring a new lookup mechanism).
I don't follow this. In this particular implementation, I have a single subFieldType, but I don't see why a different implementation couldn't do something like:
{code}
<fieldType name="foo" type="solr.MultiSubPointType" dimension="3" subFieldTypes="double,tdouble,int"/>
{code}
bq. Aside: it looks like the code for getFieldOrNull isn't right? Seems like it will return a field with both the wrong type and the wrong name?
Hmmm, I _think_ it should return the owning Schema Field, i.e. the one that exists in the schema.xml file.

Allow a single field type to index multiple fields
Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways:
* geohash (single lucene field)
* lat field, lon field (two double fields)
* cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788307#action_12788307 ]
Grant Ingersoll commented on SOLR-1131:
Note, I don't think the distance function queries will work w/ my patch yet.

Allow a single field type to index multiple fields
Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways:
* geohash (single lucene field)
* lat field, lon field (two double fields)
* cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Created: (LUCENE-2127) Improved large result handling
Improved large result handling -- Key: LUCENE-2127 URL: https://issues.apache.org/jira/browse/LUCENE-2127 Project: Lucene - Java Issue Type: New Feature Reporter: Grant Ingersoll Priority: Minor Per http://search.lucidimagination.com/search/document/350c54fc90d257ed/lots_of_results#fbb84bd297d15dd5, it would be nice to offer some other Collectors that are better at handling really large number of results. This could be implemented in a variety of ways via Collectors. For instance, we could have a raw collector that does no sorting and just returns the ScoreDocs, or we could do as Mike suggests and have Collectors that have heuristics about memory tradeoffs and only heapify when appropriate. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
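The contrast drawn in LUCENE-2127 between a raw collector that does no sorting and a heap-based one can be sketched in plain Java. This deliberately avoids Lucene's Collector API; the class `LargeResultSketch`, its `Hit` type, and both method names are hypothetical and exist only to show the trade-off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch in plain Java; it does not use Lucene's Collector API.
final class LargeResultSketch {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // "Raw" collection: append every hit in arrival order, with no sorting.
    static List<Hit> collectRaw(int[] docs, float[] scores) {
        List<Hit> hits = new ArrayList<>(docs.length);
        for (int i = 0; i < docs.length; i++) {
            hits.add(new Hit(docs[i], scores[i]));
        }
        return hits;
    }

    // Bounded collection: a min-heap of size n keeps only the n best scores.
    static List<Hit> collectTopN(int[] docs, float[] scores, int n) {
        List<Hit> top = new ArrayList<>();
        if (n <= 0) {
            return top;
        }
        PriorityQueue<Hit> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (int i = 0; i < docs.length; i++) {
            if (heap.size() < n) {
                heap.add(new Hit(docs[i], scores[i]));
            } else if (scores[i] > heap.peek().score) {
                heap.poll(); // evict the current weakest hit
                heap.add(new Hit(docs[i], scores[i]));
            }
        }
        top.addAll(heap);
        top.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return top;
    }
}
```

The raw variant costs constant time per hit but holds every hit in memory, while the heap variant pays a logarithmic cost per competitive hit and holds at most n. A heuristic collector of the kind mentioned in the issue would presumably switch between the two.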
[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Ingersoll updated SOLR-1131: -- Attachment: SOLR-1131.patch
OK, here's my take on this. I took Yonik's and merged it w/ a patch I had in the works. It's not done, but all tests pass, including the new one I added (PolyFieldTest). Yonik's move to put getFieldQuery in FieldType was just the key to answering the question of how to generate queries given a FieldType.
Notes:
1. I changed the Geo examples to be CoordinateFieldType (representing an abstract coordinate system) and then PointFieldType which represents a point in an n-dimensional space (default 2D). I think from this, we could easily add things like PolygonFieldType, etc. which would allow us to create more sophisticated shapes and do things like intersections, etc. For instance, imagine saying: Does this point lie within this shape? I think that might be able to be expressed as a RangeQuery
2. I'm not sure I care for the name of the new abstract FieldType that is a base class of CoordinateFieldType called DelegatingFieldType
3. I'm not sure yet on the properties of the generated fields. Right now, I'm delegating the handling to the sub FieldType except I'm overriding to turn off storage, which I think is pretty cool (could even work as a copy field like functionality)
4. I'm not thrilled about creating a SchemaField every time in the createFields protected helper method, but SchemaField is final and doesn't have a setName method (which makes sense)
Questions for Yonik on his patch:
1. Why is TextField overriding getFieldQuery when it isn't called, except possibly via the FieldQParserPlugin?
2. I'm not sure I understand the getDistance, getBoundingBox methods on the GeoFieldType. It seems like that precludes one from picking a specific distance (for instance, sometimes you may want a faster approx. and others a slower)
Needs:
1. Write up changes.txt
2. More tests, including performance testing
3. Patch doesn't support dynamic fields yet, but it should

Allow a single field type to index multiple fields
Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways:
* geohash (single lucene field)
* lat field, lon field (two double fields)
* cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes
[ https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787113#action_12787113 ]
Grant Ingersoll commented on SOLR-1586:
FYI, see SOLR-1131 for an implementation of a Point Field Type.

Create Spatial Point FieldTypes
Key: SOLR-1586 URL: https://issues.apache.org/jira/browse/SOLR-1586 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: examplegeopointdoc.patch.txt, SOLR-1586.Mattmann.112209.geopointonly.patch.txt, SOLR-1586.Mattmann.112209.geopointonly.patch.txt, SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, SOLR-1586.Mattmann.112509.geopointandgeohash.patch.txt
Per SOLR-773, create field types that hide the details of creating tiers, geohash and lat/lon fields. Fields should take in lat/lon points in a single form, as in: <field name="foo">lat lon</field>
[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes
[ https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787115#action_12787115 ]
Grant Ingersoll commented on SOLR-1586:
bq. we should have the ability to output those fields as georss per ryan's suggestion
Ryan can correct me if I am putting words in his mouth, but I don't think he literally meant we needed to use those exact tags. I think he just meant the format of the actual values.

Create Spatial Point FieldTypes
Key: SOLR-1586 URL: https://issues.apache.org/jira/browse/SOLR-1586 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: examplegeopointdoc.patch.txt, SOLR-1586.Mattmann.112209.geopointonly.patch.txt, SOLR-1586.Mattmann.112209.geopointonly.patch.txt, SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, SOLR-1586.Mattmann.112509.geopointandgeohash.patch.txt
Per SOLR-773, create field types that hide the details of creating tiers, geohash and lat/lon fields. Fields should take in lat/lon points in a single form, as in: <field name="foo">lat lon</field>
[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes
[ https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787198#action_12787198 ]
Grant Ingersoll commented on SOLR-1586:
Can you put up a patch containing just the geohash stuff?

Create Spatial Point FieldTypes
Key: SOLR-1586 URL: https://issues.apache.org/jira/browse/SOLR-1586 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: examplegeopointdoc.patch.txt, SOLR-1586.Mattmann.112209.geopointonly.patch.txt, SOLR-1586.Mattmann.112209.geopointonly.patch.txt, SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, SOLR-1586.Mattmann.112509.geopointandgeohash.patch.txt
Per SOLR-773, create field types that hide the details of creating tiers, geohash and lat/lon fields. Fields should take in lat/lon points in a single form, as in: <field name="foo">lat lon</field>
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786586#action_12786586 ]
Grant Ingersoll commented on SOLR-1131:
Hey Yonik, One of the things I was debating was whether it was worthwhile to keep the single field creation or not. I see in your patch you drop it. I've got a patch that keeps it. I will try to put it up this week.

Allow a single field type to index multiple fields
Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways:
* geohash (single lucene field)
* lat field, lon field (two double fields)
* cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12786164#action_12786164 ] Grant Ingersoll commented on LUCENE-2091: - I haven't looked at the patch yet, but... Should we take just a small step back and consider what it would take to actually make scoring more pluggable instead of just thinking about how best to integrate BM25? In other words, someone else has also donated an implementation of the Axiomatic Retr. Function. Much like BM25, I think it also requires avg. doc length, as does (I believe) language modeling and some other approaches. Of course, we need to do this in a way that doesn't hurt performance for the default case. I'm also curious if anyone has compared BM25 w/ a Lucene similarity that uses a different length normalization factor? I've seen many people use a different len. norm with good success, but it isn't necessarily for everyone. Add BM25 Scoring to Lucene -- Key: LUCENE-2091 URL: https://issues.apache.org/jira/browse/LUCENE-2091 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Reporter: Yuval Feinstein Priority: Minor Fix For: 3.1 Attachments: LUCENE-2091.patch, persianlucene.jpg Original Estimate: 48h Remaining Estimate: 48h http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of Okapi-BM25 scoring in the Lucene framework, as an alternative to the standard Lucene scoring (which is a version of mixed boolean/TFIDF). I have refactored this a bit, added unit tests and improved the runtime somewhat. I would like to contribute the code to Lucene under contrib. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
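To make the dependence on average document length concrete, here is a minimal sketch of the per-term Okapi BM25 score in its commonly published form. It is not taken from the LUCENE-2091 patch; the class `Bm25Sketch`, the method name, and passing the idf in as a ready-made number are all assumptions of this sketch.

```java
// Hypothetical sketch of the textbook per-term BM25 score; not the patch's code.
final class Bm25Sketch {
    // tf        : term frequency in the document
    // docLen    : length of this document (in terms)
    // avgDocLen : average document length over the collection; this is the
    //             collection-wide statistic the comment above refers to
    // idf       : inverse document frequency, supplied by the caller
    // k1, b     : the usual BM25 free parameters (often around 1.2 and 0.75)
    static double termScore(double tf, double docLen, double avgDocLen,
                            double idf, double k1, double b) {
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        return idf * (tf * (k1 + 1.0)) / (tf + k1 * lengthNorm);
    }
}
```

With `b = 0` the length term drops out entirely, which is one way to see why a pluggable scoring design would need to expose the average length only to the similarities that ask for it.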
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12786220#action_12786220 ] Grant Ingersoll commented on LUCENE-2091: - bq. Yes in the image posted here, I tried modifying length normalization with SweetSpot etc as others have done in the past. For this corpus I was unable to improve it in this way. Yeah, can't speak for SweetSpot, but there are other approaches too that don't favor shorter docs all the time. Add BM25 Scoring to Lucene -- Key: LUCENE-2091 URL: https://issues.apache.org/jira/browse/LUCENE-2091 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Reporter: Yuval Feinstein Priority: Minor Fix For: 3.1 Attachments: LUCENE-2091.patch, persianlucene.jpg Original Estimate: 48h Remaining Estimate: 48h http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of Okapi-BM25 scoring in the Lucene framework, as an alternative to the standard Lucene scoring (which is a version of mixed boolean/TFIDF). I have refactored this a bit, added unit tests and improved the runtime somewhat. I would like to contribute the code to Lucene under contrib. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (SOLR-1622) Add aggregate Math capabilities to Solr above and beyond the StatsComponent
Add aggregate Math capabilities to Solr above and beyond the StatsComponent
Key: SOLR-1622 URL: https://issues.apache.org/jira/browse/SOLR-1622 Project: Solr Issue Type: New Feature Components: search Reporter: Grant Ingersoll Priority: Minor
It would be really cool if we could have a QueryComponent that enabled doing aggregating calculations on search results similar to what the StatsComponent does, but in a more generic way. I also think it makes sense to reuse some of the function query capabilities (like the parser, etc.). I imagine the interface might look like:
{code}
math=true&func=recip(sum(A))
{code}
This would calculate the reciprocal of the sum of the values in the field A. Then, you could go across fields, too:
{code}
math=true&func=recip(sum(A, B, C))
{code}
which would sum the values across fields A, B and C.
It is important to make the functions pluggable and reusable. It might also be nice to see if we can share the core calculations between function queries and this capability, such that if someone adds a new aggregating function, it can also be used as a new Function query. Of course, we'd want plugin functions, too, so that people can plug in their own functions. After this is implemented, I think StatsComponent becomes a derivative of the new MathComponent.
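The two proposed requests in SOLR-1622, recip(sum(A)) and recip(sum(A, B, C)), amount to the following arithmetic. This is only a sketch of the calculation, not of any Solr component: the class `AggregateSketch` and its methods are hypothetical, and `recip` here is the plain reciprocal described in the issue text.

```java
// Hypothetical sketch of the arithmetic only; no Solr classes are involved.
final class AggregateSketch {
    // Sums every value of every field passed in, so sum(A) and sum(A, B, C)
    // share one implementation.
    static double sum(double[]... fields) {
        double total = 0.0;
        for (double[] field : fields) {
            for (double value : field) {
                total += value;
            }
        }
        return total;
    }

    // Plain reciprocal, as described in the issue text.
    static double recip(double x) {
        return 1.0 / x;
    }
}
```

For a result set where field A holds 1 and 3, `AggregateSketch.recip(AggregateSketch.sum(a))` is 1/4; passing three arrays instead of one gives the cross-field variant.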
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784705#action_12784705 ] Grant Ingersoll commented on SOLR-1277: --- Mark, I think this makes sense. I think you can grab my ZK admin ReqHandler and the shards refactoring, too and pull that into this patch, as most of them are independent of the actual startup/config part. If you don't get to it, I will try to next week. Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784948#action_12784948 ] Grant Ingersoll commented on SOLR-1277: --- bq. Is there a benefit to refactoring the shards piece to a component rather than a simple helper class or something? Yes. For starters, not all requests require the QueryComponent, but still may require distributed (TermsComponent) caps. Second, I think it is cleaner and allows others to plugin/override with their own capabilities. Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions
[ https://issues.apache.org/jira/browse/MAHOUT-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782898#action_12782898 ] Grant Ingersoll commented on MAHOUT-204: +1 on aggressive pruning and cleanup. Feel free to commit as you go, too. No need to get it all done in one fell swoop. Better integration of Mahout matrix capabilities with Colt Matrix additions --- Key: MAHOUT-204 URL: https://issues.apache.org/jira/browse/MAHOUT-204 Project: Mahout Issue Type: Improvement Affects Versions: 0.3 Reporter: Grant Ingersoll Fix For: 0.3 Attachments: MAHOUT-204-author-cleanup.patch Per MAHOUT-165, we need to refactor the matrix package structures a bit to be more coherent and clean. For instance, there are two levels of matrix packages now, so those should be rectified. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions
[ https://issues.apache.org/jira/browse/MAHOUT-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781896#action_12781896 ] Grant Ingersoll commented on MAHOUT-204: Yeah, go ahead and submit the patch, then do the formatting. Better integration of Mahout matrix capabilities with Colt Matrix additions --- Key: MAHOUT-204 URL: https://issues.apache.org/jira/browse/MAHOUT-204 Project: Mahout Issue Type: Improvement Affects Versions: 0.3 Reporter: Grant Ingersoll Fix For: 0.3 Attachments: MAHOUT-204-author-cleanup.patch Per MAHOUT-165, we need to refactor the matrix package structures a bit to be more coherent and clean. For instance, there are two levels of matrix packages now, so those should be rectified. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements
[ https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Ingersoll reassigned MAHOUT-207: -- Assignee: Grant Ingersoll

AbstractVector.hashCode() should not care about the order of iteration over elements
Key: MAHOUT-207 URL: https://issues.apache.org/jira/browse/MAHOUT-207 Project: Mahout Issue Type: Bug Components: Matrix Affects Versions: 0.2 Environment: all Reporter: Jake Mannix Assignee: Grant Ingersoll Fix For: 0.3 Attachments: MAHOUT-207.patch
As was discussed in MAHOUT-165, hashCode can be implemented simply like this:
{code}
public int hashCode() {
  final int prime = 31;
  int result = prime + ((name == null) ? 0 : name.hashCode());
  result = prime * result + size();
  Iterator<Element> iter = iterateNonZero();
  while (iter.hasNext()) {
    Element ele = iter.next();
    long v = Double.doubleToLongBits(ele.get());
    // plain addition is commutative, so iteration order does not matter
    result += (ele.index() * (int)(v^(v>>>32)));
  }
  return result;
}
{code}
which obviates the need to sort the elements in the case of a random access hash-based implementation. Also, (ele.index() * (int)(v^(v>>>32)) ) == 0 when v = Double.doubleToLongBits(0d), which avoids the wrong hashCode() for sparse vectors which have zero elements returned from the iterateNonZero() iterator.
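The two properties claimed for the hashCode() proposed in MAHOUT-207 (independence from iteration order, and no contribution from explicitly stored zeros) can be checked with a standalone sketch. It replaces Mahout's Vector and Element types with a plain map from index to value, so the class `OrderIndependentHash` and its `hash` method are hypothetical stand-ins rather than Mahout code.

```java
import java.util.Map;

// Hypothetical stand-in for the proposed AbstractVector.hashCode();
// a Map<Integer, Double> plays the role of iterateNonZero().
final class OrderIndependentHash {
    static int hash(String name, int size, Map<Integer, Double> entries) {
        final int prime = 31;
        int result = prime + ((name == null) ? 0 : name.hashCode());
        result = prime * result + size;
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            long v = Double.doubleToLongBits(e.getValue());
            // Each element is folded in with plain addition, which is
            // commutative, so the order of iteration cannot change the sum.
            // A stored 0.0 has bit pattern 0 and therefore contributes 0.
            result += e.getKey() * (int) (v ^ (v >>> 32));
        }
        return result;
    }
}
```

One caveat of this scheme, independent of the sketch: -0.0 has a non-zero bit pattern, so a stored negative zero would still change the result.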
[jira] Commented: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements
[ https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782041#action_12782041 ]
Grant Ingersoll commented on MAHOUT-207:
How does this all relate to https://issues.apache.org/jira/browse/MAHOUT-159?

AbstractVector.hashCode() should not care about the order of iteration over elements
Key: MAHOUT-207 URL: https://issues.apache.org/jira/browse/MAHOUT-207 Project: Mahout Issue Type: Bug Components: Matrix Affects Versions: 0.2 Environment: all Reporter: Jake Mannix Fix For: 0.3 Attachments: MAHOUT-207.patch
As was discussed in MAHOUT-165, hashCode can be implemented simply like this:
{code}
public int hashCode() {
  final int prime = 31;
  int result = prime + ((name == null) ? 0 : name.hashCode());
  result = prime * result + size();
  Iterator<Element> iter = iterateNonZero();
  while (iter.hasNext()) {
    Element ele = iter.next();
    long v = Double.doubleToLongBits(ele.get());
    // plain addition is commutative, so iteration order does not matter
    result += (ele.index() * (int)(v^(v>>>32)));
  }
  return result;
}
{code}
which obviates the need to sort the elements in the case of a random access hash-based implementation. Also, (ele.index() * (int)(v^(v>>>32)) ) == 0 when v = Double.doubleToLongBits(0d), which avoids the wrong hashCode() for sparse vectors which have zero elements returned from the iterateNonZero() iterator.
[jira] Assigned: (MAHOUT-206) Separate and clearly label different SparseVector implementations
[ https://issues.apache.org/jira/browse/MAHOUT-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-206: -- Assignee: Grant Ingersoll Separate and clearly label different SparseVector implementations - Key: MAHOUT-206 URL: https://issues.apache.org/jira/browse/MAHOUT-206 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Environment: all Reporter: Jake Mannix Assignee: Grant Ingersoll Fix For: 0.3 Attachments: MAHOUT-206.patch Shashi's last patch on MAHOUT-165 swapped out the int/double parallel array impl of SparseVector for an OpenIntDoubleMap (hash-based) one. We actually need both, as I think I've mentioned a gazillion times. There was a patch, long ago, on MAHOUT-165, in which Ted had OrderedIntDoubleVector, and OpenIntDoubleHashVector (or something to that effect), and neither of them are called SparseVector. I like this, because it forces people to choose what kind of SparseVector they want (and they should: sparse is an optimization, and the client should make a conscious decision what they're optimizing for). We could call them RandomAccessSparseVector and SequentialAccessSparseVector, to be really obvious. But really, the important part is we have both. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations
[ https://issues.apache.org/jira/browse/MAHOUT-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782054#action_12782054 ] Grant Ingersoll commented on MAHOUT-206: Jake, there's something weird in this patch with regard to SparseVector. It didn't delete the file, but instead left it empty. There still seems to be enough commonality between the two implementations (size, cardinality, etc.) that I think it would be worthwhile to keep SparseVector as an abstract class which the other two extend. Separate and clearly label different SparseVector implementations - Key: MAHOUT-206 URL: https://issues.apache.org/jira/browse/MAHOUT-206 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Environment: all Reporter: Jake Mannix Assignee: Grant Ingersoll Fix For: 0.3 Attachments: MAHOUT-206.patch Shashi's last patch on MAHOUT-165 swapped out the int/double parallel array impl of SparseVector for an OpenIntDoubleMap (hash-based) one. We actually need both, as I think I've mentioned a gazillion times. There was a patch, long ago, on MAHOUT-165, in which Ted had OrderedIntDoubleVector, and OpenIntDoubleHashVector (or something to that effect), and neither of them are called SparseVector. I like this, because it forces people to choose what kind of SparseVector they want (and they should: sparse is an optimization, and the client should make a conscious decision what they're optimizing for). We could call them RandomAccessSparseVector and SequentialAccessSparseVector, to be really obvious. But really, the important part is we have both. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
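A minimal sketch of the layout suggested in the comment above: one abstract base holding the shared pieces, with a hash-based variant and a sorted parallel-array variant extending it. All class names here (SketchSparseVector, RandomAccessSketch, SequentialAccessSketch) are hypothetical and only echo the names proposed in the issue; this is not the code from MAHOUT-206.patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical abstract base holding what both variants share. */
abstract class SketchSparseVector {
    private final int cardinality;

    protected SketchSparseVector(int cardinality) {
        this.cardinality = cardinality;
    }

    /** Declared length of the vector, identical for every implementation. */
    final int size() {
        return cardinality;
    }

    abstract double get(int index);

    abstract void set(int index, double value);

    /** Shared logic written once against the abstract accessors. */
    final double dot(SketchSparseVector other) {
        double sum = 0.0;
        for (int i = 0; i < Math.min(size(), other.size()); i++) {
            sum += get(i) * other.get(i);
        }
        return sum;
    }
}

/** Hash-based storage: cheap random get/set, unordered iteration. */
final class RandomAccessSketch extends SketchSparseVector {
    private final Map<Integer, Double> entries = new HashMap<>();

    RandomAccessSketch(int cardinality) {
        super(cardinality);
    }

    @Override
    double get(int index) {
        Double v = entries.get(index);
        return v == null ? 0.0 : v;
    }

    @Override
    void set(int index, double value) {
        entries.put(index, value);
    }
}

/** Sorted parallel arrays: ordered iteration, slower random inserts. */
final class SequentialAccessSketch extends SketchSparseVector {
    private int[] indices = new int[4];
    private double[] values = new double[4];
    private int count = 0;

    SequentialAccessSketch(int cardinality) {
        super(cardinality);
    }

    @Override
    double get(int index) {
        int pos = Arrays.binarySearch(indices, 0, count, index);
        return pos >= 0 ? values[pos] : 0.0;
    }

    @Override
    void set(int index, double value) {
        int pos = Arrays.binarySearch(indices, 0, count, index);
        if (pos >= 0) {
            values[pos] = value;
            return;
        }
        int insertAt = -(pos + 1);
        if (count == indices.length) {
            // Grow both arrays together so they stay parallel.
            indices = Arrays.copyOf(indices, count * 2);
            values = Arrays.copyOf(values, count * 2);
        }
        // Shift the tail right by one slot to keep indices sorted.
        System.arraycopy(indices, insertAt, indices, insertAt + 1, count - insertAt);
        System.arraycopy(values, insertAt, values, insertAt + 1, count - insertAt);
        indices[insertAt] = index;
        values[insertAt] = value;
        count++;
    }
}
```

The dot product here walks every index for brevity; a real implementation would iterate only the stored entries, which is exactly where the two storage layouts differ in cost.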
[jira] Commented: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements
[ https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782064#action_12782064 ] Grant Ingersoll commented on MAHOUT-207: Aren't we losing some of the benefits of SparseVector with this explicit set to zero stuff (by having to call equivalent)? I've wondered in the past how a Sparse implementation should handle something like setQuick(i, 0). One approach is to set it, but the other is to ignore it and possibly remove any previous nonzero entry, right? Seems like tradeoffs w/ both. AbstractVector.hashCode() should not care about the order of iteration over elements Key: MAHOUT-207 URL: https://issues.apache.org/jira/browse/MAHOUT-207 Project: Mahout Issue Type: Bug Components: Matrix Affects Versions: 0.2 Environment: all Reporter: Jake Mannix Assignee: Grant Ingersoll Fix For: 0.3 Attachments: MAHOUT-207.patch As was discussed in MAHOUT-165, hashCode can be implemented simply like this: {code} public int hashCode() { final int prime = 31; int result = prime + ((name == null) ? 0 : name.hashCode()); result = prime * result + size(); Iterator<Element> iter = iterateNonZero(); while (iter.hasNext()) { Element ele = iter.next(); long v = Double.doubleToLongBits(ele.get()); result += (ele.index() * (int)(v ^ (v >>> 32))); } return result; } {code} which obviates the need to sort the elements in the case of a random access hash-based implementation. Also, (ele.index() * (int)(v ^ (v >>> 32))) == 0 when v = Double.doubleToLongBits(0d), which avoids the wrong hashCode() for sparse vectors which have zero elements returned from the iterateNonZero() iterator. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
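The two approaches to setQuick(i, 0) mentioned in the comment above can be put side by side in a small sketch. ZeroPolicySketch and its dropZeros flag are hypothetical names used only for illustration; neither is part of Mahout.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse storage with a switchable policy for zero writes. */
final class ZeroPolicySketch {
    private final Map<Integer, Double> entries = new HashMap<>();
    private final boolean dropZeros;

    ZeroPolicySketch(boolean dropZeros) {
        this.dropZeros = dropZeros;
    }

    void setQuick(int index, double value) {
        if (dropZeros && value == 0.0) {
            // Policy 2: ignore the zero and remove any previous nonzero entry.
            entries.remove(index);
        } else {
            // Policy 1: store whatever was written, zeros included.
            entries.put(index, value);
        }
    }

    double getQuick(int index) {
        Double v = entries.get(index);
        return v == null ? 0.0 : v;
    }

    /** Number of entries an iterator over the stored elements would visit. */
    int storedEntries() {
        return entries.size();
    }
}
```

Both policies return 0.0 from getQuick after a zero write; they differ in storedEntries(), which is what anything built on iterating the stored elements (such as a hash or an equality check) ends up seeing. Storing zeros keeps writes cheap but lets the structure grow, while dropping them keeps it compact at the cost of a removal on write.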
[jira] Commented: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements
[ https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782080#action_12782080 ] Grant Ingersoll commented on MAHOUT-207: All makes sense. Per the refactoring in MAHOUT-206, I think this argues even more for an abstract SparseVector implementation that can handle some of the common code. AbstractVector.hashCode() should not care about the order of iteration over elements Key: MAHOUT-207 URL: https://issues.apache.org/jira/browse/MAHOUT-207 Project: Mahout Issue Type: Bug Components: Matrix Affects Versions: 0.2 Environment: all Reporter: Jake Mannix Assignee: Grant Ingersoll Fix For: 0.3 Attachments: MAHOUT-207.patch As was discussed in MAHOUT-165, hashCode can be implemented simply like this: {code} public int hashCode() { final int prime = 31; int result = prime + ((name == null) ? 0 : name.hashCode()); result = prime * result + size(); Iterator<Element> iter = iterateNonZero(); while (iter.hasNext()) { Element ele = iter.next(); long v = Double.doubleToLongBits(ele.get()); result += (ele.index() * (int)(v ^ (v >>> 32))); } return result; } {code} which obviates the need to sort the elements in the case of a random access hash-based implementation. Also, (ele.index() * (int)(v ^ (v >>> 32))) == 0 when v = Double.doubleToLongBits(0d), which avoids the wrong hashCode() for sparse vectors which have zero elements returned from the iterateNonZero() iterator. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1131: -- Fix Version/s: 1.5 Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
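To make the "one concept, several indexed fields" idea concrete, here is a rough sketch of the lat field / lon field case from the list above. It is plain Java with no Solr classes; PointFieldSketch, the "_lat" and "_lon" suffixes, and the whitespace-separated "lat lon" input form are assumptions made for illustration and say nothing about how the SOLR-1131 patch actually names or creates its fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: expands one logical point field into two numeric fields. */
final class PointFieldSketch {

    /**
     * Splits a "lat lon" value into two named sub-fields.
     * The suffixes are illustrative only.
     */
    static Map<String, Double> toIndexFields(String fieldName, String value) {
        String[] parts = value.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'lat lon' but got: " + value);
        }
        double lat = Double.parseDouble(parts[0]);
        double lon = Double.parseDouble(parts[1]);
        if (lat < -90.0 || lat > 90.0 || lon < -180.0 || lon > 180.0) {
            throw new IllegalArgumentException("point out of range: " + value);
        }
        // LinkedHashMap keeps a stable lat-then-lon order for callers.
        Map<String, Double> fields = new LinkedHashMap<>();
        fields.put(fieldName + "_lat", lat);
        fields.put(fieldName + "_lon", lon);
        return fields;
    }
}
```

The part this sketch leaves out is the one the issue is really about: the search and response side still has to know that the logical field maps to several underlying fields.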
[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes
[ https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781945#action_12781945 ] Grant Ingersoll commented on SOLR-1586: --- Hey Chris, I'm not sure we want to bring in the actual namespace for georss. That seems like overkill, but I'm open to hearing what others think. Create Spatial Point FieldTypes --- Key: SOLR-1586 URL: https://issues.apache.org/jira/browse/SOLR-1586 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: examplegeopointdoc.patch.txt, SOLR-1586.Mattmann.112209.geopointonly.patch.txt, SOLR-1586.Mattmann.112209.geopointonly.patch.txt Per SOLR-773, create field types that hide the details of creating tiers, geohash and lat/lon fields. Fields should take in lat/lon points in a single form, as in: <field name="foo">lat lon</field> -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781948#action_12781948 ] Grant Ingersoll commented on SOLR-1131: --- See discussion at http://search.lucidimagination.com/search/document/d24c920ddf05b4f7/solr_1131_multiple_fields_per_field_type Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes
[ https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781973#action_12781973 ] Grant Ingersoll commented on SOLR-1586: --- Also, where does this patch actually encode the Geohash value? The Lucene spatial contrib JAR has GeoHashUtils for just this. See the GeohashFunction for usage. Create Spatial Point FieldTypes --- Key: SOLR-1586 URL: https://issues.apache.org/jira/browse/SOLR-1586 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: examplegeopointdoc.patch.txt, SOLR-1586.Mattmann.112209.geopointonly.patch.txt, SOLR-1586.Mattmann.112209.geopointonly.patch.txt Per SOLR-773, create field types that hide the details of creating tiers, geohash and lat/lon fields. Fields should take in lat/lon points in a single form, as in: <field name="foo">lat lon</field> -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
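For readers unfamiliar with what "encoding the Geohash value" involves, the following is a stand-alone sketch of the generic geohash algorithm: bits alternate between longitude and latitude, each bit halves the remaining range, and every five bits become one base-32 character. GeohashSketch is a hypothetical class written for this illustration; it does not reproduce the API of the GeoHashUtils class mentioned in the comment, which is what a real patch should call.

```java
/** Hypothetical stand-alone geohash encoder; not the Lucene spatial contrib class. */
final class GeohashSketch {
    // Standard geohash alphabet: digits plus lowercase letters without a, i, l, o.
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int length) {
        double latLo = -90.0, latHi = 90.0;
        double lonLo = -180.0, lonHi = 180.0;
        StringBuilder hash = new StringBuilder(length);
        boolean useLon = true;   // bits alternate, starting with longitude
        int bits = 0;            // bits collected for the current character
        int current = 0;         // value of the current 5-bit group

        while (hash.length() < length) {
            if (useLon) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) {
                    current = (current << 1) | 1;
                    lonLo = mid;
                } else {
                    current <<= 1;
                    lonHi = mid;
                }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) {
                    current = (current << 1) | 1;
                    latLo = mid;
                } else {
                    current <<= 1;
                    latHi = mid;
                }
            }
            useLon = !useLon;
            if (++bits == 5) {
                // Five bits make one base-32 character.
                hash.append(BASE32.charAt(current));
                bits = 0;
                current = 0;
            }
        }
        return hash.toString();
    }
}
```

Because nearby points share leading characters, a single string field of this kind supports prefix-style proximity matching, which is the property that makes the single-field geohash option attractive.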
[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains
[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781427#action_12781427 ] Grant Ingersoll commented on MAHOUT-165: OK, I am committing the Matrix module. Once done, I am going to move our Matrix stuff out of core and into the Matrix module. Then, Shashi, if you can update your patch, that would be great. From there, refactoring Vector to not have a Writable dependency (etc.) would be great, but let's handle that on a separate issue. Using better primitives hash for sparse vector for performance gains Key: MAHOUT-165 URL: https://issues.apache.org/jira/browse/MAHOUT-165 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Reporter: Shashikant Kore Assignee: Grant Ingersoll Fix For: 0.3 Attachments: colt.jar, mahout-165-18nov-updated.patch, mahout-165-18nov.patch, MAHOUT-165-colt.patch, mahout-165-trove.patch, MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch In SparseVector, we need a primitive hash map for indices and values. The present implementation of this hash map is not as efficient as some of the other implementations in non-Apache projects. In an experiment, I found that, for get/set operations, the primitive hash of Colt performs an order of magnitude better than OrderedIntDoubleMapping. For iteration it is 2x slower, though. Using Colt in SparseVector improved performance of canopy generation. For an experimental dataset, the current implementation takes 50 minutes. Using Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the delay. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains
[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781436#action_12781436 ] Grant Ingersoll commented on MAHOUT-165: OK, I moved over the matrix module, but there still needs to be some refactoring done there, as there are currently two layers of matrix packages. Using better primitives hash for sparse vector for performance gains Key: MAHOUT-165 URL: https://issues.apache.org/jira/browse/MAHOUT-165 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Reporter: Shashikant Kore Assignee: Grant Ingersoll Fix For: 0.3 Attachments: colt.jar, mahout-165-18nov-updated.patch, mahout-165-18nov.patch, MAHOUT-165-colt.patch, mahout-165-trove.patch, MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch In SparseVector, we need a primitive hash map for indices and values. The present implementation of this hash map is not as efficient as some of the other implementations in non-Apache projects. In an experiment, I found that, for get/set operations, the primitive hash of Colt performs an order of magnitude better than OrderedIntDoubleMapping. For iteration it is 2x slower, though. Using Colt in SparseVector improved performance of canopy generation. For an experimental dataset, the current implementation takes 50 minutes. Using Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the delay. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions
Better integration of Mahout matrix capabilities with Colt Matrix additions --- Key: MAHOUT-204 URL: https://issues.apache.org/jira/browse/MAHOUT-204 Project: Mahout Issue Type: Improvement Affects Versions: 0.3 Reporter: Grant Ingersoll Fix For: 0.3 Per MAHOUT-165, we need to refactor the matrix package structures a bit to be more coherent and clean. For instance, there are two levels of matrix packages now, so those should be rectified. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains
[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781438#action_12781438 ] Grant Ingersoll commented on MAHOUT-165: d'oh, missed the correct package names. Using better primitives hash for sparse vector for performance gains Key: MAHOUT-165 URL: https://issues.apache.org/jira/browse/MAHOUT-165 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Reporter: Shashikant Kore Assignee: Grant Ingersoll Fix For: 0.3 Attachments: colt.jar, mahout-165-18nov-updated.patch, mahout-165-18nov.patch, MAHOUT-165-colt.patch, mahout-165-trove.patch, MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch In SparseVector, we need a primitive hash map for indices and values. The present implementation of this hash map is not as efficient as some of the other implementations in non-Apache projects. In an experiment, I found that, for get/set operations, the primitive hash of Colt performs an order of magnitude better than OrderedIntDoubleMapping. For iteration it is 2x slower, though. Using Colt in SparseVector improved performance of canopy generation. For an experimental dataset, the current implementation takes 50 minutes. Using Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the delay. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains
[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781467#action_12781467 ] Grant Ingersoll commented on MAHOUT-165: OK, I committed Shashi's patch and fixed the colt package name remnants. Using better primitives hash for sparse vector for performance gains Key: MAHOUT-165 URL: https://issues.apache.org/jira/browse/MAHOUT-165 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Reporter: Shashikant Kore Assignee: Grant Ingersoll Fix For: 0.3 Attachments: colt.jar, mahout-165-18nov-updated.patch, mahout-165-18nov.patch, MAHOUT-165-colt.patch, mahout-165-trove.patch, MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch In SparseVector, we need a primitive hash map for indices and values. The present implementation of this hash map is not as efficient as some of the other implementations in non-Apache projects. In an experiment, I found that, for get/set operations, the primitive hash of Colt performs an order of magnitude better than OrderedIntDoubleMapping. For iteration it is 2x slower, though. Using Colt in SparseVector improved performance of canopy generation. For an experimental dataset, the current implementation takes 50 minutes. Using Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the delay. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions
[ https://issues.apache.org/jira/browse/MAHOUT-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781640#action_12781640 ] Grant Ingersoll commented on MAHOUT-204: Command is good, but a patch would be useful too. Better integration of Mahout matrix capabilities with Colt Matrix additions --- Key: MAHOUT-204 URL: https://issues.apache.org/jira/browse/MAHOUT-204 Project: Mahout Issue Type: Improvement Affects Versions: 0.3 Reporter: Grant Ingersoll Fix For: 0.3 Per MAHOUT-165, we need to refactor the matrix package structures a bit to be more coherent and clean. For instance, there are two levels of matrix packages now, so those should be rectified. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations
[ https://issues.apache.org/jira/browse/MAHOUT-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781701#action_12781701 ] Grant Ingersoll commented on MAHOUT-206: Sorry, yes, I missed that and I agree we do need it. Separate and clearly label different SparseVector implementations - Key: MAHOUT-206 URL: https://issues.apache.org/jira/browse/MAHOUT-206 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Environment: all Reporter: Jake Mannix Fix For: 0.3 Shashi's last patch on MAHOUT-165 swapped out the int/double parallel array impl of SparseVector for an OpenIntDoubleMap (hash-based) one. We actually need both, as I think I've mentioned a gazillion times. There was a patch, long ago, on MAHOUT-165, in which Ted had OrderedIntDoubleVector, and OpenIntDoubleHashVector (or something to that effect), and neither of them are called SparseVector. I like this, because it forces people to choose what kind of SparseVector they want (and they should: sparse is an optimization, and the client should make a conscious decision what they're optimizing for). We could call them RandomAccessSparseVector and SequentialAccessSparseVector, to be really obvious. But really, the important part is we have both. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains
[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-165: --- Attachment: MAHOUT-165-colt.patch The Colt stuff looks good; my only concern, legally, is the name, oddly enough. I don't think we should call it Colt. AFAICT, that name is owned by CERN and while the license allows us to bring over the code, it doesn't give us rights to the name. This patch changes the name to matrix and adds the appropriate legal bits to NOTICE.txt and LICENSE.txt. This just covers the Colt stuff; it does not apply Shashi's patch. It seems like we should just move our Matrix (currently in core) out to this package and have core have a dependency on this module. Using better primitives hash for sparse vector for performance gains Key: MAHOUT-165 URL: https://issues.apache.org/jira/browse/MAHOUT-165 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Reporter: Shashikant Kore Assignee: Grant Ingersoll Fix For: 0.3 Attachments: colt.jar, mahout-165-18nov-updated.patch, mahout-165-18nov.patch, MAHOUT-165-colt.patch, mahout-165-trove.patch, MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch In SparseVector, we need a primitive hash map for indices and values. The present implementation of this hash map is not as efficient as some of the other implementations in non-Apache projects. In an experiment, I found that, for get/set operations, the primitive hash of Colt performs an order of magnitude better than OrderedIntDoubleMapping. For iteration it is 2x slower, though. Using Colt in SparseVector improved performance of canopy generation. For an experimental dataset, the current implementation takes 50 minutes. Using Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the delay. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-182) New helper methods for Matrix: times(Vector), timesSquared(Vector), numRows() and numCols()
[ https://issues.apache.org/jira/browse/MAHOUT-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781146#action_12781146 ] Grant Ingersoll commented on MAHOUT-182: reviewing this morning. New helper methods for Matrix: times(Vector), timesSquared(Vector), numRows() and numCols() --- Key: MAHOUT-182 URL: https://issues.apache.org/jira/browse/MAHOUT-182 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Reporter: Jake Mannix Assignee: Grant Ingersoll Priority: Minor Attachments: MAHOUT-182.patch, matrixTimes.patch numRows() { return size()[ROW]; } and numCols() { return size()[COL]; } are pretty much no-brainer methods, right? Who wants to deal with a length-two array of ints all the time when getting the number of rows and columns of a matrix? Those are pretty trivial, but the key feature of a Matrix is to map Vector instances to Vector instances, and while you can do that currently by making a row Matrix and doing Matrix.times(Matrix), it's silly to have to always do that. Matrix.times(Vector) is pretty needed. Even less trivial, for really big sparse Matrices, if you need to get (M'M)v for some matrix M, then this can be computed in one pass through M without ever computing the transpose of M by a simple reordering of the limits of summation. Attaching a patch with these implementations, including unit tests (as well as an improvement in the Matrix.times(Matrix) unit test to actually check the math). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAHOUT-182) New helper methods for Matrix: times(Vector), timesSquared(Vector), numRows() and numCols()
[ https://issues.apache.org/jira/browse/MAHOUT-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-182. Resolution: Fixed Fix Version/s: 0.3 Committed revision 883094. New helper methods for Matrix: times(Vector), timesSquared(Vector), numRows() and numCols() --- Key: MAHOUT-182 URL: https://issues.apache.org/jira/browse/MAHOUT-182 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Reporter: Jake Mannix Assignee: Grant Ingersoll Priority: Minor Fix For: 0.3 Attachments: MAHOUT-182.patch, matrixTimes.patch numRows() { return size()[ROW]; } and numCols() { return size()[COL]; } are pretty much no-brainer methods, right? Who wants to deal with a length-two array of ints all the time when getting the number of rows and columns of a matrix? Those are pretty trivial, but the key feature of a Matrix is to map Vector instances to Vector instances, and while you can do that currently by making a row Matrix and doing Matrix.times(Matrix), it's silly to have to always do that. Matrix.times(Vector) is pretty needed. Even less trivial, for really big sparse Matrices, if you need to get (M'M)v for some matrix M, then this can be computed in one pass through M without ever computing the transpose of M by a simple reordering of the limits of summation. Attaching a patch with these implementations, including unit tests (as well as an improvement in the Matrix.times(Matrix) unit test to actually check the math). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
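The one-pass (M'M)v computation described in the issue follows from reordering the sums: writing m_i for row i of M, (M'M)v is the sum over i of (m_i . v) m_i, so each row is read once and the transpose is never formed. The sketch below shows this on plain double[][] arrays; MatrixTimesSketch is a hypothetical name, and the committed Mahout methods operate on Matrix and Vector instances rather than raw arrays.

```java
/** Hypothetical array-based sketch of times(Vector) and timesSquared(Vector). */
final class MatrixTimesSketch {

    /** Computes M v: one dot product per row. */
    static double[] times(double[][] m, double[] v) {
        double[] result = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            double dot = 0.0;
            for (int j = 0; j < v.length; j++) {
                dot += m[i][j] * v[j];
            }
            result[i] = dot;
        }
        return result;
    }

    /** Computes (M'M) v in a single pass over the rows of M. */
    static double[] timesSquared(double[][] m, double[] v) {
        double[] result = new double[v.length];
        for (double[] row : m) {
            // First the scalar (row . v) ...
            double dot = 0.0;
            for (int j = 0; j < v.length; j++) {
                dot += row[j] * v[j];
            }
            // ... then accumulate that multiple of the same row.
            for (int j = 0; j < v.length; j++) {
                result[j] += dot * row[j];
            }
        }
        return result;
    }
}
```

For M = [[1, 2], [3, 4]] and v = (1, 1), times gives (3, 7) and timesSquared gives 3*(1, 2) + 7*(3, 4) = (24, 34), which matches multiplying M' by (3, 7) directly. With sparse rows the inner loops would touch only the nonzero entries, which is where the single pass pays off for very large matrices.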
[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1131: -- Attachment: SOLR-1131.patch Starting to add unit tests. Still no support on the search/response side, but groundwork for adding multiple fields per SchemaField/FieldType is now laid. Still need a way to know that a field/fieldtype is going to output multiple fields so that we can detect them when searching, etc. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr
[ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781145#action_12781145 ] Grant Ingersoll commented on SOLR-773: -- bq. Grant is doing some fantastic work here, and I'm looking forward to seeing the outcome Grant would definitely welcome help! This is way too big for me. People wanting to help should take a look at all of the linked items on this issue and see where they can contribute. If in doubt, please ask. I'm good at telling people what to do ;-) Incorporate Local Lucene/Solr - Key: SOLR-773 URL: https://issues.apache.org/jira/browse/SOLR-773 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, lucene.tar.gz, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, solrGeoQuery.tar, spatial-solr.tar.gz Local Lucene has been donated to the Lucene project. It has some Solr components, but we should evaluate how best to incorporate it into Solr. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1567) Upgrade to Tika 0.5
[ https://issues.apache.org/jira/browse/SOLR-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved SOLR-1567. --- Resolution: Fixed Committed revision 883095. Upgrade to Tika 0.5 --- Key: SOLR-1567 URL: https://issues.apache.org/jira/browse/SOLR-1567 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 As the title says. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr
[ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781159#action_12781159 ] Grant Ingersoll commented on SOLR-773: -- Not so much a re-arch, but an extension of some pieces to handle some new ideas. I think we all agree that Solr does a pretty good job of hiding some of the complexity of Lucene. So, by being able to simply declare a new field that is a CartesianTier field type, then the user need not worry at all about managing the tier prefix stuff that contrib/spatial requires. Incorporate Local Lucene/Solr - Key: SOLR-773 URL: https://issues.apache.org/jira/browse/SOLR-773 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, lucene.tar.gz, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, solrGeoQuery.tar, spatial-solr.tar.gz Local Lucene has been donated to the Lucene project. It has some Solr components, but we should evaluate how best to incorporate it into Solr. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes
[ https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781193#action_12781193 ] Grant Ingersoll commented on SOLR-1586: --- Sounds good, but how are you going to deal with the field types that need multiple fields (i.e. SOLR-1131)? We certainly could put up a GeohashField to get things started. Create Spatial Point FieldTypes --- Key: SOLR-1586 URL: https://issues.apache.org/jira/browse/SOLR-1586 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Per SOLR-773, create field types that hide the details of creating tiers, geohash and lat/lon fields. Fields should take in lat/lon points in a single form, as in: <field name="foo">lat lon</field>
[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes
[ https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781215#action_12781215 ] Grant Ingersoll commented on SOLR-1586: --- I'm not sure what good it does to put a lat/lon in a single String in georss:point format. What's your intent for searching/sorting/faceting? Create Spatial Point FieldTypes --- Key: SOLR-1586 URL: https://issues.apache.org/jira/browse/SOLR-1586 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Per SOLR-773, create field types that hide the details of creating tiers, geohash and lat/lon fields. Fields should take in lat/lon points in a single form, as in: <field name="foo">lat lon</field>
[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes
[ https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781262#action_12781262 ] Grant Ingersoll commented on SOLR-1586: --- I'd say wait until SOLR-1131 is done for everything other than the GeohashFieldType, as what you are proposing doesn't get you anything over just using StrField. By all means, put up a patch for GeohashFieldType when you have one. We can commit that now. Create Spatial Point FieldTypes --- Key: SOLR-1586 URL: https://issues.apache.org/jira/browse/SOLR-1586 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Per SOLR-773, create field types that hide the details of creating tiers, geohash and lat/lon fields. Fields should take in lat/lon points in a single form, as in: <field name="foo">lat lon</field>
[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1131: -- Summary: Allow a single field type to index multiple fields (was: Allow a single field to index multiple fields) Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780950#action_12780950 ] Grant Ingersoll commented on SOLR-1131: --- bq. Is this a good idea? Not sure yet. bq. Why don't we add a new interface MutlValuedFieldType which extends FieldType for this Aren't we just substituting a very simple construction for an instanceof check? I was possibly thinking of a couple of other options, too: 1. Add a boolean on FT for isMultiField which returns false by default, then we could check that. 2. Add a threadlocal that stores a preconstructed array of size one which could then simply be set for the single field case, which is the most common case. My gut, however, says the object is very short lived and is likely to be of negligible cost. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region)
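Option 1 from the comment above (a boolean on the field type that is false by default) might look roughly like the following. This is only a sketch under assumptions: SketchFieldType, SketchStrType, SketchLatLonType and SketchDocumentBuilder are hypothetical stand-ins, not Solr's FieldType or DocumentBuilder, fields are modeled as plain "name=value" strings, and the "_lat"/"_lon" suffixes are invented for illustration since the naming question is still open on the issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a field type; this is NOT Solr's FieldType API.
abstract class SketchFieldType {
    // Option 1: a flag that is false by default and overridden by multi-field types.
    boolean isMultiField() {
        return false;
    }

    // Common case: one external value becomes one indexed field ("name=value").
    abstract String createField(String name, String value);

    // Multi-field case: by default, wrap the single field in a short-lived list.
    List<String> createFields(String name, String value) {
        return Collections.singletonList(createField(name, value));
    }
}

// Hypothetical single-field type, comparable in spirit to a plain string field.
class SketchStrType extends SketchFieldType {
    @Override
    String createField(String name, String value) {
        return name + "=" + value;
    }
}

// Hypothetical point type: "lat lon" in, two indexed fields out.
class SketchLatLonType extends SketchFieldType {
    @Override
    boolean isMultiField() {
        return true;
    }

    @Override
    String createField(String name, String value) {
        throw new UnsupportedOperationException("use createFields");
    }

    @Override
    List<String> createFields(String name, String value) {
        String[] parts = value.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'lat lon', got: " + value);
        }
        double lat = Double.parseDouble(parts[0]);
        double lon = Double.parseDouble(parts[1]);
        List<String> out = new ArrayList<>(2);
        out.add(name + "_lat=" + lat); // suffix is an assumption, not a Solr convention
        out.add(name + "_lon=" + lon);
        return out;
    }
}

// Hypothetical document builder: checks the flag instead of using instanceof.
class SketchDocumentBuilder {
    static List<String> index(SketchFieldType type, String name, String value) {
        List<String> doc = new ArrayList<>();
        if (type.isMultiField()) {
            doc.addAll(type.createFields(name, value));
        } else {
            doc.add(type.createField(name, value)); // no extra list for the common case
        }
        return doc;
    }
}
```

With this shape, the single-field path never allocates a wrapper collection, which is the concern the two options in the comment are trying to address.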
[jira] Created: (SOLR-1585) Refactor shards handling out of QueryComponent and into ShardsComponent
Refactor shards handling out of QueryComponent and into ShardsComponent --- Key: SOLR-1585 URL: https://issues.apache.org/jira/browse/SOLR-1585 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Fix For: 1.5 Per the TODOs in QueryComponent, create a ShardsComponent that handles setting up the shards. Additionally, make it so that it can handle smaller parameters, too. For instance, it is likely the case that in most setups only the IP address is changed, so we could have intelligent defaults which will make for shorter query strings.
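One way to read the "intelligent defaults" idea is that a shard entry could be given as just an address, with the port and path filled in. The sketch below assumes that reading; ShardDefaultsSketch and expand are hypothetical names, the default port and path are caller-supplied assumptions, and IPv6 literals are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Solr. Expands short shard entries to host:port/path.
class ShardDefaultsSketch {
    static String expand(String shards, int defaultPort, String defaultPath) {
        List<String> out = new ArrayList<>();
        for (String raw : shards.split(",")) {
            String s = raw.trim();
            if (s.isEmpty()) {
                continue; // tolerate stray commas
            }
            int slash = s.indexOf('/');
            // Everything before the first slash is host[:port]; the rest is the path.
            String hostPort = slash >= 0 ? s.substring(0, slash) : s;
            String path = slash >= 0 ? s.substring(slash) : defaultPath;
            if (hostPort.indexOf(':') < 0) {
                hostPort = hostPort + ":" + defaultPort; // fill in the missing port
            }
            out.add(hostPort + path);
        }
        return String.join(",", out);
    }
}
```

For example, expand("10.0.0.1,10.0.0.2", 8983, "/solr") would produce the two fully qualified entries, so the query string only needs to carry the addresses.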
[jira] Created: (SOLR-1586) Create Spatial Point FieldTypes
Create Spatial Point FieldTypes --- Key: SOLR-1586 URL: https://issues.apache.org/jira/browse/SOLR-1586 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Per SOLR-773, create field types that hide the details of creating tiers, geohash and lat/lon fields. Fields should take in lat/lon points in a single form, as in: <field name="foo">lat lon</field>
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780954#action_12780954 ] Grant Ingersoll commented on SOLR-1131: --- I'm also looking for ideas on how to handle the naming of the fields that are produced by this. I think a FieldType that produces multiple fields should hide the logistics of the naming, which this patch doesn't even begin to scratch the surface of. Also, on the search side, how does one search against just one of the fields? Would appreciate thoughts on that. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Issue Comment Edited: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780954#action_12780954 ] Grant Ingersoll edited comment on SOLR-1131 at 11/21/09 12:11 PM: -- I'm also looking for ideas on how to handle the naming of the fields that are produced by this. I think a FieldType that produces multiple fields should hide the logistics of the naming, which this patch doesn't even begin to scratch the surface of and also on the search side, how does one search against just one of the fields? Would appreciate thoughts on that. was (Author: gsingers): I'm also looking for ideas on how to handle the naming of the fields that are produced by this. I think a FieldType that produces multiple fields should hide the logistics of the naming, which this patch doesn't even begin to scratch the surface of and also on the search side, how does one search against just one of the fields? Would appreciated thoughts on that. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780956#action_12780956 ] Grant Ingersoll commented on SOLR-1131: --- I definitely agree, Chris. The interesting part is how that manifests itself in terms of implementation, which is where I am digging in at the moment. It means the Query parsers need to handle it as well as the ResponseWriters, etc. Allow a single field type to index multiple fields -- Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Commented: (LUCENE-965) Implement a state-of-the-art retrieval function in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780607#action_12780607 ] Grant Ingersoll commented on LUCENE-965: Hi Hui, I see you updated your paper on this. Have you looked at how this might be implemented given the flexible indexing work under way? Implement a state-of-the-art retrieval function in Lucene - Key: LUCENE-965 URL: https://issues.apache.org/jira/browse/LUCENE-965 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.2 Reporter: Hui Fang Fix For: 3.1 Attachments: axiomaticFunction.patch We implemented the axiomatic retrieval function, which is a state-of-the-art retrieval function, to replace the default similarity function in Lucene. We compared the performance of these two functions and reported the results at http://sifaka.cs.uiuc.edu/hfang/lucene/Lucene_exp.pdf. The report shows that the performance of the axiomatic retrieval function is much better than the default function. The axiomatic retrieval function is able to find more relevant documents and users can see more relevant documents in the top-ranked documents. Incorporating such a state-of-the-art retrieval function could improve the search performance of all the applications which were built upon Lucene. Most changes related to the implementation are made in AXSimilarity, TermScorer and TermQuery.java. However, many test cases are hand coded to test whether the implementation of the default function is correct. Thus, I also modified many test files to make the new retrieval function pass those cases. In fact, we found that some old test cases are not reasonable. For example, in the testQueries02 of TestBoolean2.java, the query is "+w3 xx", and we have two documents "w1 xx w2 yy w3" and "w1 w3 xx w2 yy w3". The second document should be more relevant than the first one, because it has more occurrences of the query term w3. But the original test case would require us to rank the first document higher than the second one, which is not reasonable.
[jira] Created: (SOLR-1581) Facet by Function
Facet by Function - Key: SOLR-1581 URL: https://issues.apache.org/jira/browse/SOLR-1581 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Fix For: 1.5 It would be really great if we could execute a function and quantize it into buckets that could then be returned as facets.
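The quantization step could be as simple as fixed-width bucketing of the per-document function values. The following is a minimal sketch under that assumption; FunctionFacetSketch, bucketCounts, start and gap are hypothetical names, and the function values are passed in as a plain array rather than coming from Solr's function query machinery.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not Solr code. Counts function values per fixed-width bucket.
class FunctionFacetSketch {
    // Returns a map from each bucket's lower bound to the number of values in
    // [lower, lower + gap). Buckets are aligned to start and ordered ascending.
    static Map<Double, Integer> bucketCounts(double[] values, double start, double gap) {
        if (gap <= 0) {
            throw new IllegalArgumentException("gap must be positive");
        }
        Map<Double, Integer> counts = new TreeMap<>();
        for (double v : values) {
            double lower = start + Math.floor((v - start) / gap) * gap;
            counts.merge(lower, 1, Integer::sum);
        }
        return counts;
    }
}
```

Each map entry would then correspond to one facet value and its count, for instance distance bands when the function is a distance calculation.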
[jira] Assigned: (SOLR-1131) Allow a single field to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned SOLR-1131: - Assignee: Grant Ingersoll Allow a single field to index multiple fields - Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Attachments: SOLR-1131-IndexMultipleFields.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Created: (SOLR-1582) DocumentBuilder does not properly handle binary field copy fields
DocumentBuilder does not properly handle binary field copy fields - Key: SOLR-1582 URL: https://issues.apache.org/jira/browse/SOLR-1582 Project: Solr Issue Type: Bug Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Trivial In DocumentBuilder, around line 267, the BinaryField is created, but it is never assigned to the field that is added to the output.
[jira] Resolved: (SOLR-1582) DocumentBuilder does not properly handle binary field copy fields
[ https://issues.apache.org/jira/browse/SOLR-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved SOLR-1582. --- Resolution: Fixed Fix Version/s: 1.5 Committed. DocumentBuilder does not properly handle binary field copy fields - Key: SOLR-1582 URL: https://issues.apache.org/jira/browse/SOLR-1582 Project: Solr Issue Type: Bug Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Trivial Fix For: 1.5 In DocumentBuilder, around line 267, the BinaryField is created, but it is never assigned to the field that is added to the output.
[jira] Updated: (SOLR-1131) Allow a single field to index multiple fields
[ https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1131: -- Attachment: SOLR-1131.patch Brings it up to trunk. Still needs test cases. All other tests pass. Allow a single field to index multiple fields - Key: SOLR-1131 URL: https://issues.apache.org/jira/browse/SOLR-1131 Project: Solr Issue Type: New Feature Components: Analysis Reporter: Ryan McKinley Assignee: Grant Ingersoll Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch In a few special cases, it makes sense for a single field (the concept) to be indexed as a set of Fields (lucene Field). Consider SOLR-773. The concept point may be best indexed in a variety of ways: * geohash (single lucene field) * lat field, lon field (two double fields) * cartesian tiers (a series of fields with tokens to say if it exists within that region)
[jira] Commented: (SOLR-1574) simpler builtin functions
[ https://issues.apache.org/jira/browse/SOLR-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779955#action_12779955 ] Grant Ingersoll commented on SOLR-1574: --- +1. This was my first time writing functions. Overall, pretty easy to do, but in some cases I felt I was copying a lot of code, with the primary difference being the number of DocValues I needed to pass through. Not quite sure how to handle that in a more general way. simpler builtin functions - Key: SOLR-1574 URL: https://issues.apache.org/jira/browse/SOLR-1574 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Priority: Minor Attachments: SOLR-1574.patch Make it easier and less error prone to add simple functions.
[jira] Created: (SOLR-1578) Develop a Spatial Query Parser
Develop a Spatial Query Parser -- Key: SOLR-1578 URL: https://issues.apache.org/jira/browse/SOLR-1578 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Fix For: 1.5 Given all the work around spatial, it would be beneficial if Solr had a query parser for dealing with spatial queries. For starters, something that used geonames data or maybe even Google Maps API would be really useful. Longer term, a spatial grammar that can robustly handle all the vagaries of addresses, etc. would be really cool. Refs: [1] http://www.geonames.org/export/client-libraries.html (note the Java client is ASL) [2] Data from geo names: http://download.geonames.org/export/dump/
[jira] Created: (LUCENE-2081) CartesianShapeFilter improvements
CartesianShapeFilter improvements - Key: LUCENE-2081 URL: https://issues.apache.org/jira/browse/LUCENE-2081 Project: Lucene - Java Issue Type: Improvement Components: contrib/spatial Affects Versions: 2.9 Reporter: Grant Ingersoll Priority: Minor The CartesianShapeFilter could use some improvements. For starters, if we make sure the boxIds are sorted in index order, this should reduce the cost of seeks. I think we should also replace the logging with a similar approach to Lucene's output stream. We can also do Term creation a bit more efficiently.
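The sorting idea can be illustrated without any Lucene classes: encode each box id to its indexed term text, then sort the term texts so that lookups walk the term dictionary in one direction. This is only a sketch; BoxIdOrderSketch and termsInIndexOrder are hypothetical names, the encoder is supplied by the caller, and it assumes the index orders terms within a field the same way String.compareTo orders Java strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.DoubleFunction;

// Hypothetical helper; not the contrib/spatial implementation.
class BoxIdOrderSketch {
    // Encodes each box id with the caller-supplied encoder and returns the term
    // texts sorted in String.compareTo order (assumed here to match index order).
    static List<String> termsInIndexOrder(List<Double> boxIds, DoubleFunction<String> encoder) {
        List<String> terms = new ArrayList<>(boxIds.size());
        for (double id : boxIds) {
            terms.add(encoder.apply(id));
        }
        Collections.sort(terms);
        return terms;
    }
}
```

A filter would then look up the terms in the returned order, so each seek starts at or after the position of the previous one.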
[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains
[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779456#action_12779456 ] Grant Ingersoll commented on MAHOUT-165: All sounding pretty good. If you don't mind, I'd like to review the legal bits before committing. I have a deadline on Thursday, but can get to it after that. Using better primitives hash for sparse vector for performance gains Key: MAHOUT-165 URL: https://issues.apache.org/jira/browse/MAHOUT-165 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Reporter: Shashikant Kore Assignee: Grant Ingersoll Fix For: 0.3 Attachments: colt.jar, mahout-165-18nov-updated.patch, mahout-165-18nov.patch, mahout-165-trove.patch, MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch In SparseVector, we need a primitives hash map for index and values. The present implementation of this hash map is not as efficient as some of the other implementations in non-Apache projects. In an experiment, I found that, for get/set operations, the primitive hash of Colt performs an order of magnitude better than OrderedIntDoubleMapping. For iteration it is 2x slower, though. Using Colt in SparseVector improved performance of canopy generation. For an experimental dataset, the current implementation takes 50 minutes. Using Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the delay.
[jira] Updated: (SOLR-1568) Implement Cartesian Tier Filter
[ https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1568: -- Attachment: CartesianTierQParserPlugin.java Here's the start of a QParserPlugin for this functionality. It's just the Java code and is not integrated with Solr yet. It works using the Lucene spatial stuff, but it should not be committed at this point, b/c I want to make it work with Tier based field types so that the end user need not even think about what the field name structure is (i.e. tier_). Can query with it as something like: {!tier x=32 y=-79 dist=20 prefix=tier_}. If you did want to use it, you would need to add it to your solrconfig.xml. Implement Cartesian Tier Filter --- Key: SOLR-1568 URL: https://issues.apache.org/jira/browse/SOLR-1568 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: CartesianTierQParserPlugin.java Given an index with cartesian tiers, we should be able to pass in a filter query that takes in the field name, lat, lon and radius and produces an appropriate Filter for use by Solr. Note, contrib/spatial has such a filter, so it may just be that we need to hook in a QParserPlugin to handle it.
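To make the shape of the {!tier x=32 y=-79 dist=20 prefix=tier_} query concrete, here is a small stand-alone sketch that splits such a string into a parser name and its key/value arguments. It is not the attached CartesianTierQParserPlugin and does not use Solr's own local-params parsing; TierLocalParamsSketch and parse are hypothetical names, and quoting or escaping inside values is deliberately not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mini-parser for the local-params prefix shown above; not Solr code.
class TierLocalParamsSketch {
    // Returns the parser name under the key "type", followed by each key=value pair.
    static Map<String, String> parse(String q) {
        String s = q.trim();
        if (!s.startsWith("{!") || !s.endsWith("}")) {
            throw new IllegalArgumentException("expected {!name key=value ...}: " + q);
        }
        String[] parts = s.substring(2, s.length() - 1).trim().split("\\s+");
        Map<String, String> out = new LinkedHashMap<>();
        out.put("type", parts[0]);
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value: " + parts[i]);
            }
            out.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
        }
        return out;
    }
}
```

A plugin along these lines would then read x, y and dist as numbers and use prefix to locate the tier fields, which is exactly the detail the comment hopes to hide behind a tier-based field type.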
[jira] Commented: (SOLR-1568) Implement Cartesian Tier Filter
[ https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779662#action_12779662 ] Grant Ingersoll commented on SOLR-1568: --- Honestly, it's a bit tricky and I'm not sure how best to resolve it (keep in mind that I'm not recommending the above piece be committed). There are a few pieces that can be improved in the CartesianShapeFilter to perform better (I still need to fix them in Lucene), but if I go that route, then I need to update the Lucene JARs associated with that in Solr. I can't really do that, b/c that likely means one of two things: 1. Patching the 2.9.1 branch and packaging up that JAR for Solr, or waiting for a 2.9.2 release from Lucene, which isn't likely. 2. Patching trunk and including it. This would be a huge undertaking for Solr. So, in the end, my decision was based on the fact that the code for it was pretty simple and wouldn't be a big deal to fix. Longer term, I will fix the issue in trunk of Lucene and then over time Solr can adapt to use that once we are on 3.x. Implement Cartesian Tier Filter --- Key: SOLR-1568 URL: https://issues.apache.org/jira/browse/SOLR-1568 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: CartesianTierQParserPlugin.java Given an index with cartesian tiers, we should be able to pass in a filter query that takes in the field name, lat, lon and radius and produces an appropriate Filter for use by Solr. Note, contrib/spatial has such a filter, so it may just be that we need to hook in a QParserPlugin to handle it.
[jira] Created: (LUCENE-2078) Remove dependencies on specific field names or prefixes for field names (i.e. tierPrefix)
Remove dependencies on specific field names or prefixes for field names (i.e. tierPrefix) - Key: LUCENE-2078 URL: https://issues.apache.org/jira/browse/LUCENE-2078 Project: Lucene - Java Issue Type: Improvement Components: contrib/spatial Affects Versions: 2.9 Reporter: Grant Ingersoll Fix For: 3.1 Currently, the spatial contrib makes a lot of assumptions about what field names are when these are simply not needed. By doing so, it prevents re-use in other applications that have set up their fields differently.
[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains
[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778832#action_12778832 ] Grant Ingersoll commented on MAHOUT-165: Shashi, can you make sure the patch is up to date? Using better primitives hash for sparse vector for performance gains Key: MAHOUT-165 URL: https://issues.apache.org/jira/browse/MAHOUT-165 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Reporter: Shashikant Kore Assignee: Grant Ingersoll Fix For: 0.3 Attachments: colt.jar, mahout-165-trove.patch, MAHOUT-165-updated.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch In SparseVector, we need a primitives hash map for index and values. The present implementation of this hash map is not as efficient as some of the other implementations in non-Apache projects. In an experiment, I found that, for get/set operations, the primitive hash of Colt performs an order of magnitude better than OrderedIntDoubleMapping. For iteration it is 2x slower, though. Using Colt in SparseVector improved performance of canopy generation. For an experimental dataset, the current implementation takes 50 minutes. Using Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the delay.
[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains
[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778831#action_12778831 ] Grant Ingersoll commented on MAHOUT-165: Yep, I think we are all agreed on Colt. I'll get 0.2 out and then we can add it. Using better primitives hash for sparse vector for performance gains Key: MAHOUT-165 URL: https://issues.apache.org/jira/browse/MAHOUT-165 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Reporter: Shashikant Kore Assignee: Grant Ingersoll Fix For: 0.3 Attachments: colt.jar, mahout-165-trove.patch, MAHOUT-165-updated.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch In SparseVector, we need a primitives hash map for index and values. The present implementation of this hash map is not as efficient as some of the other implementations in non-Apache projects. In an experiment, I found that, for get/set operations, the primitive hash of Colt performs an order of magnitude better than OrderedIntDoubleMapping. For iteration it is 2x slower, though. Using Colt in SparseVector improved performance of canopy generation. For an experimental dataset, the current implementation takes 50 minutes. Using Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the delay.
[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains
[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779111#action_12779111 ] Grant Ingersoll commented on MAHOUT-165: bq. So I found Wolfgang Hoschek, the author of Colt, and he confirms that it is no longer maintained, and wishes us the best of luck in taking it over for ourselves if we so desired. I seem to recall him being a Lucene contributor in the past. Perhaps he would be willing to donate Colt to Apache? I don't think we can just bring in its source and claim it as ours. Another option is we see if he would move it over to Google Code and make some of us committers on the project. Perhaps Commons Math is interested in it, too. Using better primitives hash for sparse vector for performance gains Key: MAHOUT-165 URL: https://issues.apache.org/jira/browse/MAHOUT-165 Project: Mahout Issue Type: Improvement Components: Matrix Affects Versions: 0.2 Reporter: Shashikant Kore Assignee: Grant Ingersoll Fix For: 0.3 Attachments: colt.jar, mahout-165-trove.patch, MAHOUT-165-updated.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch In SparseVector, we need a primitives hash map for index and values. The present implementation of this hash map is not as efficient as some of the other implementations in non-Apache projects. In an experiment, I found that, for get/set operations, the primitive hash of Colt performs an order of magnitude better than OrderedIntDoubleMapping. For iteration it is 2x slower, though. Using Colt in SparseVector improved performance of canopy generation. For an experimental dataset, the current implementation takes 50 minutes. Using Colt reduces this duration to 19-20 minutes. That's a 60% reduction in the delay.
[jira] Created: (SOLR-1569) Allow literal Strings in functions
Allow literal Strings in functions -- Key: SOLR-1569 URL: https://issues.apache.org/jira/browse/SOLR-1569 Project: Solr Issue Type: Bug Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Some functions (for instance, those that take a geohash) may need to pass literal strings. This patch modifies the FunctionQParser to allow for quoted strings in functions (either single quote or double quote) to be passed through as a LiteralValueSource. It also adds the LiteralValueSource.
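The quoting rule described above (an argument wrapped in matching single or double quotes is a literal string) can be sketched in isolation as follows. This is not the FunctionQParser change from the patch; QuotedLiteralSketch and parseLiteral are hypothetical names, and escape sequences inside the quotes are not handled.

```java
// Hypothetical helper; not the SOLR-1569 patch. Recognizes a quoted function argument.
class QuotedLiteralSketch {
    // Returns the text between matching single or double quotes, or null when
    // the argument is not a quoted literal (for example a field name or number).
    static String parseLiteral(String arg) {
        String s = arg.trim();
        if (s.length() >= 2) {
            char q = s.charAt(0);
            boolean isQuote = q == '\'' || q == '"';
            if (isQuote && s.charAt(s.length() - 1) == q) {
                return s.substring(1, s.length() - 1);
            }
        }
        return null;
    }
}
```

A parser built this way would hand the returned text to something like a literal value source and fall back to its existing handling whenever null comes back.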
[jira] Commented: (SOLR-1569) Allow literal Strings in functions
[ https://issues.apache.org/jira/browse/SOLR-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778874#action_12778874 ]

Grant Ingersoll commented on SOLR-1569:
---------------------------------------

See http://www.lucidimagination.com/search/document/2ca306cbd392493f/passing_string_constants_to_functions

Allow literal Strings in functions
----------------------------------

Key: SOLR-1569
URL: https://issues.apache.org/jira/browse/SOLR-1569
Project: Solr
Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 1.5

Some functions (for instance, those that take a geohash) may need to be passed literal strings. This patch modifies the FunctionQParser so that quoted strings in functions (either single-quoted or double-quoted) are passed through as a LiteralValueSource. It also adds the LiteralValueSource.
[jira] Updated: (SOLR-1569) Allow literal Strings in functions
[ https://issues.apache.org/jira/browse/SOLR-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-1569:
----------------------------------

Attachment: SOLR-1569.patch

Allow literal Strings in functions
----------------------------------

Key: SOLR-1569
URL: https://issues.apache.org/jira/browse/SOLR-1569
Project: Solr
Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1569.patch

Some functions (for instance, those that take a geohash) may need to be passed literal strings. This patch modifies the FunctionQParser so that quoted strings in functions (either single-quoted or double-quoted) are passed through as a LiteralValueSource. It also adds the LiteralValueSource.
[jira] Resolved: (SOLR-1569) Allow literal Strings in functions
[ https://issues.apache.org/jira/browse/SOLR-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved SOLR-1569.
-----------------------------------

Resolution: Fixed

Committed revision 881319.

Allow literal Strings in functions
----------------------------------

Key: SOLR-1569
URL: https://issues.apache.org/jira/browse/SOLR-1569
Project: Solr
Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1569.patch

Some functions (for instance, those that take a geohash) may need to be passed literal strings. This patch modifies the FunctionQParser so that quoted strings in functions (either single-quoted or double-quoted) are passed through as a LiteralValueSource. It also adds the LiteralValueSource.
[jira] Work started: (SOLR-1568) Implement Cartesian Tier Filter
[ https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on SOLR-1568 started by Grant Ingersoll.

Implement Cartesian Tier Filter
-------------------------------

Key: SOLR-1568
URL: https://issues.apache.org/jira/browse/SOLR-1568
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 1.5

Given an index with Cartesian tiers, we should be able to pass in a filter query that takes the field name, lat, lon and radius and produces an appropriate Filter for use by Solr. Note that contrib/spatial already has such a filter, so we may only need to hook in a QParserPlugin to handle it.
[jira] Resolved: (SOLR-1302) Fun with Distances - Add Distance functions for a variety of things
[ https://issues.apache.org/jira/browse/SOLR-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved SOLR-1302.
-----------------------------------

Resolution: Fixed

I'm going to close this, as I've implemented a bunch of them, with the exception of cosine. If someone wants to do that one, they can open a new issue.

Fun with Distances - Add Distance functions for a variety of things
-------------------------------------------------------------------

Key: SOLR-1302
URL: https://issues.apache.org/jira/browse/SOLR-1302
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1302.patch, SOLR-1302.patch, SOLR-1302.patch

There are many distance functions that are useful to have:
1. Great Circle (lat/lon) and other geo distances
2. Euclidean (Vector)
3. Manhattan (Vector)
4. Cosine (Vector)

For the vector ones, the idea is that the fields on a document can be used to determine the vector.
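For reference, the four measures named in the issue can be sketched in a few lines each. This is a textbook formulation, not the code from the SOLR-1302 patches: the class DistanceSketch and its method names are assumptions, the great-circle variant shown is the haversine formula, and cosine is included only for completeness since the issue was closed without it.

```java
/** Hypothetical sketch of the distance measures listed in the issue. */
final class DistanceSketch {
    /** Great-circle distance via the haversine formula; inputs in degrees, result in units of radius. */
    static double haversine(double lat1, double lon1, double lat2, double lon2, double radius) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinHalfDPhi = Math.sin((phi2 - phi1) / 2);
        double sinHalfDLam = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinHalfDPhi * sinHalfDPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLam * sinHalfDLam;
        return 2 * radius * Math.asin(Math.min(1.0, Math.sqrt(a))); // clamp guards rounding error
    }

    /** Euclidean (L2) distance between two equal-length vectors. */
    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Manhattan (L1) distance between two equal-length vectors. */
    static double manhattan(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    /** Cosine distance, defined here as 1 - cosine similarity; NaN if either vector is all zeros. */
    static double cosineDistance(double[] a, double[] b) {
        checkSameLength(a, b);
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("vectors differ in length");
    }
}
```

For the vector measures, each array element would correspond to the value of one document field, in line with the issue's note that fields on a document determine the vector.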
[jira] Commented: (SOLR-1302) Fun with Distances - Add Distance functions for a variety of things
[ https://issues.apache.org/jira/browse/SOLR-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778308#action_12778308 ]

Grant Ingersoll commented on SOLR-1302:
---------------------------------------

Still benchmarking. I think there are two other things that are needed for performance:
1. Filtering (see the frange stuff)
2. Caching of function results within a single search for use between filtering, scoring and sorting

Fun with Distances - Add Distance functions for a variety of things
-------------------------------------------------------------------

Key: SOLR-1302
URL: https://issues.apache.org/jira/browse/SOLR-1302
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1302.patch, SOLR-1302.patch, SOLR-1302.patch

There are many distance functions that are useful to have:
1. Great Circle (lat/lon) and other geo distances
2. Euclidean (Vector)
3. Manhattan (Vector)
4. Cosine (Vector)

For the vector ones, the idea is that the fields on a document can be used to determine the vector.
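The second point, caching function results within a single search, could look roughly like the following. This is only a sketch of the idea and not a Solr API: the class PerRequestFunctionCache is hypothetical, it assumes one instance is created per request and discarded afterwards, and it assumes the function depends on the document id alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch: memoize an expensive per-document function for one request. */
final class PerRequestFunctionCache {
    private final IntToDoubleFunction function;                 // e.g. a distance computation
    private final Map<Integer, Double> cache = new HashMap<>(); // doc id -> computed value
    private int evaluations;                                    // how often the function really ran

    PerRequestFunctionCache(IntToDoubleFunction function) {
        this.function = function;
    }

    /** Returns the cached value for docId, computing it on first use. */
    double value(int docId) {
        Double cached = cache.get(docId);
        if (cached == null) {
            cached = function.applyAsDouble(docId);
            evaluations++;
            cache.put(docId, cached);
        }
        return cached;
    }

    int evaluations() {
        return evaluations;
    }
}
```

If filtering, scoring and sorting all call value(docId) on the same instance, the underlying function runs once per document instead of up to three times. A production version would likely avoid boxing, for example with a primitive array indexed by document id.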
[jira] Commented: (SOLR-1302) Fun with Distances - Add Distance functions for a variety of things
[ https://issues.apache.org/jira/browse/SOLR-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778312#action_12778312 ]

Grant Ingersoll commented on SOLR-1302:
---------------------------------------

That being said, my anecdotal tests show them to be pretty fast over ~68K docs. I'm going to load up the entire Open Street Map planet at some point this week and then I can run real tests.

Fun with Distances - Add Distance functions for a variety of things
-------------------------------------------------------------------

Key: SOLR-1302
URL: https://issues.apache.org/jira/browse/SOLR-1302
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1302.patch, SOLR-1302.patch, SOLR-1302.patch

There are many distance functions that are useful to have:
1. Great Circle (lat/lon) and other geo distances
2. Euclidean (Vector)
3. Manhattan (Vector)
4. Cosine (Vector)

For the vector ones, the idea is that the fields on a document can be used to determine the vector.
[jira] Created: (SOLR-1567) Upgrade to Tika 0.5
Upgrade to Tika 0.5
-------------------

Key: SOLR-1567
URL: https://issues.apache.org/jira/browse/SOLR-1567
Project: Solr
Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 1.5

As the title says.
[jira] Created: (SOLR-1568) Implement Cartesian Tier Filter
Implement Cartesian Tier Filter
-------------------------------

Key: SOLR-1568
URL: https://issues.apache.org/jira/browse/SOLR-1568
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 1.5

Given an index with Cartesian tiers, we should be able to pass in a filter query that takes the field name, lat, lon and radius and produces an appropriate Filter for use by Solr. Note that contrib/spatial already has such a filter, so we may only need to hook in a QParserPlugin to handle it.
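To give a feel for how lat, lon and radius can turn into a set of grid boxes to filter on, here is a deliberately simplified sketch. It is not the contrib/spatial algorithm and does not use its classes: it assumes a plain latitude/longitude grid with 2^tier cells per axis, a radius in miles, and hypothetical names (TierBoxSketch, coveringCells). It also ignores wrap-around at the antimeridian.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: grid cells covering the bounding box of a circle on a lat/lon grid. */
final class TierBoxSketch {
    static final double EARTH_RADIUS_MILES = 3958.8; // mean Earth radius, approximate

    /** Returns ids (row * cellsPerAxis + col) of the cells touching the circle's bounding box. */
    static List<Long> coveringCells(double lat, double lon, double radiusMiles, int tier) {
        long n = 1L << tier; // cells per axis at this tier (assumed layout)
        double dLat = Math.toDegrees(radiusMiles / EARTH_RADIUS_MILES);
        // Longitude degrees shrink with latitude; the floor on cos avoids dividing by ~0 at the poles.
        double cosLat = Math.max(Math.cos(Math.toRadians(lat)), 1e-9);
        double dLon = Math.min(180.0, dLat / cosLat);

        long rowMin = index(lat - dLat, -90.0, 180.0, n);
        long rowMax = index(lat + dLat, -90.0, 180.0, n);
        long colMin = index(lon - dLon, -180.0, 360.0, n);
        long colMax = index(lon + dLon, -180.0, 360.0, n);

        List<Long> cells = new ArrayList<>();
        for (long r = rowMin; r <= rowMax; r++) {
            for (long c = colMin; c <= colMax; c++) {
                cells.add(r * n + c);
            }
        }
        return cells;
    }

    /** Maps a coordinate to its cell index along one axis, clamped to the grid. */
    private static long index(double v, double min, double span, long n) {
        long i = (long) Math.floor((v - min) / span * n);
        return Math.max(0, Math.min(n - 1, i));
    }
}
```

A filter built this way would match documents whose indexed cell id for the chosen tier is in the returned list, and an exact distance check would then trim the corners of the box. Picking the tier so that the box spans only a handful of cells keeps the list short.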