[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-12-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789743#action_12789743
 ] 

Grant Ingersoll commented on SOLR-773:
--

Just an update:

# SOLR-1131 (a.k.a. poly fields) is almost ready to go.  Please review.
# SOLR-1297: sort by function query just needs review and then can be committed.

After that, we can add in the Cartesian Tier indexing and the Cartesian Tier 
QParserPlugin (after a little rewrite).  Then we need pseudo-fields, and we 
likely want to hook in a per-request function cache (maybe).

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, 
 lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, 
 solrGeoQuery.tar, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1650) Consider being able to cache function results per request

2009-12-12 Thread Grant Ingersoll (JIRA)
Consider being able to cache function results per request
-

 Key: SOLR-1650
 URL: https://issues.apache.org/jira/browse/SOLR-1650
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: 1.5


Once we can sort, filter and boost by functions, it may be the case that the 
same function is executed for the same value over and over again.  Consider 
ways to cache this information.




[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-12 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1131:
--

Attachment: SOLR-1131.patch

Missing an  in DistanceUtils.parsePoint

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)
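As a rough illustration of one logical point fanning out into several Lucene fields, here is a minimal Java sketch. It assumes hypothetical sub-field names (`<name>__lat`, `<name>__lon`, `<name>__tier_<t>`) and replaces real Cartesian tiers, which project the globe first, with a plain equal-angle grid; it is not the SOLR-1131 implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one logical "point" value expanded into several
// Lucene-level field values. Field names and the tier grid are assumptions.
final class PointFieldSketch {

    // Option: two double fields, one per coordinate.
    static Map<String, String> latLonFields(String name, double lat, double lon) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put(name + "__lat", Double.toString(lat));
        out.put(name + "__lon", Double.toString(lon));
        return out;
    }

    // Option: simplified tier tokens. At tier t the world is split into a
    // 2^t x 2^t grid of equal lat/lon cells; the token names the cell that
    // contains the point. Real Cartesian tiers use a map projection instead.
    static Map<String, String> tierFields(String name, double lat, double lon, int maxTier) {
        Map<String, String> out = new LinkedHashMap<>();
        for (int t = 1; t <= maxTier; t++) {
            int cells = 1 << t;
            int row = cellIndex(lat, -90.0, 180.0, cells);
            int col = cellIndex(lon, -180.0, 360.0, cells);
            out.put(name + "__tier_" + t, row + "_" + col);
        }
        return out;
    }

    // Maps a coordinate to a cell index in [0, cells), clamping the max edge.
    private static int cellIndex(double value, double min, double span, int cells) {
        int idx = (int) Math.floor((value - min) / span * cells);
        return Math.min(Math.max(idx, 0), cells - 1);
    }
}
```

Each option trades index size against query flexibility, which is why a single field type that can emit any of them is attractive.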




[jira] Commented: (SOLR-1650) Consider being able to cache function results per request

2009-12-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789804#action_12789804
 ] 

Grant Ingersoll commented on SOLR-1650:
---

I was thinking of a cache whose scope was the length of the request.  The basic 
use case is:

1. Filter by distance
2. Boost/Sort by distance
3. Facet by distance 

Of course, this could feed the pseudo fields, too.
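A request-scoped cache along these lines could be as simple as the following Java sketch. It is a hypothetical class, not an existing Solr API: one instance lives for one request, results are keyed by a string form of the function (for example `dist(2,x,y,point(0,0))`) plus the document id, so filtering, boosting/sorting, and faceting on the same distance evaluate it only once per document.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

// Hypothetical per-request function cache: create one per request and let it
// be garbage collected when the request ends. Not an existing Solr class.
final class RequestFunctionCache {

    // Cached values for one function, plus which docs have been computed.
    private static final class Entry {
        final double[] values;
        final BitSet computed;

        Entry(int maxDoc) {
            values = new double[maxDoc];
            computed = new BitSet(maxDoc);
        }
    }

    private final int maxDoc;
    private final Map<String, Entry> entries = new HashMap<>();
    private int computations = 0;

    RequestFunctionCache(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Returns function(doc), evaluating it at most once per (functionKey, doc).
    double get(String functionKey, int doc, IntToDoubleFunction function) {
        Entry e = entries.computeIfAbsent(functionKey, k -> new Entry(maxDoc));
        if (!e.computed.get(doc)) {
            e.values[doc] = function.applyAsDouble(doc);
            e.computed.set(doc);
            computations++;
        }
        return e.values[doc];
    }

    // Number of actual function evaluations (cache misses) so far.
    int computations() {
        return computations;
    }
}
```

A BitSet tracks which entries are filled, so a function that legitimately returns NaN or 0 is still cached correctly.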

 Consider being able to cache function results per request
 -

 Key: SOLR-1650
 URL: https://issues.apache.org/jira/browse/SOLR-1650
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: 1.5


 Once we can sort, filter and boost by functions, it may be the case that the 
 same function is executed for the same value over and over again.  Consider 
 ways to cache this information.




[jira] Resolved: (SOLR-1297) Enable sorting by Function Query

2009-12-12 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-1297.
---

Resolution: Fixed

Committed revision 889997.

 Enable sorting by Function Query
 

 Key: SOLR-1297
 URL: https://issues.apache.org/jira/browse/SOLR-1297
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1297.patch


 It would be nice if one could sort by FunctionQuery.  See also SOLR-773, 
 where this was first mentioned by Yonik as part of the generic solution to 
 geo-search.




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789240#action_12789240
 ] 

Grant Ingersoll commented on SOLR-1131:
---

{quote}
Quick comment based on a spot check of the changes to IndexSchema: rather than 
make polyField special somehow w.r.t IndexSchema, and add a 
FieldType.getPolyFieldNames, etc, I had been thinking more along the lines of 
having an IndexSchema.registerDynamicFieldDefinition - just like the existing 
registerDynamicCopyField. This would (optionally) allow any field type to add 
other definitions to the IndexSchema. I continue to think it would be good to 
stay away from special logic for polyfields in the IndexSchema.
{quote}

So, then the FieldType would register its Dynamic Fields in its own init() 
method by calling this method?  That can work.

bq. Why use Dynamic Field array?

The array is sorted, array access is much faster, and we often have to loop 
over it to look things up.
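To illustrate the point about a sorted array, here is a minimal Java sketch of dynamic-field lookup. It assumes the array is sorted once by descending pattern length, so a linear scan returns the most specific match first; the class and method names are hypothetical and this is not IndexSchema's actual code.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of dynamic-field lookup over a sorted array.
// Patterns are "*suffix" or "prefix*"; longer patterns count as more specific.
final class DynamicFieldLookup {

    static final class DynField {
        final String pattern;
        final String typeName;
        final boolean suffixMatch; // true for "*suffix", false for "prefix*"
        final String stem;         // pattern without the '*', computed once

        DynField(String pattern, String typeName) {
            if (pattern.startsWith("*")) {
                suffixMatch = true;
                stem = pattern.substring(1);
            } else if (pattern.endsWith("*")) {
                suffixMatch = false;
                stem = pattern.substring(0, pattern.length() - 1);
            } else {
                throw new IllegalArgumentException("Pattern must start or end with '*': " + pattern);
            }
            this.pattern = pattern;
            this.typeName = typeName;
        }

        boolean matches(String fieldName) {
            return suffixMatch ? fieldName.endsWith(stem) : fieldName.startsWith(stem);
        }
    }

    private final DynField[] fields;

    DynamicFieldLookup(DynField... defs) {
        fields = defs.clone();
        // Sort once at schema load: longest pattern first, so the first match
        // in the scan below is the most specific one. The sort is stable.
        Arrays.sort(fields, Comparator.comparingInt((DynField f) -> f.pattern.length()).reversed());
    }

    // Linear scan over a plain array; runs for every field name not in the schema.
    String typeFor(String fieldName) {
        for (DynField f : fields) {
            if (f.matches(fieldName)) {
                return f.typeName;
            }
        }
        return null;
    }
}
```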

{quote}
CoordinateFieldType: why process more than one sub field type and then throw an 
exception at the end? I cleaned this up to throw the Exception when it occurs.
{quote}

OK.  Actually, this check should just be in the derived class, as some other 
CoordinateFieldType may have multiple sub types.

{quote}
# parsePoint in DistanceUtils, why use ',' as the separator - use ' ' (at least 
conforms to georss point then). I guess because you are supporting 
N-dimensional points, right?
# parsePoint - instead of complicated isolation loops, why not just use trim()? 
I've taken that approach in the patch I've attached.
{quote}

I think comma makes sense.  As for the optimization stuff, I agree w/ Yonik: 
this is code that will be called a lot.
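For reference, a comma-separated parser in the spirit of this discussion might look like the Java sketch below. The method name and the `(String[] out, String externalVal, int dimension)` shape are assumptions for illustration, not a copy of DistanceUtils.parsePoint; the point is that it finds component bounds with index arithmetic rather than split()/trim(), which matters for code called once per indexed point.

```java
// Hypothetical sketch of a comma-separated N-dimensional point parser.
// It avoids String.split (regex plus array) and per-component trim() by
// locating each component's bounds with index arithmetic.
final class PointParser {

    // Fills (or allocates) out with the trimmed components of externalVal.
    static String[] parsePoint(String[] out, String externalVal, int dimension) {
        if (out == null || out.length != dimension) {
            out = new String[dimension];
        }
        int start = 0;
        for (int i = 0; i < dimension; i++) {
            int end = externalVal.indexOf(',', start);
            if (i == dimension - 1) {
                if (end >= 0) {
                    throw new IllegalArgumentException("Too many dimensions (expected " + dimension + "): " + externalVal);
                }
                end = externalVal.length();
            } else if (end < 0) {
                throw new IllegalArgumentException("Too few dimensions (expected " + dimension + "): " + externalVal);
            }
            // Skip surrounding whitespace by moving indexes, not by creating strings.
            int s = start;
            int e = end;
            while (s < e && Character.isWhitespace(externalVal.charAt(s))) s++;
            while (e > s && Character.isWhitespace(externalVal.charAt(e - 1))) e--;
            if (s == e) {
                throw new IllegalArgumentException("Empty component " + i + " in point: " + externalVal);
            }
            out[i] = externalVal.substring(s, e); // the only allocation per component
            start = end + 1;
        }
        return out;
    }
}
```

Passing in a reusable `out` array lets a caller that parses many points avoid one array allocation per call.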

{quote}
* When checking isDuplicateDynField, if it is a duplicate, nothing is done. 
Shouldn't this be where an exception is thrown or a message is logged? In the 
patch I'm attaching I took the log approach.

{quote}

It is logged, but for the poly fields, if the dyn field is already defined, 
that's just fine.
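A minimal Java sketch of that arrangement, assuming a post-construction callback (similar in spirit to the SchemaAware mechanism mentioned later in this thread) and hypothetical class names: a poly field type registers the dynamic field its sub-fields need, and an identical existing definition is simply reused. Treating a conflicting definition as an error is an assumption of the sketch, not something stated above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical, heavily simplified stand-ins; not Solr's IndexSchema or FieldType.
interface SchemaAwareSketch {
    // Called once the schema has been created, so a type can add definitions.
    void inform(SimpleSchema schema);
}

final class SimpleSchema {
    private static final Logger LOG = Logger.getLogger(SimpleSchema.class.getName());
    private final Map<String, String> dynamicFields = new LinkedHashMap<>(); // pattern -> type

    // Returns true if newly registered. An identical existing definition is
    // fine (logged and reused); a conflicting one is treated as an error.
    boolean registerDynamicFieldDefinition(String pattern, String typeName) {
        String existing = dynamicFields.putIfAbsent(pattern, typeName);
        if (existing == null) {
            return true;
        }
        if (!existing.equals(typeName)) {
            throw new IllegalStateException("Dynamic field " + pattern + " already defined as "
                    + existing + ", cannot redefine as " + typeName);
        }
        LOG.fine("Dynamic field " + pattern + " already defined; reusing it");
        return false;
    }

    Map<String, String> dynamicFields() {
        return dynamicFields;
    }
}

// A poly field type that needs a dynamic field for its sub-fields.
final class PolyPointTypeSketch implements SchemaAwareSketch {
    private final String subFieldSuffix;
    private final String subFieldType;

    PolyPointTypeSketch(String subFieldSuffix, String subFieldType) {
        this.subFieldSuffix = subFieldSuffix;
        this.subFieldType = subFieldType;
    }

    @Override
    public void inform(SimpleSchema schema) {
        schema.registerDynamicFieldDefinition("*" + subFieldSuffix, subFieldType);
    }
}
```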


 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789241#action_12789241
 ] 

Grant Ingersoll commented on SOLR-1131:
---

I've got a patch almost ready that brings in the ValueSource stuff.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789339#action_12789339
 ] 

Grant Ingersoll commented on SOLR-1131:
---

{quote}
I'm wondering what there is to agree with since optimization was never 
defined. Are you talking speed? Are you talking memory efficiency? Code 
readability? Maintainability? Some combination of all of those?
{quote}

Speed and memory.  

As for logging, that code is all going away in the next patch, I think.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789351#action_12789351
 ] 

Grant Ingersoll commented on SOLR-1131:
---

bq. Unfortunately, it's at the cost of readability and maintainability. 

Maybe.  It took me all of 30 seconds to figure out what it was doing.  I'll put 
some comments on it.  While readability is important, Solr's goal is not to 
make a product that a CS101 grad can read; it's to build a blazing fast search 
server.  That call could be hit millions of times when indexing points.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1131:
--

Attachment: SOLR-1131.patch

OK, this is getting a lot closer to being ready to commit.

Changes:
# Introduced MultiValueSource, a ValueSource that abstractly represents 
ValueSources for poly fields and other things.
# Introduced PointValueSource (point(x,y,z)), a MultiValueSource that wraps 
other value sources (it could be called something else, I suppose).
# Implemented PointTypeValueSource to represent the ValueSource for the PointType 
class.
# Hooked multivalue callbacks into DocValues.  In addition to making functions 
work with Points (et al.), it should be possible to write functions that work on 
multivalued fields, but I did not undertake this work.
# Added a SchemaAware callback mechanism so that Field Types and other schema 
components can register dynamic fields, etc. after the schema has been created.
# Updated the example to have spatial information in the docs, etc.  See 
http://wiki.apache.org/solr/SpatialSearch
# Modified the distance functions to work with MultiValueSources.
# Cleaned up the tests.
# Incorporated various comments from Chris and Yonik.
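A rough Java sketch of how these pieces might relate, using deliberately renamed, simplified types (`ValueSrc`, `DocVals`, `MultiValueSrc`, `PointValueSrc`, and `EuclideanDistance` are hypothetical, not the patch's classes): a point wraps other value sources, exposes its components through a fill-an-array callback, and a distance function consumes two such points.

```java
// Hypothetical, simplified stand-ins for the ValueSource/DocValues idea;
// names and signatures are illustrative, not Solr's actual classes.
interface DocVals {
    double doubleVal(int doc);

    // Multi-valued callback: fills a caller-supplied array instead of
    // returning a new one. Single-valued sources do not support it.
    default void doubleVal(int doc, double[] vals) {
        throw new UnsupportedOperationException("not a multi-valued source");
    }
}

interface ValueSrc {
    DocVals getValues();
}

// A ValueSource whose per-document value has several components.
interface MultiValueSrc extends ValueSrc {
    int dimension();
}

// Per-document values backed by an array (stand-in for a field cache).
final class ArrayValueSrc implements ValueSrc {
    private final double[] perDoc;

    ArrayValueSrc(double... perDoc) {
        this.perDoc = perDoc.clone();
    }

    @Override
    public DocVals getValues() {
        return doc -> perDoc[doc];
    }
}

// point(a, b, ...): a MultiValueSrc wrapping other value sources.
final class PointValueSrc implements MultiValueSrc {
    private final ValueSrc[] sources;

    PointValueSrc(ValueSrc... sources) {
        this.sources = sources.clone();
    }

    @Override
    public int dimension() {
        return sources.length;
    }

    @Override
    public DocVals getValues() {
        DocVals[] vals = new DocVals[sources.length];
        for (int i = 0; i < sources.length; i++) {
            vals[i] = sources[i].getValues();
        }
        return new DocVals() {
            @Override
            public double doubleVal(int doc) {
                throw new UnsupportedOperationException("use doubleVal(int, double[])");
            }

            @Override
            public void doubleVal(int doc, double[] out) {
                for (int i = 0; i < vals.length; i++) {
                    out[i] = vals[i].doubleVal(doc);
                }
            }
        };
    }
}

// A distance function written against MultiValueSrc instead of separate x/y args.
// For brevity it re-fetches values and allocates per call; a real implementation
// would hoist both out of the per-document loop.
final class EuclideanDistance {
    static double dist(MultiValueSrc p1, MultiValueSrc p2, int doc) {
        if (p1.dimension() != p2.dimension()) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] a = new double[p1.dimension()];
        double[] b = new double[p2.dimension()];
        p1.getValues().doubleVal(doc, a);
        p2.getValues().doubleVal(doc, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```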


 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789526#action_12789526
 ] 

Grant Ingersoll commented on SOLR-1131:
---

I think this is ready to commit.  I'd like to do so on Monday or Tuesday of 
next week, which should give plenty of time for further review.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, 
 SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Assigned: (SOLR-1297) Enable sorting by Function Query

2009-12-11 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-1297:
-

Assignee: Grant Ingersoll

 Enable sorting by Function Query
 

 Key: SOLR-1297
 URL: https://issues.apache.org/jira/browse/SOLR-1297
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 It would be nice if one could sort by FunctionQuery.  See also SOLR-773, 
 where this was first mentioned by Yonik as part of the generic solution to 
 geo-search.




[jira] Work started: (SOLR-1297) Enable sorting by Function Query

2009-12-11 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on SOLR-1297 started by Grant Ingersoll.

 Enable sorting by Function Query
 

 Key: SOLR-1297
 URL: https://issues.apache.org/jira/browse/SOLR-1297
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 It would be nice if one could sort by FunctionQuery.  See also SOLR-773, 
 where this was first mentioned by Yonik as part of the generic solution to 
 geo-search.




[jira] Commented: (SOLR-1297) Enable sorting by Function Query

2009-12-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789551#action_12789551
 ] 

Grant Ingersoll commented on SOLR-1297:
---

For this, I think we want to be able to do things like:

Just functions
{code}
sort=dist(2,x,y, point(0,0)) desc
{code}

Multiple sort params, some functions, some fields
{code}
sort=weight asc,dist(2,x,y, point(0,0)) asc
{code}

If and when a function result cache exists, we should be able to take advantage 
of that too, but that is an implementation detail.
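To make the intended semantics concrete, here is a small in-memory Java sketch. It is not Solr's implementation; `Doc`, `distFromOrigin`, and the comparator helpers are hypothetical. A function sort is just a comparator on a computed value, and mixing field and function sorts is comparator chaining in the order given by the `sort` parameter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of sorting by a function alongside a plain field sort.
final class FunctionSortSketch {

    static final class Doc {
        final String id;
        final double weight;
        final double x;
        final double y;

        Doc(String id, double weight, double x, double y) {
            this.id = id;
            this.weight = weight;
            this.x = x;
            this.y = y;
        }
    }

    // Stand-in for dist(2,x,y,point(0,0)): Euclidean distance from the origin.
    static double distFromOrigin(Doc d) {
        return Math.hypot(d.x, d.y);
    }

    // sort=dist(2,x,y,point(0,0)) desc
    static Comparator<Doc> distanceDesc() {
        return Comparator.comparingDouble(FunctionSortSketch::distFromOrigin).reversed();
    }

    // sort=weight asc,dist(2,x,y,point(0,0)) asc  (field first, function breaks ties)
    static Comparator<Doc> weightThenDistance() {
        return Comparator.comparingDouble((Doc d) -> d.weight)
                .thenComparingDouble(FunctionSortSketch::distFromOrigin);
    }

    // Returns the ids of docs in the given order, leaving the input untouched.
    static List<String> sortedIds(List<Doc> docs, Comparator<Doc> order) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(order);
        List<String> ids = new ArrayList<>();
        for (Doc d : copy) {
            ids.add(d.id);
        }
        return ids;
    }
}
```

Note that these comparators recompute the distance on every comparison, so each document's function value is evaluated many times during a sort; that repetition is exactly what a function result cache would avoid.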

 Enable sorting by Function Query
 

 Key: SOLR-1297
 URL: https://issues.apache.org/jira/browse/SOLR-1297
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 It would be nice if one could sort by FunctionQuery.  See also SOLR-773, 
 where this was first mentioned by Yonik as part of the generic solution to 
 geo-search.




[jira] Commented: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR

2009-12-10 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788743#action_12788743
 ] 

Grant Ingersoll commented on LUCENE-1377:
-

bq. Really? we don't add any analysis capabilities to lucene that solr uses too?

Yes, but Solr has a dependency on Lucene, not the other way around.  Solr is, 
by definition, at a higher level than Lucene.  In order for Lucene to use 
something of Solr's, it essentially has to fork the code.  Several times, stuff 
has been pulled out of Solr and put in Lucene, but then either Solr wasn't 
updated to use it, or it was updated only after Solr undertook a fair amount of 
work to switch to the exact same feature it already had in its own code base.  
Since Solr depends on Lucene, it's natural that Solr takes advantage of what 
Lucene has to offer, just like any other project that uses Lucene.

Like I said, though, it may make sense for analysis to be separate.  I was just 
pointing out it is a slippery slope.

 Add HTMLStripReader and WordDelimiterFilter from SOLR
 -

 Key: LUCENE-1377
 URL: https://issues.apache.org/jira/browse/LUCENE-1377
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 2.3.2
Reporter: Jason Rutherglen
Priority: Minor
   Original Estimate: 24h
  Remaining Estimate: 24h

 SOLR has two classes HTMLStripReader and WordDelimiterFilter which are very 
 useful for a wide variety of use cases.  It would be good to place them into 
 core Lucene.



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR

2009-12-10 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788745#action_12788745
 ] 

Grant Ingersoll commented on LUCENE-1377:
-

bq. i think we want to remove, not create duplication.

I think we all agree on that.  Alas, though, the devil is in the details and 
that's where it always seems to get hung up.  Not saying it can't work (I've 
often been an advocate for it), just saying we've gone around on this a number 
of times and I think it gets hung up on the fact that the two communities are 
fairly independent with the exception of a few core committers.

 Add HTMLStripReader and WordDelimiterFilter from SOLR
 -

 Key: LUCENE-1377
 URL: https://issues.apache.org/jira/browse/LUCENE-1377
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 2.3.2
Reporter: Jason Rutherglen
Priority: Minor
   Original Estimate: 24h
  Remaining Estimate: 24h

 SOLR has two classes HTMLStripReader and WordDelimiterFilter which are very 
 useful for a wide variety of use cases.  It would be good to place them into 
 core Lucene.




[jira] Commented: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR

2009-12-10 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788758#action_12788758
 ] 

Grant Ingersoll commented on LUCENE-1377:
-

bq. one thing we could do, is to just have a general rule that we should not 
copy stuff like this and instead it should be moved, with tests and back compat 
and all that working on both projects.

Yeah, and we can probably take this on a more case-by-case basis, but it still 
creates extra work for Solr committers w/ very little benefit to the 
project.  Not a big deal for the analyzers stuff, since Solr has that process 
mostly automated anyway, but it may be a bigger issue for other stuff.

So, if we go with Mike's proposal and make Lucene core have a dep on a new 
Analyzers module, then this could work, but even then I'm not sure, as 
Solr is not on Lucene 3.x yet (and doesn't have immediate plans for it either).  
At any rate, let's get concrete w/ a patch.

 Add HTMLStripReader and WordDelimiterFilter from SOLR
 -

 Key: LUCENE-1377
 URL: https://issues.apache.org/jira/browse/LUCENE-1377
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 2.3.2
Reporter: Jason Rutherglen
Priority: Minor
   Original Estimate: 24h
  Remaining Estimate: 24h

 SOLR has two classes HTMLStripReader and WordDelimiterFilter which are very 
 useful for a wide variety of use cases.  It would be good to place them into 
 core Lucene.




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-10 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788751#action_12788751
 ] 

Grant Ingersoll commented on SOLR-1131:
---

{quote}
Seems like SolrQueryParser should use getFieldQuery for everything (except 
TextField... but it could even be used for that if we make it such that we 
could call back to getBooleanQuery, etc). I had this in my patch.
{quote}

Yonik, could you elaborate on this?  It seems kind of weird to have that 
instanceof check in SolrQueryParser.getFieldQuery() to see if we have a 
TextField or not.
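One way to read the suggestion quoted above is plain polymorphic dispatch: the parser always asks the field's type for the query, and types that need special handling (poly fields, and possibly TextField via callbacks) override the method. The Java sketch below is only an illustration with hypothetical types that return query strings instead of Lucene Query objects; it is not SolrQueryParser's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified types: each field type builds its own query
// (rendered as a query string for brevity), so the parser needs no instanceof.
abstract class FieldTypeSketch {
    // Default behavior: a single term query on the field itself.
    String getFieldQuery(String field, String externalVal) {
        return field + ":" + externalVal;
    }
}

final class StrFieldSketch extends FieldTypeSketch {
}

// A poly field type expands one logical value into required clauses on its sub-fields.
final class PointFieldTypeSketch extends FieldTypeSketch {
    @Override
    String getFieldQuery(String field, String externalVal) {
        String[] coords = externalVal.split(",");
        StringBuilder q = new StringBuilder();
        for (int i = 0; i < coords.length; i++) {
            if (i > 0) {
                q.append(' ');
            }
            q.append('+').append(field).append('_').append(i).append("_d:").append(coords[i].trim());
        }
        return q.toString();
    }
}

final class QueryParserSketch {
    private final Map<String, FieldTypeSketch> schema = new HashMap<>();

    void addField(String name, FieldTypeSketch type) {
        schema.put(name, type);
    }

    // Delegates to the field's type instead of special-casing types here.
    String getFieldQuery(String field, String externalVal) {
        FieldTypeSketch type = schema.get(field);
        if (type == null) {
            throw new IllegalArgumentException("undefined field " + field);
        }
        return type.getFieldQuery(field, externalVal);
    }
}
```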

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-10 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1131:
--

Attachment: SOLR-1131.patch

This implements Option B as laid out at: 
http://search.lucidimagination.com/search/document/83a5442ab155686/solr_1131_multiple_fields_per_field_type#a600de441418a798

Next up:  Implement ValueSource support for PointType.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-10 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788805#action_12788805
 ] 

Grant Ingersoll commented on SOLR-1131:
---

bq. DocValues.getPoint(double[] point)

OK, let me see how that plays out.

See also 
http://www.lucidimagination.com/search/document/fd804bcd78d7bec1/solr_1131_poly_fields_and_valuesource

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
 SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788121#action_12788121
 ] 

Grant Ingersoll commented on SOLR-1131:
---

bq. IMO, unit tests can be too low level. They can also be too fragile. 

I guess it all comes down to what you call a unit.  

bq. It would be nice, for example, if testPointFieldType indexed a few documents 
(with various combinations of stored / indexed) and then queried the index,

This is done in testIndexing()

bq. TopDocs topDocs = core.getSearcher().get().search(bq, 1);

Yeah, see my comment there even!  I wanted a way to validate that the correct 
query is created, but I don't even really need to run a search for that.

{quote}
Redundant null checks, trivial strings, etc:
+ assertNotNull("topDocs is null", topDocs);
+ assertTrue(topDocs.totalHits + " does not equal: " + 1, topDocs.totalHits == 1);
{quote}

I need to update my IntelliJ Live Templates, as I have them set up to spit out 
a pattern like the above.

bq. Please see the DocumentBuilder changes I had added... 

Will do.

{quote}
Seems like SolrQueryParser should use getFieldQuery for everything (except 
TextField... but it could even be used for that if we make it such that we 
could call back to getBooleanQuery, etc). I had this in my patch.
{quote}

I thought I captured that, but will look again.


 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)
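For the first option, a geohash packs a lat/lon pair into a single string token, which is why one Lucene field is enough. A minimal, standalone sketch of the standard geohash encoding (alternating longitude/latitude bisection bits, emitted 5 bits at a time in the geohash base-32 alphabet); the class and method names are illustrative and not taken from Local Lucene.

```java
final class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes lat/lon into a geohash with the given number of characters. */
    static String encode(double lat, double lon, int precision) {
        double latLo = -90.0, latHi = 90.0;
        double lonLo = -180.0, lonHi = 180.0;
        StringBuilder hash = new StringBuilder(precision);
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0, ch = 0;
        while (hash.length() < precision) {
            if (lonBit) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonLo = mid; } else { ch <<= 1; lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latLo = mid; } else { ch <<= 1; latHi = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(ch));
                bits = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }
}
```

Nearby points share hash prefixes, so a prefix or range query on that one field gives a coarse proximity filter.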




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788123#action_12788123
 ] 

Grant Ingersoll commented on SOLR-1131:
---

{quote}
Regarding polyfields... it's not clear why they are special enough to have to 
change the IndexSchema? (IndexSchema.isPolyField, getPolyField, 
getPolyFieldType, getPolyFieldTypeNoEx, etc). Can't we just store them as 
normal field types?
{quote}

My thinking was that a Query Parser or other things might need to look up 
this information, but you are right, I don't have a specific use case for them 
at the moment.  At the same time, poly fields _feel_ like a hybrid between 
regular fields and dynamic fields and thus fit at the same level they do.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788151#action_12788151
 ] 

Grant Ingersoll commented on SOLR-1131:
---

bq. What if, instead, dynamic fields are directly used for subfields?

That then requires those dynamic fields to be present, which I'd rather not 
have to do.  Part of the goal of this issue is to hide the implementation.  
Having said that, I still don't know whether that means I need to keep the 
IndexSchema changes.  Let me do another iteration.

bq. Another thing to keep in mind - not all subfields will always be of the 
same type.

Agreed, but I don't think this is baked into the generic capabilities, just 
the Point stuff, where I think it is fine to have the same sub-type.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788163#action_12788163
 ] 

Grant Ingersoll commented on SOLR-1131:
---

{quote}
Also, you have subFieldType=double in the schema... and that requires that 
the double field type be defined. Why not have subFieldSuffix=_d and 
require the _d dynamic field be defined? Seems like the same complexity level
{quote}

I think it makes more sense for the subfields to be tied to a field type 
(subFieldType) than to a Field (subFieldSuffix): it seems weird for a field type 
to have a dependency on a Field, whereas it seems fine for a field type to have 
a dependency on another field type.

{quote}
For a specific point implementation, that's fine. But if you use a point type 
that can do cartesian grid stuff, then you already have different field types. 
But I guess subFieldType=double need only apply to some of the subfields (the 
ones that index the points).
{quote}
I'm not sure I see this.  If and when we implement CartesianPointType, it will 
still need to have a type for the sub fields (depending on the tiers specified) 
but I don't see why the subFieldType wouldn't be the same for all of them.  
AIUI, they all have the same precision requirements.

I think part of what's missing is that for some of these attributes, it would 
be better for them to be field properties and not fieldType properties.  For 
instance for the Cartesian case, you will need to declare what levels to 
support.  If that is specified on the FieldType, then you have a proliferation 
of Field Type declarations, whereas if it is on the Field, that is a lot 
cleaner and less verbose.  I'm just not sure how that gets implemented just 
yet, as having to specify startTier and endTier doesn't seem like the same 
level as multiValued or stored.  

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Issue Comment Edited: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788170#action_12788170
 ] 

Grant Ingersoll edited comment on SOLR-1131 at 12/9/09 5:18 PM:


{quote}
<fieldType name="xy" class="solr.PointType" dimension="2" subFieldType="double"/>
<field name="home" type="xy" indexed="true" stored="true"/>
{quote}
Two indexed fields
home___0
home___1

One stored field:
home
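A minimal sketch of what the indexing above amounts to: the external value is split into `dimension` parts, each part goes into an indexed but unstored subfield named `<field>___<i>`, and the original value is kept once as a stored field under the owning field's name. It assumes a comma-separated input such as `1.5,2.5` and returns plain name/value records instead of Lucene `Field` objects; `PointFieldsSketch` is a hypothetical name, not the patch's code.

```java
import java.util.ArrayList;
import java.util.List;

final class PointFieldsSketch {
    /** Stand-in for a Lucene Field; 'indexed'/'stored' mirror the schema flags. */
    record SubField(String name, String value, boolean indexed, boolean stored) {}

    static List<SubField> createFields(String field, String externalVal, int dimension) {
        String[] parts = externalVal.split(",");
        if (parts.length != dimension) {
            throw new IllegalArgumentException(
                "expected " + dimension + " values for " + field + " but got " + parts.length);
        }
        List<SubField> out = new ArrayList<>();
        for (int i = 0; i < dimension; i++) {
            // Parse each coordinate as a double, matching subFieldType="double".
            double d = Double.parseDouble(parts[i].trim());
            // Indexed, not stored: storage is turned off on the generated subfields.
            out.add(new SubField(field + "___" + i, Double.toString(d), true, false));
        }
        // One stored (not indexed) copy of the original value under the owning field's name.
        out.add(new SubField(field, externalVal, false, true));
        return out;
    }
}
```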

  was (Author: gsingers):
{quote}
<fieldType name="xy" class="solr.PointType" dimension="2" subFieldType="double"/>
<field name="home" type="xy" indexed="true" stored="true"/>
{quote}

home___0
home___1
  
 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788298#action_12788298
 ] 

Grant Ingersoll commented on SOLR-1131:
---

bq. OK... so the real issue is that this introduces a new mechanism to look up 
field types... not necessarily a horrible thing, but we should definitely think 
twice before doing so. 

Agreed.  I'm not wedded to this approach, just want to see the discussion 
through.  I do feel strongly that the goal is such that an app designer should 
be able to use a FieldType just as they always have, either dynamic or static.  
How we get to that I don't care so much as long as it works and performs.

bq. But... that scheme seems to limit us to a single subField type (in addition 
to the other downsides of requiring a new lookup mechanism).

I don't follow this.  In this particular implementation, I have a single 
subFieldType, but I don't see why a different implementation couldn't do 
something like:
{code}
<fieldType name="foo" class="solr.MultiSubPointType" dimension="3" 
subFieldTypes="double,tdouble,int"/>
{code}
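A minimal sketch of how such a hypothetical MultiSubPointType might resolve its `subFieldTypes` attribute: one declared field type per dimension, with a check that the counts agree and that every named type exists. The class, method, and the idea of passing declared type names as a set are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

final class MultiSubTypeSketch {
    /** Resolves a comma-separated subFieldTypes attribute; subfield i uses element i. */
    static List<String> resolve(String subFieldTypes, int dimension, Set<String> declaredTypes) {
        List<String> names = Arrays.stream(subFieldTypes.split(","))
                .map(String::trim)
                .toList();
        if (names.size() != dimension) {
            throw new IllegalArgumentException(
                "dimension=" + dimension + " but " + names.size() + " subFieldTypes given");
        }
        for (String n : names) {
            if (!declaredTypes.contains(n)) {
                throw new IllegalArgumentException("unknown field type: " + n);
            }
        }
        return names;
    }
}
```

A single-type implementation is then just the special case where every dimension resolves to the same name.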

bq. Aside: it looks like the code for getFieldOrNull isn't right? Seems like it 
will return a field with both the wrong type and the wrong name?

Hmmm, I _think_ it should return the owning Schema Field, i.e. the one that 
exists in the schema.xml file.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788307#action_12788307
 ] 

Grant Ingersoll commented on SOLR-1131:
---

Note, I don't think the distance function queries will work w/ my patch yet.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Created: (LUCENE-2127) Improved large result handling

2009-12-07 Thread Grant Ingersoll (JIRA)
Improved large result handling
--

 Key: LUCENE-2127
 URL: https://issues.apache.org/jira/browse/LUCENE-2127
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Grant Ingersoll
Priority: Minor


Per 
http://search.lucidimagination.com/search/document/350c54fc90d257ed/lots_of_results#fbb84bd297d15dd5,
 it would be nice to offer some other Collectors that are better at handling 
a really large number of results.  This could be implemented in a variety of ways 
via Collectors.  For instance, we could have a raw collector that does no 
sorting and just returns the ScoreDocs, or we could do as Mike suggests and 
have Collectors that have heuristics about memory tradeoffs and only heapify 
when appropriate.
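A minimal sketch of the "raw" idea as a plain data structure, not Lucene's Collector API: every (doc, score) hit is appended to growable arrays with no per-hit priority-queue work, and a bounded min-heap is built only if a caller actually asks for the top n. `RawHitsSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Hypothetical raw hit buffer: appends every hit without sorting or heap maintenance. */
final class RawHitsSketch {
    private int[] docs = new int[16];
    private float[] scores = new float[16];
    private int size;

    void collect(int doc, float score) {
        if (size == docs.length) { // double capacity: amortized O(1) appends
            docs = Arrays.copyOf(docs, size * 2);
            scores = Arrays.copyOf(scores, size * 2);
        }
        docs[size] = doc;
        scores[size] = score;
        size++;
    }

    int size() {
        return size;
    }

    /** Heapifies only on demand; returns the top n docs in descending score order. */
    int[] topDocs(int n) {
        // Min-heap of hit indexes ordered by score, capped at n entries.
        PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));
        for (int i = 0; i < size; i++) {
            heap.offer(i);
            if (heap.size() > n) heap.poll(); // evict the current lowest score
        }
        int[] result = new int[heap.size()];
        for (int k = result.length - 1; k >= 0; k--) result[k] = docs[heap.poll()];
        return result;
    }
}
```

The trade-off is memory: the buffer holds every hit, which is the kind of heuristic decision a smarter Collector would have to make.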



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-07 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1131:
--

Attachment: SOLR-1131.patch

OK, here's my take on this.  I took Yonik's and merged it w/ a patch I had in 
the works.  It's not done, but all tests pass, including the new one I added 
(PolyFieldTest).  Yonik's move to put getFieldQuery in FieldType was just the 
key to answering the question of how to generate queries given a FieldType.

Notes:
1. I changed the Geo examples to be CoordinateFieldType (representing an 
abstract coordinate system) and then PointFieldType which represents a point in 
an n-dimensional space (default 2D).  I think from this, we could easily add 
things like PolygonFieldType, etc. which would allow us to create more 
sophisticated shapes and do things like intersections, etc.  For instance, 
imagine asking: does this point lie within this shape?  I think that could be 
expressed as a RangeQuery.
2. I'm not sure I care for the name DelegatingFieldType for the new abstract 
FieldType that is the base class of CoordinateFieldType.
3. I'm not sure yet about the properties of the generated fields.  Right 
now, I'm delegating the handling to the sub FieldType except I'm overriding to 
turn off storage, which I think is pretty cool (could even work as a copy field 
like functionality)
4. I'm not thrilled about creating a SchemaField every time in the createFields 
protected helper method, but SchemaField is final and doesn't have a setName 
method (which makes sense)
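For the simplest shape, an axis-aligned box, "does this point lie within this shape?" reduces to one range check per dimension, all of which must hold; that is why it could map onto one required range clause per subfield. A minimal sketch with hypothetical names that checks containment directly and renders the equivalent query string over assumed `field___i` subfields.

```java
final class BoxSketch {
    /** True iff every coordinate lies within [min[i], max[i]] (inclusive). */
    static boolean contains(double[] min, double[] max, double[] point) {
        if (min.length != point.length || max.length != point.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        for (int i = 0; i < point.length; i++) {
            if (point[i] < min[i] || point[i] > max[i]) return false;
        }
        return true;
    }

    /** The same test as one required inclusive range clause per subfield. */
    static String toRangeQuery(String field, double[] min, double[] max) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < min.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append('+').append(field).append("___").append(i)
              .append(":[").append(min[i]).append(" TO ").append(max[i]).append(']');
        }
        return sb.toString();
    }
}
```

In a real index the subfields would need a numerically sortable encoding for such range clauses to be correct; plain decimal strings do not sort numerically.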

Questions for Yonik on his patch:
1. Why is TextField overriding getFieldQuery when it isn't called, except 
possibly via the FieldQParserPlugin?
2. I'm not sure I understand the getDistance, getBoundingBox methods on the 
GeoFieldType.  It seems like that precludes one from picking a specific 
distance function (for instance, sometimes you may want a faster approximation 
and at other times a slower, more accurate one).
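A minimal sketch of keeping the distance choice pluggable instead of baking one method into the field type: a hypothetical `DistanceSketch` interface with an accurate haversine implementation and a cheaper equirectangular approximation. The mean Earth radius of 6371 km is an assumed constant; none of these names come from the patch.

```java
/** Hypothetical pluggable distance; callers choose speed vs. accuracy per request. */
interface DistanceSketch {
    double EARTH_RADIUS_KM = 6371.0; // mean Earth radius (assumed)

    double km(double lat1, double lon1, double lat2, double lon2);
}

/** Great-circle distance via the haversine formula: slower, accurate on a sphere. */
final class HaversineSketch implements DistanceSketch {
    public double km(double lat1, double lon1, double lat2, double lon2) {
        double p1 = Math.toRadians(lat1), p2 = Math.toRadians(lat2);
        double dp = p2 - p1, dl = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dp / 2) * Math.sin(dp / 2)
                 + Math.cos(p1) * Math.cos(p2) * Math.sin(dl / 2) * Math.sin(dl / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }
}

/** Equirectangular approximation: cheaper, fine for short distances away from the poles. */
final class EquirectangularSketch implements DistanceSketch {
    public double km(double lat1, double lon1, double lat2, double lon2) {
        double p1 = Math.toRadians(lat1), p2 = Math.toRadians(lat2);
        double x = Math.toRadians(lon2 - lon1) * Math.cos((p1 + p2) / 2);
        double y = p2 - p1;
        return EARTH_RADIUS_KM * Math.sqrt(x * x + y * y);
    }
}
```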


Needs:
1. Write up changes.txt
2. More tests, including performance testing
3. Patch doesn't support dynamic fields yet, but it should

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes

2009-12-07 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787113#action_12787113
 ] 

Grant Ingersoll commented on SOLR-1586:
---

FYI, see the SOLR-1131 for an implementation of a Point Field Type.

 Create Spatial Point FieldTypes
 ---

 Key: SOLR-1586
 URL: https://issues.apache.org/jira/browse/SOLR-1586
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: examplegeopointdoc.patch.txt, 
 SOLR-1586.Mattmann.112209.geopointonly.patch.txt, 
 SOLR-1586.Mattmann.112209.geopointonly.patch.txt, 
 SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, 
 SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, 
 SOLR-1586.Mattmann.112509.geopointandgeohash.patch.txt


 Per SOLR-773, create field types that hide the details of creating tiers, 
 geohash and lat/lon fields.
 Fields should take in lat/lon points in a single form, as in:
 <field name="foo">lat lon</field>
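A minimal sketch of accepting the single "lat lon" form: split on whitespace, parse both values, and reject anything outside the valid latitude/longitude ranges (the negated range checks also reject NaN). `LatLonSketch` is a hypothetical name, and whitespace as the separator is assumed from the example.

```java
final class LatLonSketch {
    record LatLon(double lat, double lon) {}

    /** Parses "lat lon" (whitespace separated), rejecting out-of-range or NaN coordinates. */
    static LatLon parse(String value) {
        String[] parts = value.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'lat lon' but got: " + value);
        }
        double lat = Double.parseDouble(parts[0]);
        double lon = Double.parseDouble(parts[1]);
        // Written as !(in range) so that NaN, which fails every comparison, is rejected too.
        if (!(lat >= -90 && lat <= 90)) throw new IllegalArgumentException("latitude out of range: " + lat);
        if (!(lon >= -180 && lon <= 180)) throw new IllegalArgumentException("longitude out of range: " + lon);
        return new LatLon(lat, lon);
    }
}
```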




[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes

2009-12-07 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787115#action_12787115
 ] 

Grant Ingersoll commented on SOLR-1586:
---

bq. we should have the ability to output those fields as georss per ryan's 
suggestion

Ryan can correct me if I am putting words in his mouth, but I don't think he 
literally meant we needed to use those exact tags.  I think he just meant the 
format of the actual values.

 Create Spatial Point FieldTypes
 ---

 Key: SOLR-1586
 URL: https://issues.apache.org/jira/browse/SOLR-1586
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: examplegeopointdoc.patch.txt, 
 SOLR-1586.Mattmann.112209.geopointonly.patch.txt, 
 SOLR-1586.Mattmann.112209.geopointonly.patch.txt, 
 SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, 
 SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, 
 SOLR-1586.Mattmann.112509.geopointandgeohash.patch.txt


 Per SOLR-773, create field types that hide the details of creating tiers, 
 geohash and lat/lon fields.
 Fields should take in lat/lon points in a single form, as in:
 <field name="foo">lat lon</field>




[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes

2009-12-07 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787198#action_12787198
 ] 

Grant Ingersoll commented on SOLR-1586:
---

Can you put up a patch containing just the geohash stuff?

 Create Spatial Point FieldTypes
 ---

 Key: SOLR-1586
 URL: https://issues.apache.org/jira/browse/SOLR-1586
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: examplegeopointdoc.patch.txt, 
 SOLR-1586.Mattmann.112209.geopointonly.patch.txt, 
 SOLR-1586.Mattmann.112209.geopointonly.patch.txt, 
 SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, 
 SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, 
 SOLR-1586.Mattmann.112509.geopointandgeohash.patch.txt


 Per SOLR-773, create field types that hide the details of creating tiers, 
 geohash and lat/lon fields.
 Fields should take in lat/lon points in a single form, as in:
 <field name="foo">lat lon</field>




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-12-06 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12786586#action_12786586
 ] 

Grant Ingersoll commented on SOLR-1131:
---

Hey Yonik,

One of the things I was debating was whether it was worthwhile to keep the 
single field creation or not.  I see in your patch you drop it.  I've got a 
patch that keeps it.  I will try to put it up this week.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2009-12-04 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12786164#action_12786164
 ] 

Grant Ingersoll commented on LUCENE-2091:
-

I haven't looked at the patch yet, but...

Should we take just a small step back and consider what it would take to 
actually make scoring more pluggable instead of just thinking about how best to 
integrate BM25?  For instance, someone else has also donated an 
implementation of the Axiomatic Retrieval Function.  Much like BM25, I think it 
also requires the average document length, as does (I believe) language modeling and some 
other approaches.   Of course, we need to do this in a way that doesn't hurt 
performance for the default case.   

I'm also curious if anyone has compared BM25 w/ a Lucene similarity that uses a 
different length normalization factor?  I've seen many people use a different 
length normalization with good success, but it isn't necessarily for everyone.
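To make the average-document-length dependency concrete, a minimal sketch of the per-term Okapi BM25 formula as it is commonly written: `avgDocLen` is a collection-wide statistic that the per-document length norm in default Lucene scoring does not need. The IDF shown (with 1 added inside the log to keep it non-negative) is one common variant; k1 and b are the usual free parameters, often set around 1.2 and 0.75. This is not the attached patch's code.

```java
final class Bm25Sketch {
    /** BM25 contribution of one query term to one document's score. */
    static double termScore(int tf, int docLen, double avgDocLen, long docFreq, long numDocs,
                            double k1, double b) {
        // IDF variant with +1 inside the log so that very common terms never score negative.
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        // Length normalization relative to the collection's average document length.
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        // Term frequency saturates: more occurrences help, but with diminishing returns.
        return idf * (tf * (k1 + 1.0)) / (tf + norm);
    }
}
```

With b = 0 the length norm disappears entirely, which is one knob for avoiding a scheme that always favors shorter documents.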

 Add BM25 Scoring to Lucene
 --

 Key: LUCENE-2091
 URL: https://issues.apache.org/jira/browse/LUCENE-2091
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Reporter: Yuval Feinstein
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2091.patch, persianlucene.jpg

   Original Estimate: 48h
  Remaining Estimate: 48h

 http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of 
 Okapi-BM25 scoring in the Lucene framework,
 as an alternative to the standard Lucene scoring (which is a version of mixed 
 boolean/TFIDF).
 I have refactored this a bit, added unit tests and improved the runtime 
 somewhat.
 I would like to contribute the code to Lucene under contrib. 




[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2009-12-04 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12786220#action_12786220
 ] 

Grant Ingersoll commented on LUCENE-2091:
-

bq. Yes in the image posted here, I tried modifying length normalization with 
SweetSpot etc as others have done in the past. For this corpus I was unable to 
improve it in this way.

Yeah, can't speak for SweetSpot, but there are other approaches too that don't 
favor shorter docs all the time.

 Add BM25 Scoring to Lucene
 --

 Key: LUCENE-2091
 URL: https://issues.apache.org/jira/browse/LUCENE-2091
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Reporter: Yuval Feinstein
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2091.patch, persianlucene.jpg

   Original Estimate: 48h
  Remaining Estimate: 48h

 http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of 
 Okapi-BM25 scoring in the Lucene framework,
 as an alternative to the standard Lucene scoring (which is a version of mixed 
 boolean/TFIDF).
 I have refactored this a bit, added unit tests and improved the runtime 
 somewhat.
 I would like to contribute the code to Lucene under contrib. 




[jira] Created: (SOLR-1622) Add aggregate Math capabilities to Solr above and beyond the StatsComponent

2009-12-04 Thread Grant Ingersoll (JIRA)
Add aggregate Math capabilities to Solr above and beyond the StatsComponent
---

 Key: SOLR-1622
 URL: https://issues.apache.org/jira/browse/SOLR-1622
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Grant Ingersoll
Priority: Minor


It would be really cool if we could have a QueryComponent that enabled doing 
aggregating calculations on search results similar to what the StatsComponent 
does, but in a more generic way.

I also think it makes sense to reuse some of the function query capabilities 
(like the parser, etc.).

I imagine the interface might look like:
{code}
math=true&func=recip(sum(A))
{code}

This would calculate the reciprocal of the sum of the values in the field A.  
Then, you could also go across fields:
{code}
math=true&func=recip(sum(A, B, C))
{code}
which would sum the values across fields A, B and C.

It is important to make the functions pluggable and reusable.  It might also be 
nice to see if we can share the core calculations between function queries and 
this capability such that if someone adds a new aggregating function, it can 
also be used as a new Function query.
Of course, we'd want plugin functions, too, so that people can plug in their own 
functions.  After this is implemented, I think StatsComponent becomes a 
derivative of the new MathComponent.
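A minimal sketch of the proposed semantics: an aggregate is a pluggable function from the matching documents to one number, `sum(A, B, C)` adds the listed fields over all documents (missing values counted as 0, an assumption), and `recip` wraps another aggregate. Documents are modeled as field-to-value maps, and `recip` is read as plain 1/x as in the proposal's wording, not as the existing four-argument `recip` function query. All names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

final class MathComponentSketch {
    /** Pluggable aggregate: new ones are just new implementations of this interface. */
    interface Aggregate extends ToDoubleFunction<List<Map<String, Double>>> {}

    /** sum(A, B, ...): adds every listed field across all matching documents. */
    static Aggregate sum(String... fields) {
        return docs -> {
            double total = 0;
            for (Map<String, Double> doc : docs) {
                for (String f : fields) total += doc.getOrDefault(f, 0.0);
            }
            return total;
        };
    }

    /** recip(x) read as the plain reciprocal 1/x of another aggregate. */
    static Aggregate recip(Aggregate inner) {
        return docs -> 1.0 / inner.applyAsDouble(docs);
    }
}
```

Because aggregates compose, the same building blocks could in principle back both this component and new function queries.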




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-02 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784705#action_12784705
 ] 

Grant Ingersoll commented on SOLR-1277:
---

Mark,

I think this makes sense.  I think you can grab my ZK admin ReqHandler and the 
shards refactoring, too and pull that into this patch, as most of them are 
independent of the actual startup/config part.  If you don't get to it, I will 
try to next week.

 Implement a Solr specific naming service (using Zookeeper)
 --

 Key: SOLR-1277
 URL: https://issues.apache.org/jira/browse/SOLR-1277
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
 zookeeper-3.2.1.jar

   Original Estimate: 672h
  Remaining Estimate: 672h

 The goal is to give Solr server clusters self-healing attributes
 where if a server fails, indexing and searching don't stop and
 all of the partitions remain searchable. For configuration, the
 ability to centrally deploy a new configuration without servers
 going offline.
 We can start with basic failover and go from there?
 Features:
 * Automatic failover (i.e. when a server fails, clients stop
 trying to index to or search it)
 * Centralized configuration management (i.e. new solrconfig.xml
 or schema.xml propagates to a live Solr cluster)
 * Optionally allow shards of a partition to be moved to another
 server (i.e. if a server gets hot, move the hot segments out to
 cooler servers). Ideally we'd have a way to detect hot segments
 and move them seamlessly. With NRT this becomes somewhat more
 difficult but not impossible?
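A minimal sketch of the client side of "automatic failover": requests for a shard go only to replicas currently marked live, round-robin, and fail only when every replica is down. In a ZooKeeper-based design the live set would presumably track ephemeral nodes appearing and disappearing; here that is simulated with explicit markDown/markUp calls, and all names and addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class FailoverSketch {
    private final Map<String, List<String>> replicasByShard; // shard -> servers hosting it
    private final Set<String> live = ConcurrentHashMap.newKeySet();
    private final AtomicInteger counter = new AtomicInteger();

    FailoverSketch(Map<String, List<String>> replicasByShard) {
        this.replicasByShard = replicasByShard;
        replicasByShard.values().forEach(live::addAll); // everyone starts out live
    }

    void markDown(String server) { live.remove(server); }

    void markUp(String server) { live.add(server); }

    /** Round-robin over the live replicas of a shard; fails only if all are down. */
    String pick(String shard) {
        List<String> candidates = new ArrayList<>();
        for (String s : replicasByShard.getOrDefault(shard, List.of())) {
            if (live.contains(s)) candidates.add(s);
        }
        if (candidates.isEmpty()) throw new IllegalStateException("no live replica for " + shard);
        return candidates.get(Math.floorMod(counter.getAndIncrement(), candidates.size()));
    }
}
```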




[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)

2009-12-02 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784948#action_12784948
 ] 

Grant Ingersoll commented on SOLR-1277:
---

bq. Is there a benefit to refactoring the shards piece to a component rather 
than a simple helper class or something?

Yes.  For starters, not all requests require the QueryComponent, but they may 
still require distributed capabilities (e.g. TermsComponent).  Second, I think it 
is cleaner and allows others to plug in/override with their own capabilities.

 Implement a Solr specific naming service (using Zookeeper)
 --

 Key: SOLR-1277
 URL: https://issues.apache.org/jira/browse/SOLR-1277
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, 
 SOLR-1277.patch, zookeeper-3.2.1.jar

   Original Estimate: 672h
  Remaining Estimate: 672h

 The goal is to give Solr server clusters self-healing attributes
 where if a server fails, indexing and searching don't stop and
 all of the partitions remain searchable. For configuration, the
 ability to centrally deploy a new configuration without servers
 going offline.
 We can start with basic failover and go from there?
 Features:
 * Automatic failover (i.e. when a server fails, clients stop
 trying to index to or search it)
 * Centralized configuration management (i.e. new solrconfig.xml
 or schema.xml propagates to a live Solr cluster)
 * Optionally allow shards of a partition to be moved to another
 server (i.e. if a server gets hot, move the hot segments out to
 cooler servers). Ideally we'd have a way to detect hot segments
 and move them seamlessly. With NRT this becomes somewhat more
 difficult but not impossible?




[jira] Commented: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions

2009-11-26 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782898#action_12782898
 ] 

Grant Ingersoll commented on MAHOUT-204:


+1 on aggressive pruning and cleanup.  Feel free to commit as you go, too.  No 
need to get it all done in one fell swoop.

 Better integration of Mahout matrix capabilities with Colt Matrix additions
 ---

 Key: MAHOUT-204
 URL: https://issues.apache.org/jira/browse/MAHOUT-204
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.3
Reporter: Grant Ingersoll
 Fix For: 0.3

 Attachments: MAHOUT-204-author-cleanup.patch


 Per MAHOUT-165, we need to refactor the matrix package structures a bit to be 
 more coherent and clean.  For instance, there are two levels of matrix 
 packages now, so those should be rectified.




[jira] Commented: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions

2009-11-24 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781896#action_12781896
 ] 

Grant Ingersoll commented on MAHOUT-204:


Yeah, go ahead and submit the patch, then do the formatting.

 Better integration of Mahout matrix capabilities with Colt Matrix additions
 ---

 Key: MAHOUT-204
 URL: https://issues.apache.org/jira/browse/MAHOUT-204
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.3
Reporter: Grant Ingersoll
 Fix For: 0.3

 Attachments: MAHOUT-204-author-cleanup.patch


 Per MAHOUT-165, we need to refactor the matrix package structures a bit to be 
 more coherent and clean.  For instance, there are two levels of matrix 
 packages now, so those should be rectified.




[jira] Assigned: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements

2009-11-24 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned MAHOUT-207:
--

Assignee: Grant Ingersoll

 AbstractVector.hashCode() should not care about the order of iteration over 
 elements
 

 Key: MAHOUT-207
 URL: https://issues.apache.org/jira/browse/MAHOUT-207
 Project: Mahout
  Issue Type: Bug
  Components: Matrix
Affects Versions: 0.2
 Environment: all
Reporter: Jake Mannix
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: MAHOUT-207.patch


 As was discussed in MAHOUT-165, hashCode can be implemented simply like this:
 {code}
 public int hashCode() {
   final int prime = 31;
   int result = prime + ((name == null) ? 0 : name.hashCode());
   result = prime * result + size();
   Iterator<Element> iter = iterateNonZero();
   while (iter.hasNext()) {
     Element ele = iter.next();
     long v = Double.doubleToLongBits(ele.get());
     // Summation is order-independent, so no sorting of elements is needed.
     result += ele.index() * (int) (v ^ (v >>> 32));
   }
   return result;
 }
 {code}
 which obviates the need to sort the elements in the case of a random access 
 hash-based implementation.  Also, (ele.index() * (int) (v ^ (v >>> 32))) == 0 when 
 v = Double.doubleToLongBits(0d), which avoids the wrong hashCode() for sparse 
 vectors which have zero elements returned from the iterateNonZero() iterator.
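To make the order-independence argument concrete, here is the same formula pulled out into a standalone method. The (index, value) entry list and the OrderFreeHash name are assumptions made for illustration; the vector's name and size are passed in directly rather than read from a Mahout vector.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Standalone version of the MAHOUT-207 hashCode() formula. The vector is
// modeled (hypothetically) as a name, a size and a list of (index, value)
// entries, so that the same entries can be fed in different orders.
final class OrderFreeHash {
    static int hash(String name, int size, List<Map.Entry<Integer, Double>> nonZeros) {
        final int prime = 31;
        int result = prime + (name == null ? 0 : name.hashCode());
        result = prime * result + size;
        for (Map.Entry<Integer, Double> e : nonZeros) {
            long v = Double.doubleToLongBits(e.getValue());
            // Each element adds an independent term; since addition is
            // commutative, the visiting order cannot change the sum.
            result += e.getKey() * (int) (v ^ (v >>> 32));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, Double>> entries = new ArrayList<>();
        entries.add(new AbstractMap.SimpleEntry<>(3, 1.5));
        entries.add(new AbstractMap.SimpleEntry<>(7, -2.0));
        List<Map.Entry<Integer, Double>> reversed = new ArrayList<>(entries);
        Collections.reverse(reversed);
        System.out.println(hash("v", 10, entries) == hash("v", 10, reversed)); // true
    }
}
```

Because every element contributes a separate summed term, a hash-based vector can iterate its entries in any order, and an explicitly stored 0.0 adds nothing since Double.doubleToLongBits(0d) is 0.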




[jira] Commented: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements

2009-11-24 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782041#action_12782041
 ] 

Grant Ingersoll commented on MAHOUT-207:


How does this all relate to https://issues.apache.org/jira/browse/MAHOUT-159?



 AbstractVector.hashCode() should not care about the order of iteration over 
 elements
 

 Key: MAHOUT-207
 URL: https://issues.apache.org/jira/browse/MAHOUT-207
 Project: Mahout
  Issue Type: Bug
  Components: Matrix
Affects Versions: 0.2
 Environment: all
Reporter: Jake Mannix
 Fix For: 0.3

 Attachments: MAHOUT-207.patch


 As was discussed in MAHOUT-165, hashCode can be implemented simply like this:
 {code}
 public int hashCode() {
   final int prime = 31;
   int result = prime + ((name == null) ? 0 : name.hashCode());
   result = prime * result + size();
   Iterator<Element> iter = iterateNonZero();
   while (iter.hasNext()) {
     Element ele = iter.next();
     long v = Double.doubleToLongBits(ele.get());
     // Summation is order-independent, so no sorting of elements is needed.
     result += ele.index() * (int) (v ^ (v >>> 32));
   }
   return result;
 }
 {code}
 which obviates the need to sort the elements in the case of a random access 
 hash-based implementation.  Also, (ele.index() * (int) (v ^ (v >>> 32))) == 0 when 
 v = Double.doubleToLongBits(0d), which avoids the wrong hashCode() for sparse 
 vectors which have zero elements returned from the iterateNonZero() iterator.




[jira] Assigned: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned MAHOUT-206:
--

Assignee: Grant Ingersoll

 Separate and clearly label different SparseVector implementations
 -

 Key: MAHOUT-206
 URL: https://issues.apache.org/jira/browse/MAHOUT-206
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
 Environment: all
Reporter: Jake Mannix
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: MAHOUT-206.patch


 Shashi's last patch on MAHOUT-165 swapped out the int/double parallel array 
 impl of SparseVector for an OpenIntDoubleMap (hash-based) one.  We actually 
 need both, as I think I've mentioned a gazillion times.
 There was a patch, long ago, on MAHOUT-165, in which Ted had 
 OrderedIntDoubleVector, and OpenIntDoubleHashVector (or something to that 
 effect), and neither of them is called SparseVector.  I like this, because 
 it forces people to choose what kind of SparseVector they want (and they 
 should: sparse is an optimization, and the client should make a conscious 
 decision what they're optimizing for).  
 We could call them RandomAccessSparseVector and SequentialAccessSparseVector, 
 to be really obvious.
 But really, the important part is we have both.




[jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782054#action_12782054
 ] 

Grant Ingersoll commented on MAHOUT-206:


Jake, there's something weird in this patch with regard to SparseVector.  It 
didn't delete the file, but instead left it empty.

There still seems to be enough commonality between the two implementations 
(size, cardinality, etc.) that I think it would be worthwhile to keep 
SparseVector as an abstract class which the other two extend.
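One way such a hierarchy could look, as a sketch: an abstract base holding the shared cardinality and bounds checking, with a hash-based subclass for random access and a sorted parallel-array subclass for sequential access. All class names here are hypothetical, and HashMap<Integer, Double> stands in for a primitive map to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstract base: the state and checks both layouts share.
abstract class SparseVectorSketch {
    private final int cardinality; // logical length, zeros included

    protected SparseVectorSketch(int cardinality) { this.cardinality = cardinality; }

    final int size() { return cardinality; }

    protected final void checkIndex(int index) {
        if (index < 0 || index >= cardinality) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + cardinality);
        }
    }

    abstract double get(int index);

    abstract void set(int index, double value); // setting 0.0 drops the entry

    abstract int numNonZero();
}

// Hash-based layout: O(1) expected get/set, unordered iteration.
final class HashSparseVectorSketch extends SparseVectorSketch {
    private final Map<Integer, Double> values = new HashMap<>();

    HashSparseVectorSketch(int cardinality) { super(cardinality); }

    @Override
    double get(int index) {
        checkIndex(index);
        return values.getOrDefault(index, 0.0);
    }

    @Override
    void set(int index, double value) {
        checkIndex(index);
        if (value == 0.0) {
            values.remove(index);
        } else {
            values.put(index, value);
        }
    }

    @Override
    int numNonZero() { return values.size(); }
}

// Ordered layout: sorted parallel arrays, O(log n) get, O(n) insert,
// cheap in-order iteration (useful for dot products and merges).
final class OrderedSparseVectorSketch extends SparseVectorSketch {
    private int[] indices = new int[4];
    private double[] values = new double[4];
    private int count;

    OrderedSparseVectorSketch(int cardinality) { super(cardinality); }

    @Override
    double get(int index) {
        checkIndex(index);
        int pos = Arrays.binarySearch(indices, 0, count, index);
        return pos >= 0 ? values[pos] : 0.0;
    }

    @Override
    void set(int index, double value) {
        checkIndex(index);
        int pos = Arrays.binarySearch(indices, 0, count, index);
        if (pos >= 0) {
            if (value != 0.0) {
                values[pos] = value;
                return;
            }
            // Drop the entry by shifting the tail left.
            System.arraycopy(indices, pos + 1, indices, pos, count - pos - 1);
            System.arraycopy(values, pos + 1, values, pos, count - pos - 1);
            count--;
            return;
        }
        if (value == 0.0) {
            return; // absent entries are already implicit zeros
        }
        int insertAt = -pos - 1;
        if (count == indices.length) {
            indices = Arrays.copyOf(indices, count * 2);
            values = Arrays.copyOf(values, count * 2);
        }
        // Shift the tail right so the indices stay sorted.
        System.arraycopy(indices, insertAt, indices, insertAt + 1, count - insertAt);
        System.arraycopy(values, insertAt, values, insertAt + 1, count - insertAt);
        indices[insertAt] = index;
        values[insertAt] = value;
        count++;
    }

    @Override
    int numNonZero() { return count; }
}
```

Code that depends only on size() and the index check lives once in the base class, while each subclass keeps the storage layout suited to its access pattern.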

 Separate and clearly label different SparseVector implementations
 -

 Key: MAHOUT-206
 URL: https://issues.apache.org/jira/browse/MAHOUT-206
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
 Environment: all
Reporter: Jake Mannix
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: MAHOUT-206.patch


 Shashi's last patch on MAHOUT-165 swapped out the int/double parallel array 
 impl of SparseVector for an OpenIntDoubleMap (hash-based) one.  We actually 
 need both, as I think I've mentioned a gazillion times.
 There was a patch, long ago, on MAHOUT-165, in which Ted had 
 OrderedIntDoubleVector, and OpenIntDoubleHashVector (or something to that 
 effect), and neither of them is called SparseVector.  I like this, because 
 it forces people to choose what kind of SparseVector they want (and they 
 should: sparse is an optimization, and the client should make a conscious 
 decision what they're optimizing for).  
 We could call them RandomAccessSparseVector and SequentialAccessSparseVector, 
 to be really obvious.
 But really, the important part is we have both.




[jira] Commented: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements

2009-11-24 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782064#action_12782064
 ] 

Grant Ingersoll commented on MAHOUT-207:


Aren't we losing some of the benefits of SparseVector with this explicit set 
to zero stuff (by having to call equivalent)?  I've wondered in the past how a 
Sparse implementation should handle something like setQuick(i, 0).  One 
approach is to set it, but the other is to ignore it and possibly remove any 
previous nonzero entry, right?  Seems like tradeoffs w/ both.
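The two setQuick(i, 0) policies can be compared with a small sketch. It assumes a map-backed vector; the ZeroPolicyVector class, its policy enum and sameValues() are hypothetical names, and sameValues() only stands in for the kind of value-based comparison the comment alludes to.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-backed vector comparing two ways to handle setQuick(i, 0).
// Neither policy is claimed to be what Mahout's SparseVector does.
final class ZeroPolicyVector {
    enum ZeroPolicy { STORE, DROP }

    private final Map<Integer, Double> entries = new HashMap<>();
    private final ZeroPolicy policy;

    ZeroPolicyVector(ZeroPolicy policy) {
        this.policy = policy;
    }

    void setQuick(int index, double value) {
        if (value == 0.0 && policy == ZeroPolicy.DROP) {
            // Storage stays truly sparse, at the cost of a remove() per zero write.
            entries.remove(index);
        } else {
            // Cheaper write, but an explicit zero is now stored, and any
            // "iterate non-zero" pass over the entries will still visit it.
            entries.put(index, value);
        }
    }

    double getQuick(int index) {
        return entries.getOrDefault(index, 0.0);
    }

    int storedEntries() {
        return entries.size();
    }

    // Compares values, so a stored 0.0 and a missing entry count as equal.
    // This is the extra work the STORE policy pushes onto equality checks.
    boolean sameValues(ZeroPolicyVector other) {
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            if (other.getQuick(e.getKey()) != e.getValue()) {
                return false;
            }
        }
        for (Map.Entry<Integer, Double> e : other.entries.entrySet()) {
            if (getQuick(e.getKey()) != e.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

With STORE, writing 4.0 and then 0.0 at the same index leaves a stored entry holding 0.0; with DROP the entry disappears. Both vectors hold the same values, which is why a value-based comparison is needed once explicit zeros may be stored.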

 AbstractVector.hashCode() should not care about the order of iteration over 
 elements
 

 Key: MAHOUT-207
 URL: https://issues.apache.org/jira/browse/MAHOUT-207
 Project: Mahout
  Issue Type: Bug
  Components: Matrix
Affects Versions: 0.2
 Environment: all
Reporter: Jake Mannix
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: MAHOUT-207.patch


 As was discussed in MAHOUT-165, hashCode can be implemented simply like this:
 {code}
 public int hashCode() {
   final int prime = 31;
   int result = prime + ((name == null) ? 0 : name.hashCode());
   result = prime * result + size();
   Iterator<Element> iter = iterateNonZero();
   while (iter.hasNext()) {
     Element ele = iter.next();
     long v = Double.doubleToLongBits(ele.get());
     // Summation is order-independent, so no sorting of elements is needed.
     result += ele.index() * (int) (v ^ (v >>> 32));
   }
   return result;
 }
 {code}
 which obviates the need to sort the elements in the case of a random access 
 hash-based implementation.  Also, (ele.index() * (int) (v ^ (v >>> 32))) == 0 when 
 v = Double.doubleToLongBits(0d), which avoids the wrong hashCode() for sparse 
 vectors which have zero elements returned from the iterateNonZero() iterator.




[jira] Commented: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements

2009-11-24 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782080#action_12782080
 ] 

Grant Ingersoll commented on MAHOUT-207:


All makes sense.  Per the refactoring in MAHOUT-206, I think this argues even 
more for an abstract SparseVector implementation that can handle some of the 
common code.

 AbstractVector.hashCode() should not care about the order of iteration over 
 elements
 

 Key: MAHOUT-207
 URL: https://issues.apache.org/jira/browse/MAHOUT-207
 Project: Mahout
  Issue Type: Bug
  Components: Matrix
Affects Versions: 0.2
 Environment: all
Reporter: Jake Mannix
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: MAHOUT-207.patch


 As was discussed in MAHOUT-165, hashCode can be implemented simply like this:
 {code}
 public int hashCode() {
   final int prime = 31;
   int result = prime + ((name == null) ? 0 : name.hashCode());
   result = prime * result + size();
   Iterator<Element> iter = iterateNonZero();
   while (iter.hasNext()) {
     Element ele = iter.next();
     long v = Double.doubleToLongBits(ele.get());
     // Summation is order-independent, so no sorting of elements is needed.
     result += ele.index() * (int) (v ^ (v >>> 32));
   }
   return result;
 }
 {code}
 which obviates the need to sort the elements in the case of a random access 
 hash-based implementation.  Also, (ele.index() * (int) (v ^ (v >>> 32))) == 0 when 
 v = Double.doubleToLongBits(0d), which avoids the wrong hashCode() for sparse 
 vectors which have zero elements returned from the iterateNonZero() iterator.




[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields

2009-11-24 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1131:
--

Fix Version/s: 1.5

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (Lucene Field).  Consider SOLR-773.  The 
 concept of a point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens indicating whether the 
 point falls within that region)
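As a sketch of the idea, assuming a point arrives as a single "lat lon" string: one field type can validate the value once and fan it out into several indexed fields. The PointFieldSketch class and the _lat/_lon suffix convention are hypothetical, and the output is a plain map of field names to values rather than Lucene Field objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of one "point" field type producing several indexed
// fields. The "_lat"/"_lon" suffixes are illustrative only, not the naming
// scheme of any Solr patch.
final class PointFieldSketch {
    static Map<String, String> toIndexedFields(String fieldName, String externalValue) {
        String[] parts = externalValue.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected \"lat lon\", got: " + externalValue);
        }
        double lat = Double.parseDouble(parts[0]);
        double lon = Double.parseDouble(parts[1]);
        if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
            throw new IllegalArgumentException("coordinates out of range: " + externalValue);
        }
        // One logical field becomes two physical fields; a geohash or
        // cartesian-tier fields could be added to this map the same way.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(fieldName + "_lat", Double.toString(lat));
        fields.put(fieldName + "_lon", Double.toString(lon));
        return fields;
    }
}
```

Keeping the parsing in the field type means documents only ever send the single combined value, and the set of derived fields can change without touching clients.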




[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes

2009-11-24 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781945#action_12781945
 ] 

Grant Ingersoll commented on SOLR-1586:
---

Hey Chris,

I'm not sure we want to bring in the actual namespace for georss.  That seems 
like overkill, but I'm open to hear what others think.

 Create Spatial Point FieldTypes
 ---

 Key: SOLR-1586
 URL: https://issues.apache.org/jira/browse/SOLR-1586
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: examplegeopointdoc.patch.txt, 
 SOLR-1586.Mattmann.112209.geopointonly.patch.txt, 
 SOLR-1586.Mattmann.112209.geopointonly.patch.txt


 Per SOLR-773, create field types that hide the details of creating tiers, 
 geohash and lat/lon fields.
 Fields should take in lat/lon points in a single form, as in:
 <field name="foo">lat lon</field>




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-11-24 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781948#action_12781948
 ] 

Grant Ingersoll commented on SOLR-1131:
---

See discussion at 
http://search.lucidimagination.com/search/document/d24c920ddf05b4f7/solr_1131_multiple_fields_per_field_type

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (Lucene Field).  Consider SOLR-773.  The 
 concept of a point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens indicating whether the 
 point falls within that region)




[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes

2009-11-24 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781973#action_12781973
 ] 

Grant Ingersoll commented on SOLR-1586:
---

Also, where does this patch actually encode the Geohash value?  The Lucene 
spatial contrib JAR has GeoHashUtils for just this.  See the GeohashFunction 
for usage. 
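For reference, a from-scratch geohash encoder following the standard scheme: longitude and latitude bits are interleaved (longitude first), each bit halving the current range, and every 5 bits become one base-32 character. This is an illustration only, not Lucene's GeoHashUtils; the GeohashSketch name and the encode(lat, lon, precision) signature are assumptions.

```java
// Minimal geohash encoder following the standard interleaved-bit, base-32
// scheme. Illustrative only; this is not Lucene's GeoHashUtils.
final class GeohashSketch {
    // Geohash alphabet: digits and lowercase letters minus a, i, l, o.
    private static final char[] BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz".toCharArray();

    static String encode(double lat, double lon, int precision) {
        double minLat = -90, maxLat = 90;
        double minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder(precision);
        boolean lonBit = true; // bits alternate, starting with longitude
        int bitCount = 0;
        int ch = 0;
        while (hash.length() < precision) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) {
                    ch = (ch << 1) | 1;
                    minLon = mid; // keep the upper half
                } else {
                    ch <<= 1;
                    maxLon = mid; // keep the lower half
                }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) {
                    ch = (ch << 1) | 1;
                    minLat = mid;
                } else {
                    ch <<= 1;
                    maxLat = mid;
                }
            }
            lonBit = !lonBit;
            if (++bitCount == 5) { // 5 bits make one base-32 character
                hash.append(BASE32[ch]);
                bitCount = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(42.6, -5.6, 5)); // ezs42
    }
}
```

Because each character refines the previous cell, points sharing a geohash prefix lie in the same cell, which is what makes the value usable as a single indexed term.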

 Create Spatial Point FieldTypes
 ---

 Key: SOLR-1586
 URL: https://issues.apache.org/jira/browse/SOLR-1586
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: examplegeopointdoc.patch.txt, 
 SOLR-1586.Mattmann.112209.geopointonly.patch.txt, 
 SOLR-1586.Mattmann.112209.geopointonly.patch.txt


 Per SOLR-773, create field types that hide the details of creating tiers, 
 geohash and lat/lon fields.
 Fields should take in lat/lon points in a single form, as in:
 <field name="foo">lat lon</field>




[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-11-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781427#action_12781427
 ] 

Grant Ingersoll commented on MAHOUT-165:


OK, I am committing the Matrix module.  Once done, I am going to move our 
Matrix stuff out of core and into the Matrix module.  Then, Shashi, if you can 
update your patch, that would be great.  From there, refactoring Vector to not 
have a Writable dependency (etc.) would be great, but let's handle that on a 
separate issue.

 Using better primitives hash for sparse vector for performance gains
 

 Key: MAHOUT-165
 URL: https://issues.apache.org/jira/browse/MAHOUT-165
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: colt.jar, mahout-165-18nov-updated.patch, 
 mahout-165-18nov.patch, MAHOUT-165-colt.patch, mahout-165-trove.patch, 
 MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, 
 MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, 
 mahout-165.patch


 In SparseVector, we need a primitive hash map for indices and values. The 
 present implementation of this hash map is not as efficient as some of the 
 other implementations in non-Apache projects. 
 In an experiment, I found that, for get/set operations, the primitive hash of 
 Colt performs an order of magnitude better than OrderedIntDoubleMapping. 
 For iteration it is 2x slower, though. 
 Using Colt in SparseVector improved performance of canopy generation. For an 
 experimental dataset, the current implementation takes 50 minutes. Using 
 Colt reduces this duration to 19-20 minutes. That's a 60% reduction in 
 running time. 
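To illustrate why a primitive hash map helps here, a minimal open-addressing int-to-double map is sketched below. It stores keys and values in flat arrays with linear probing, avoiding the per-entry boxing a HashMap<Integer, Double> would need. The class is hypothetical and is not Colt's OpenIntDoubleHashMap; removal is omitted to keep it short.

```java
import java.util.Arrays;

// Minimal open-addressing int -> double map with linear probing. Keys and
// values sit in two flat primitive arrays, so there is no Integer/Double
// boxing. Sketch only: no removal, and not Colt's probing scheme.
final class IntDoubleOpenMap {
    private static final int FREE = Integer.MIN_VALUE; // marks an empty slot
    private int[] keys;
    private double[] values;
    private int size;

    IntDoubleOpenMap(int expectedSize) {
        int capacity = 8;
        while (capacity < expectedSize * 2) {
            capacity <<= 1; // power of two, so masking can replace modulo
        }
        keys = new int[capacity];
        Arrays.fill(keys, FREE);
        values = new double[capacity];
    }

    // Returns the slot holding key, or the empty slot where it would go.
    private static int slot(int[] table, int key) {
        int mask = table.length - 1;
        int i = (key * 0x9E3779B9) & mask; // spread consecutive indices apart
        while (table[i] != FREE && table[i] != key) {
            i = (i + 1) & mask; // linear probing
        }
        return i;
    }

    double get(int key) {
        int i = slot(keys, key);
        return keys[i] == key ? values[i] : 0.0; // absent means implicit zero
    }

    void put(int key, double value) {
        if (key == FREE) {
            throw new IllegalArgumentException("reserved key");
        }
        if ((size + 1) * 2 > keys.length) {
            grow(); // keep the load factor at or below 0.5
        }
        int i = slot(keys, key);
        if (keys[i] == FREE) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int size() {
        return size;
    }

    private void grow() {
        int[] oldKeys = keys;
        double[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        Arrays.fill(keys, FREE);
        values = new double[keys.length];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != FREE) {
                int i = slot(keys, oldKeys[j]);
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}
```

Lookups touch contiguous array slots instead of chasing node and boxed-value pointers, which is one plausible reason for the get/set speedup reported above; iteration, by contrast, must scan empty slots too, unlike a dense ordered mapping.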




[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-11-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781436#action_12781436
 ] 

Grant Ingersoll commented on MAHOUT-165:


OK, I moved over the matrix module, but there still needs to be some 
refactoring done there, as there are currently two layers of matrix packages.

 Using better primitives hash for sparse vector for performance gains
 

 Key: MAHOUT-165
 URL: https://issues.apache.org/jira/browse/MAHOUT-165
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: colt.jar, mahout-165-18nov-updated.patch, 
 mahout-165-18nov.patch, MAHOUT-165-colt.patch, mahout-165-trove.patch, 
 MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, 
 MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, 
 mahout-165.patch


 In SparseVector, we need a primitive hash map for indices and values. The 
 present implementation of this hash map is not as efficient as some of the 
 other implementations in non-Apache projects. 
 In an experiment, I found that, for get/set operations, the primitive hash of 
 Colt performs an order of magnitude better than OrderedIntDoubleMapping. 
 For iteration it is 2x slower, though. 
 Using Colt in SparseVector improved performance of canopy generation. For an 
 experimental dataset, the current implementation takes 50 minutes. Using 
 Colt reduces this duration to 19-20 minutes. That's a 60% reduction in 
 running time. 




[jira] Created: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions

2009-11-23 Thread Grant Ingersoll (JIRA)
Better integration of Mahout matrix capabilities with Colt Matrix additions
---

 Key: MAHOUT-204
 URL: https://issues.apache.org/jira/browse/MAHOUT-204
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.3
Reporter: Grant Ingersoll
 Fix For: 0.3


Per MAHOUT-165, we need to refactor the matrix package structures a bit to be 
more coherent and clean.  For instance, there are two levels of matrix packages 
now, so those should be rectified.




[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-11-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781438#action_12781438
 ] 

Grant Ingersoll commented on MAHOUT-165:


d'oh, missed the correct package names.

 Using better primitives hash for sparse vector for performance gains
 

 Key: MAHOUT-165
 URL: https://issues.apache.org/jira/browse/MAHOUT-165
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: colt.jar, mahout-165-18nov-updated.patch, 
 mahout-165-18nov.patch, MAHOUT-165-colt.patch, mahout-165-trove.patch, 
 MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, 
 MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, 
 mahout-165.patch


 In SparseVector, we need a primitive hash map for indices and values. The 
 present implementation of this hash map is not as efficient as some of the 
 other implementations in non-Apache projects. 
 In an experiment, I found that, for get/set operations, the primitive hash of 
 Colt performs an order of magnitude better than OrderedIntDoubleMapping. 
 For iteration it is 2x slower, though. 
 Using Colt in SparseVector improved performance of canopy generation. For an 
 experimental dataset, the current implementation takes 50 minutes. Using 
 Colt reduces this duration to 19-20 minutes. That's a 60% reduction in 
 running time. 




[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-11-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781467#action_12781467
 ] 

Grant Ingersoll commented on MAHOUT-165:


OK, I committed Shashi's patch and fixed the colt package name remnants.

 Using better primitives hash for sparse vector for performance gains
 

 Key: MAHOUT-165
 URL: https://issues.apache.org/jira/browse/MAHOUT-165
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: colt.jar, mahout-165-18nov-updated.patch, 
 mahout-165-18nov.patch, MAHOUT-165-colt.patch, mahout-165-trove.patch, 
 MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, 
 MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, 
 mahout-165.patch


 In SparseVector, we need a primitive hash map for indices and values. The 
 present implementation of this hash map is not as efficient as some of the 
 other implementations in non-Apache projects. 
 In an experiment, I found that, for get/set operations, the primitive hash of 
 Colt performs an order of magnitude better than OrderedIntDoubleMapping. 
 For iteration it is 2x slower, though. 
 Using Colt in SparseVector improved performance of canopy generation. For an 
 experimental dataset, the current implementation takes 50 minutes. Using 
 Colt reduces this duration to 19-20 minutes. That's a 60% reduction in 
 running time. 




[jira] Commented: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions

2009-11-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781640#action_12781640
 ] 

Grant Ingersoll commented on MAHOUT-204:


The command is good, but a patch would be useful too.

 Better integration of Mahout matrix capabilities with Colt Matrix additions
 ---

 Key: MAHOUT-204
 URL: https://issues.apache.org/jira/browse/MAHOUT-204
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.3
Reporter: Grant Ingersoll
 Fix For: 0.3


 Per MAHOUT-165, we need to refactor the matrix package structures a bit to be 
 more coherent and clean.  For instance, there are two levels of matrix 
 packages now, so those should be rectified.




[jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-23 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781701#action_12781701
 ] 

Grant Ingersoll commented on MAHOUT-206:


Sorry, yes, I missed that and I agree we do need it.

 Separate and clearly label different SparseVector implementations
 -

 Key: MAHOUT-206
 URL: https://issues.apache.org/jira/browse/MAHOUT-206
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
 Environment: all
Reporter: Jake Mannix
 Fix For: 0.3


 Shashi's last patch on MAHOUT-165 swapped out the int/double parallel array 
 impl of SparseVector for an OpenIntDoubleMap (hash-based) one.  We actually 
 need both, as I think I've mentioned a gazillion times.
 There was a patch, long ago, on MAHOUT-165, in which Ted had 
 OrderedIntDoubleVector, and OpenIntDoubleHashVector (or something to that 
 effect), and neither of them is called SparseVector.  I like this, because 
 it forces people to choose what kind of SparseVector they want (and they 
 should: sparse is an optimization, and the client should make a conscious 
 decision what they're optimizing for).  
 We could call them RandomAccessSparseVector and SequentialAccessSparseVector, 
 to be really obvious.
 But really, the important part is we have both.




[jira] Updated: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-11-22 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-165:
---

Attachment: MAHOUT-165-colt.patch

The Colt stuff looks good; my only concern, legally, is the name, oddly enough. 
I don't think we should call it Colt.  AFAICT, that name is owned by CERN, and 
while the license allows us to bring over the code, it doesn't give us rights 
to the name.

This patch changes the name to matrix and adds the appropriate legal bits to 
NOTICE.txt and LICENSE.txt.

This just covers the Colt stuff; it does not apply Shashi's patch.  

It seems like we should just move our Matrix (currently in core) out to this 
package and make core depend on this module.

 Using better primitives hash for sparse vector for performance gains
 

 Key: MAHOUT-165
 URL: https://issues.apache.org/jira/browse/MAHOUT-165
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: colt.jar, mahout-165-18nov-updated.patch, 
 mahout-165-18nov.patch, MAHOUT-165-colt.patch, mahout-165-trove.patch, 
 MAHOUT-165-updated.patch, MAHOUT-165-with-colt-module.patch, 
 MAHOUT-165-with-colt.patch, mahout-165.patch, MAHOUT-165.patch, 
 mahout-165.patch


 In SparseVector, we need a primitive hash map for indices and values. The 
 present implementation of this hash map is not as efficient as some of the 
 other implementations in non-Apache projects. 
 In an experiment, I found that, for get/set operations, the primitive hash of 
 Colt performs an order of magnitude better than OrderedIntDoubleMapping. 
 For iteration it is 2x slower, though. 
 Using Colt in SparseVector improved performance of canopy generation. For an 
 experimental dataset, the current implementation takes 50 minutes. Using 
 Colt reduces this duration to 19-20 minutes. That's a 60% reduction in 
 running time. 




[jira] Commented: (MAHOUT-182) New helper methods for Matrix: times(Vector), timesSquared(Vector), numRows() and numCols()

2009-11-22 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781146#action_12781146
 ] 

Grant Ingersoll commented on MAHOUT-182:


reviewing this morning.

 New helper methods for Matrix: times(Vector), timesSquared(Vector), numRows() 
 and numCols()
 ---

 Key: MAHOUT-182
 URL: https://issues.apache.org/jira/browse/MAHOUT-182
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
Reporter: Jake Mannix
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: MAHOUT-182.patch, matrixTimes.patch


 numRows() { return size()[ROW]; } and numCols() { return size()[COL]; } are 
 pretty much no-brainer methods, right?  Who wants to deal with a length-two 
 array of ints all the time when getting the number of rows and columns of a 
 matrix?
 Those are pretty trivial, but the key feature of a Matrix is to map Vector 
 instances to Vector instances, and while you can do that currently by making 
 a row Matrix and doing Matrix.times(Matrix), it's silly to always have to 
 do that.  Matrix.times(Vector) is really needed.
 Even less trivial, for really big sparse Matrices, if you need to get (M'M)v 
 for some matrix M, then this can be computed in one pass through M without 
 ever computing the transpose of M by a simple reordering of the limits of 
 summation.
 Attaching a patch with these implementations, including unit tests (as well 
 as an improvement in the Matrix.times(Matrix) unit test to actually check the 
 math).
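A sketch of the three helpers on plain double[][] arrays rather than Mahout's Matrix and Vector types (the MatrixOpsSketch name and signatures are assumptions). The one-pass trick rests on the identity (M'M)v = sum over rows m_i of (m_i . v) m_i: for each row, take its dot product with v and add that multiple of the row to the result, so M' is never formed.

```java
// Sketch of the MAHOUT-182 helpers on plain double[][] arrays; this is not
// Mahout's Matrix/Vector API.
final class MatrixOpsSketch {
    static int numRows(double[][] m) {
        return m.length;
    }

    static int numCols(double[][] m) {
        return m.length == 0 ? 0 : m[0].length;
    }

    // (Mv)_i = sum_j M_ij * v_j
    static double[] times(double[][] m, double[] v) {
        checkLength(m, v);
        double[] out = new double[numRows(m)];
        for (int i = 0; i < out.length; i++) {
            double sum = 0;
            for (int j = 0; j < v.length; j++) {
                sum += m[i][j] * v[j];
            }
            out[i] = sum;
        }
        return out;
    }

    // (M'M)v = sum over rows r of (r . v) * r: one pass over M, no transpose.
    static double[] timesSquared(double[][] m, double[] v) {
        checkLength(m, v);
        double[] out = new double[numCols(m)];
        for (double[] row : m) {
            double dot = 0;
            for (int j = 0; j < row.length; j++) {
                dot += row[j] * v[j];
            }
            for (int j = 0; j < row.length; j++) {
                out[j] += dot * row[j];
            }
        }
        return out;
    }

    private static void checkLength(double[][] m, double[] v) {
        if (v.length != numCols(m)) {
            throw new IllegalArgumentException("vector length " + v.length + " != numCols " + numCols(m));
        }
    }
}
```

For M = [[1, 2], [3, 4]] and v = [1, 1], times gives [3, 7] and timesSquared gives [24, 34], matching M'(Mv) = M'[3, 7]. For a sparse M the inner loops would visit only each row's non-zero entries.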




[jira] Resolved: (MAHOUT-182) New helper methods for Matrix: times(Vector), timesSquared(Vector), numRows() and numCols()

2009-11-22 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved MAHOUT-182.


   Resolution: Fixed
Fix Version/s: 0.3

Committed revision 883094.

 New helper methods for Matrix: times(Vector), timesSquared(Vector), numRows() 
 and numCols()
 ---

 Key: MAHOUT-182
 URL: https://issues.apache.org/jira/browse/MAHOUT-182
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
Reporter: Jake Mannix
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 0.3

 Attachments: MAHOUT-182.patch, matrixTimes.patch


 numRows() { return size()[ROW]; } and numCols() { return size()[COL]; } are 
 pretty much no-brainer methods, right?  Who wants to deal with a length-two 
 array of ints all the time when getting the number of rows and columns of a 
 matrix?
 Those are pretty trivial, but the key feature of a Matrix is to map Vector 
 instances to Vector instances, and while you can do that currently by making 
 a row Matrix and doing Matrix.times(Matrix), it's silly to have to always 
 do that.  Matrix.times(Vector) is really needed.
 Even less trivial, for really big sparse Matrices, if you need to get (M'M)v 
 for some matrix M, then this can be computed in one pass through M without 
 ever computing the transpose of M by a simple reordering of the limits of 
 summation.
 Attaching a patch with these implementations, including unit tests (as well 
 as an improvement in the Matrix.times(Matrix) unit test to actually check the 
 math).




[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields

2009-11-22 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1131:
--

Attachment: SOLR-1131.patch

Starting to add unit tests.  Still no support on the search/response side, but 
groundwork for adding multiple fields per SchemaField/FieldType is now laid.  
Still need a way to know that a field/fieldtype is going to output multiple 
fields so that we can detect them when searching, etc.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch, 
 SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)
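To illustrate the lat/lon case, here is a hypothetical sketch in which a field type expands one logical value into several index-level fields. The sub-field naming is kept in one place, so callers never construct the names themselves. Plain strings stand in for Lucene Field objects, and the `name + "__" + part` naming scheme is an assumption, not anything Solr defines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one logical "point" value becomes two index-level fields.
public class MultiFieldSketch {

  // The single place that knows the (assumed) sub-field naming scheme.
  // The indexing side and the query side would both go through it.
  public static String subField(String name, String part) {
    return name + "__" + part;
  }

  // Expands "lat lon" for field `name` into a sub-field name -> value map.
  public static Map<String, String> createFields(String name, String externalVal) {
    String[] parts = externalVal.trim().split("\\s+");
    if (parts.length != 2) {
      throw new IllegalArgumentException("expected 'lat lon' but got: " + externalVal);
    }
    double lat = Double.parseDouble(parts[0]);
    double lon = Double.parseDouble(parts[1]);
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put(subField(name, "lat"), Double.toString(lat));
    fields.put(subField(name, "lon"), Double.toString(lon));
    return fields;
  }

  public static void main(String[] args) {
    System.out.println(createFields("store", "45.17 -93.87")); // {store__lat=45.17, store__lon=-93.87}
  }
}
```

On the search side, a query against just the latitude would resolve its field via `subField("store", "lat")` rather than hard-coding the suffix.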




[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-11-22 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781145#action_12781145
 ] 

Grant Ingersoll commented on SOLR-773:
--

bq. Grant is doing some fantastic work here, and I'm looking forward to seeing 
the outcome

Grant would definitely welcome help!  This is way too big for me.  People 
wanting to help should take a look at all of the linked items on this issue 
and see where they can contribute.  If in doubt, please ask.  I'm good at 
telling people what to do  ;-)

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, 
 lucene.tar.gz, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-spatial_solr.patch, SOLR-773.patch, 
 SOLR-773.patch, solrGeoQuery.tar, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




[jira] Resolved: (SOLR-1567) Upgrade to Tika 0.5

2009-11-22 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-1567.
---

Resolution: Fixed

Committed revision 883095.

 Upgrade to Tika 0.5
 ---

 Key: SOLR-1567
 URL: https://issues.apache.org/jira/browse/SOLR-1567
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 As the title says.




[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-11-22 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781159#action_12781159
 ] 

Grant Ingersoll commented on SOLR-773:
--

Not so much a re-arch, but an extension of some pieces to handle some new 
ideas.  I think we all agree that Solr does a pretty good job of hiding some of 
the complexity of Lucene.  So, if users can simply declare a new field of a 
CartesianTier field type, they need not worry at all about managing the tier 
prefix stuff that contrib/spatial requires. 

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, 
 lucene.tar.gz, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-spatial_solr.patch, SOLR-773.patch, 
 SOLR-773.patch, solrGeoQuery.tar, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes

2009-11-22 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781193#action_12781193
 ] 

Grant Ingersoll commented on SOLR-1586:
---

Sounds good, but how are you going to deal with the field types that need 
multiple fields (i.e. SOLR-1131)?

We certainly could put up a GeohashField to get things started.

 Create Spatial Point FieldTypes
 ---

 Key: SOLR-1586
 URL: https://issues.apache.org/jira/browse/SOLR-1586
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 Per SOLR-773, create field types that hide the details of creating tiers, 
 geohash and lat/lon fields.
 Fields should take in lat/lon points in a single form, as in:
 <field name="foo">lat lon</field>




[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes

2009-11-22 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781215#action_12781215
 ] 

Grant Ingersoll commented on SOLR-1586:
---

I'm not sure what good it does to put a lat/lon in a single String in 
georss:point format.  What's your intent for searching/sorting/faceting?  

 Create Spatial Point FieldTypes
 ---

 Key: SOLR-1586
 URL: https://issues.apache.org/jira/browse/SOLR-1586
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 Per SOLR-773, create field types that hide the details of creating tiers, 
 geohash and lat/lon fields.
 Fields should take in lat/lon points in a single form, as in:
 <field name="foo">lat lon</field>




[jira] Commented: (SOLR-1586) Create Spatial Point FieldTypes

2009-11-22 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781262#action_12781262
 ] 

Grant Ingersoll commented on SOLR-1586:
---

I'd say wait until SOLR-1131 is done for everything other than the 
GeohashFieldType, as what you are proposing doesn't get you anything over just 
using StrField.  By all means, put up a patch for GeohashFieldType when you 
have one.  We can commit that now.

 Create Spatial Point FieldTypes
 ---

 Key: SOLR-1586
 URL: https://issues.apache.org/jira/browse/SOLR-1586
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 Per SOLR-773, create field types that hide the details of creating tiers, 
 geohash and lat/lon fields.
 Fields should take in lat/lon points in a single form, as in:
 <field name="foo">lat lon</field>




[jira] Updated: (SOLR-1131) Allow a single field type to index multiple fields

2009-11-21 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1131:
--

Summary: Allow a single field type to index multiple fields  (was: Allow a 
single field to index multiple fields)

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-11-21 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780950#action_12780950
 ] 

Grant Ingersoll commented on SOLR-1131:
---

bq. Is this a good idea?

Not sure yet. 

bq. Why don't we add a new interface MultiValuedFieldType which extends 
FieldType for this 

Aren't we just substituting a very simple construction for an instanceof check?

I was possibly thinking of a couple of other options, too:
1. add a boolean on FT for isMultiField which returns false by default, then we 
could check that
2. Add a threadlocal that stores a preconstructed array of size one which could 
then simply be set for the single field case, which is the most common case.
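Options 1 and 2 could look roughly like the following hypothetical sketch. The class names and the `"name:value"` strings standing in for Lucene Fields are illustrative only and are not Solr's FieldType API. Note that the per-thread array in option 2 is reused, so a caller must consume it before making the next call on the same thread.

```java
// Hypothetical sketch of options 1 and 2; strings stand in for Lucene Fields.
public class MultiFieldFlagSketch {

  public static class FieldType {
    // Option 1: false by default; only multi-field types override it.
    public boolean isMultiField() {
      return false;
    }

    public String createField(String name, String value) {
      return name + ":" + value;
    }

    public String[] createFields(String name, String value) {
      return new String[] {createField(name, value)};
    }
  }

  public static class PointType extends FieldType {
    @Override
    public boolean isMultiField() {
      return true;
    }

    @Override
    public String[] createFields(String name, String value) {
      String[] parts = value.trim().split("\\s+");
      return new String[] {name + "_lat:" + parts[0], name + "_lon:" + parts[1]};
    }
  }

  // Option 2: a per-thread, preallocated one-element array for the common case.
  private static final ThreadLocal<String[]> SINGLE =
      ThreadLocal.withInitial(() -> new String[1]);

  public static String[] fieldsFor(FieldType type, String name, String value) {
    if (type.isMultiField()) {
      return type.createFields(name, value);
    }
    String[] one = SINGLE.get(); // reused: consume before the next call on this thread
    one[0] = type.createField(name, value);
    return one;
  }
}
```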

My gut, however, says the object is very short lived and is likely to be of 
negligible cost.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Created: (SOLR-1585) Refactor shards handling out of QueryComponent and into ShardsComponent

2009-11-21 Thread Grant Ingersoll (JIRA)
Refactor shards handling out of QueryComponent and into ShardsComponent
---

 Key: SOLR-1585
 URL: https://issues.apache.org/jira/browse/SOLR-1585
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
 Fix For: 1.5


Per the TODOs in QueryComponent, create a ShardsComponent that handles setting 
up the shards.  Additionally, make it so that it can handle shorter shard 
parameters, too.  For instance, in most setups it is likely that only the IP 
address differs from shard to shard, so we could have intelligent defaults 
that make for shorter query strings.
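A rough sketch of such defaults follows. It assumes a comma-separated shards value in host[:port][/path] form, with the default port and path passed in (8983 and /solr in the example, matching the Jetty-based example server). `ShardDefaultsSketch` and `expand` are hypothetical names, and IPv6 literals are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: complete terse shard entries with a default port and path.
public class ShardDefaultsSketch {

  // Each comma-separated entry is host[:port][/path]; missing parts get defaults.
  public static List<String> expand(String shardsParam, String defaultPort, String defaultPath) {
    List<String> out = new ArrayList<>();
    for (String raw : shardsParam.split(",")) {
      String shard = raw.trim();
      if (shard.isEmpty()) {
        continue;
      }
      int slash = shard.indexOf('/');
      String hostPort = slash >= 0 ? shard.substring(0, slash) : shard;
      String path = slash >= 0 ? shard.substring(slash) : defaultPath;
      if (!hostPort.contains(":")) {
        hostPort = hostPort + ":" + defaultPort; // no port given: use the default
      }
      out.add(hostPort + path);
    }
    return out;
  }

  public static void main(String[] args) {
    // Prints [10.0.0.1:8983/solr, 10.0.0.2:8983/solr, 10.0.0.3:7574/solr/core1]
    System.out.println(expand("10.0.0.1,10.0.0.2, 10.0.0.3:7574/solr/core1", "8983", "/solr"));
  }
}
```

Fully specified entries pass through unchanged, so terse and explicit forms can be mixed in one request.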




[jira] Created: (SOLR-1586) Create Spatial Point FieldTypes

2009-11-21 Thread Grant Ingersoll (JIRA)
Create Spatial Point FieldTypes
---

 Key: SOLR-1586
 URL: https://issues.apache.org/jira/browse/SOLR-1586
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


Per SOLR-773, create field types that hide the details of creating tiers, 
geohash and lat/lon fields.

Fields should take in lat/lon points in a single form, as in:
<field name="foo">lat lon</field>




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-11-21 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780954#action_12780954
 ] 

Grant Ingersoll commented on SOLR-1131:
---

I'm also looking for ideas on how to handle the naming of the fields that are 
produced by this.  I think a FieldType that produces multiple fields should 
hide the logistics of the naming, which this patch doesn't even begin to 
address.  Also, on the search side, how does one search against just one of 
the fields?

Would appreciate thoughts on that.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Issue Comment Edited: (SOLR-1131) Allow a single field type to index multiple fields

2009-11-21 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780954#action_12780954
 ] 

Grant Ingersoll edited comment on SOLR-1131 at 11/21/09 12:11 PM:
--

I'm also looking for ideas on how to handle the naming of the fields that are 
produced by this.  I think a FieldType that produces multiple fields should 
hide the logistics of the naming, which this patch doesn't even begin to 
address.  Also, on the search side, how does one search against just one of 
the fields?

Would appreciate thoughts on that.

  was (Author: gsingers):
I'm also looking for ideas on how to handle the naming of the fields that 
are produced by this.  I think a FieldType that produces multiple fields should 
hide the logistics of the naming, which this patch doesn't even begin to 
scratch the surface of and also on the search side, how does one search against 
just one of the fields?

Would appreciated thoughts on that.
  
 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (sincle lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

2009-11-21 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780956#action_12780956
 ] 

Grant Ingersoll commented on SOLR-1131:
---

I definitely agree, Chris.  The interesting part is how that manifests itself 
in terms of implementation, which is where I am digging in at the moment.  It 
means the query parsers need to handle it, as well as the ResponseWriters, etc.

 Allow a single field type to index multiple fields
 --

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (sincle lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (LUCENE-965) Implement a state-of-the-art retrieval function in Lucene

2009-11-20 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780607#action_12780607
 ] 

Grant Ingersoll commented on LUCENE-965:


Hi Hui,

I see you updated your paper on this.  Have you looked at how this might be 
implemented given the flexible indexing work under way?  

 Implement a state-of-the-art retrieval function in Lucene
 -

 Key: LUCENE-965
 URL: https://issues.apache.org/jira/browse/LUCENE-965
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.2
Reporter: Hui Fang
 Fix For: 3.1

 Attachments: axiomaticFunction.patch


 We implemented the axiomatic retrieval function, which is a state-of-the-art 
 retrieval function, to replace the default similarity function in Lucene. We 
 compared the performance of these two functions and reported the results at 
 http://sifaka.cs.uiuc.edu/hfang/lucene/Lucene_exp.pdf. 
 The report shows that the performance of the axiomatic retrieval function is 
 much better than the default function. The axiomatic retrieval function is 
 able to find more relevant documents, and users see more relevant documents 
 among the top-ranked results. Incorporating such a state-of-the-art retrieval 
 function could improve the search performance of all the applications built 
 upon Lucene. 
 Most changes related to the implementation are made in AXSimilarity, 
 TermScorer and TermQuery.java.  However, many test cases are hand coded to 
 test whether the implementation of the default function is correct. Thus, I 
 also modified many test files so that the new retrieval function passes those 
 cases. In fact, we found that some old test cases are not reasonable. For 
 example, in testQueries02 of TestBoolean2.java, the query is "+w3 xx", and we 
 have two documents, "w1 xx w2 yy w3" and "w1 w3 xx w2 yy w3". 
 The second document should be more relevant than the first one, because it 
 has more occurrences of the query term w3. But the original test case would 
 require us to rank the first document higher than the second one, which is 
 not reasonable. 




[jira] Created: (SOLR-1581) Facet by Function

2009-11-20 Thread Grant Ingersoll (JIRA)
Facet by Function
-

 Key: SOLR-1581
 URL: https://issues.apache.org/jira/browse/SOLR-1581
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: 1.5


It would be really great if we could execute a function and quantize it into 
buckets that could then be returned as facets.
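One way to picture this, as a hypothetical sketch: evaluate the function for each matching document, map each value to a fixed-width bucket that starts at `start` and has width `gap`, and count documents per bucket. The `facet` method and its parameters are assumptions for illustration, not an existing Solr API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: quantize per-document function values into buckets.
public class FunctionFacetSketch {

  // Returns bucket lower bound -> document count, for buckets of width `gap`.
  public static Map<Double, Integer> facet(double[] values, double start, double gap) {
    if (gap <= 0) {
      throw new IllegalArgumentException("gap must be positive");
    }
    Map<Double, Integer> counts = new TreeMap<>();
    for (double v : values) {
      long bucket = (long) Math.floor((v - start) / gap);
      double lower = start + bucket * gap;
      counts.merge(lower, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // e.g. distances (km) computed by a function for the matching documents
    double[] dist = {0.5, 3.2, 4.9, 7.0, 12.5};
    System.out.println(facet(dist, 0.0, 5.0)); // {0.0=3, 5.0=1, 10.0=1}
  }
}
```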




[jira] Assigned: (SOLR-1131) Allow a single field to index multiple fields

2009-11-20 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-1131:
-

Assignee: Grant Ingersoll

 Allow a single field to index multiple fields
 -

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Attachments: SOLR-1131-IndexMultipleFields.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Created: (SOLR-1582) DocumentBuilder does not properly handle binary field copy fields

2009-11-20 Thread Grant Ingersoll (JIRA)
DocumentBuilder does not properly handle binary field copy fields
-

 Key: SOLR-1582
 URL: https://issues.apache.org/jira/browse/SOLR-1582
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Trivial


In DocumentBuilder, around line 267, the BinaryField is created, but it is 
never assigned to the field that is added to the output.




[jira] Resolved: (SOLR-1582) DocumentBuilder does not properly handle binary field copy fields

2009-11-20 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-1582.
---

   Resolution: Fixed
Fix Version/s: 1.5

committed.

 DocumentBuilder does not properly handle binary field copy fields
 -

 Key: SOLR-1582
 URL: https://issues.apache.org/jira/browse/SOLR-1582
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Trivial
 Fix For: 1.5


 In DocumentBuilder, around line 267, the BinaryField is created, but it is 
 never assigned to the field that is added to the output.




[jira] Updated: (SOLR-1131) Allow a single field to index multiple fields

2009-11-20 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1131:
--

Attachment: SOLR-1131.patch

Brings it up to trunk.  Still needs test cases.  All other tests pass.

 Allow a single field to index multiple fields
 -

 Key: SOLR-1131
 URL: https://issues.apache.org/jira/browse/SOLR-1131
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Reporter: Ryan McKinley
Assignee: Grant Ingersoll
 Attachments: SOLR-1131-IndexMultipleFields.patch, SOLR-1131.patch


 In a few special cases, it makes sense for a single field (the concept) to 
 be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
 concept point may be best indexed in a variety of ways:
  * geohash (single Lucene field)
  * lat field, lon field (two double fields)
  * cartesian tiers (a series of fields with tokens to say if it exists within 
 that region)




[jira] Commented: (SOLR-1574) simpler builtin functions

2009-11-19 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779955#action_12779955
 ] 

Grant Ingersoll commented on SOLR-1574:
---

+1.  This was my first time writing functions.  Overall, pretty easy to do, but 
in some cases I felt I was copying a lot of code, with the primary difference 
being the number of DocValues I needed to pass through.  Not quite sure how to 
handle that in a more general way.
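One hypothetical way to avoid writing a separate class per arity is to gather the per-document values from any number of sources into an array and pass them to a single lambda. `PerDocValues` below is a stand-in, not Solr's DocValues or ValueSource classes. The array allocated on every call is the price paid for the generality.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: one wrapper for functions of any number of arguments.
public class NaryFunctionSketch {

  // Stand-in for a per-document value source (e.g. a field's values).
  public interface PerDocValues {
    double doubleVal(int doc);
  }

  // Evaluates every source for `doc` and applies `fn` to the collected values.
  public static PerDocValues of(ToDoubleFunction<double[]> fn, PerDocValues... sources) {
    return doc -> {
      double[] vals = new double[sources.length];
      for (int i = 0; i < sources.length; i++) {
        vals[i] = sources[i].doubleVal(doc);
      }
      return fn.applyAsDouble(vals);
    };
  }

  public static void main(String[] args) {
    double[] price = {10, 20, 30};
    double[] qty = {1, 2, 3};
    PerDocValues p = doc -> price[doc];
    PerDocValues q = doc -> qty[doc];
    PerDocValues total = of(v -> v[0] * v[1], p, q);                                   // two arguments
    PerDocValues max3 = of(v -> Math.max(v[0], Math.max(v[1], v[2])), p, q, total);  // three arguments
    System.out.println(total.doubleVal(2)); // 90.0
    System.out.println(max3.doubleVal(1));  // 40.0
  }
}
```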

 simpler builtin functions
 -

 Key: SOLR-1574
 URL: https://issues.apache.org/jira/browse/SOLR-1574
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
Priority: Minor
 Attachments: SOLR-1574.patch


 Make it easier and less error-prone to add simple functions.




[jira] Created: (SOLR-1578) Develop a Spatial Query Parser

2009-11-19 Thread Grant Ingersoll (JIRA)
Develop a Spatial Query Parser
--

 Key: SOLR-1578
 URL: https://issues.apache.org/jira/browse/SOLR-1578
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: 1.5


Given all the work around spatial, it would be beneficial if Solr had a query 
parser for dealing with spatial queries.  For starters, something that used 
geonames data or maybe even the Google Maps API would be really useful.  Longer 
term, a spatial grammar that can robustly handle all the vagaries of addresses, 
etc. would be really cool.

Refs: 
[1] http://www.geonames.org/export/client-libraries.html (note the Java client 
is ASL)
[2] Data from geo names: http://download.geonames.org/export/dump/





[jira] Created: (LUCENE-2081) CartesianShapeFilter improvements

2009-11-18 Thread Grant Ingersoll (JIRA)
CartesianShapeFilter improvements
-

 Key: LUCENE-2081
 URL: https://issues.apache.org/jira/browse/LUCENE-2081
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Affects Versions: 2.9
Reporter: Grant Ingersoll
Priority: Minor


The CartesianShapeFilter could use some improvements.  For starters, if we make 
sure the boxIds are sorted in index order, this should reduce the cost of 
seeks.  I think we should also replace the logging with an approach similar to 
Lucene's output stream.  We can also do Term creation a bit more 
efficiently.
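A sketch of the sorting idea follows. It assumes that the filter's boxId-to-term encoding preserves numeric order, so ascending doubles match the term dictionary's order. The class and helper names are hypothetical. `backwardSeeks` simply counts how often consecutive lookups would move backwards through the terms.

```java
import java.util.Arrays;

// Hypothetical sketch: visit box ids in ascending order so term seeks only move forward.
// Assumes the boxId -> term encoding preserves numeric order.
public class SortedBoxIdsSketch {

  // Ascending, de-duplicated copy of the box ids.
  public static double[] inIndexOrder(double[] boxIds) {
    return Arrays.stream(boxIds).distinct().sorted().toArray();
  }

  // Number of times a lookup sequence would step backwards through the terms.
  public static int backwardSeeks(double[] order) {
    int back = 0;
    for (int i = 1; i < order.length; i++) {
      if (order[i] < order[i - 1]) {
        back++;
      }
    }
    return back;
  }

  public static void main(String[] args) {
    double[] boxIds = {-3.0002, 2.0001, -3.0001, 2.0002, 2.0001};
    System.out.println(backwardSeeks(boxIds));               // 2
    System.out.println(backwardSeeks(inIndexOrder(boxIds))); // 0
  }
}
```

If the encoding is not order-preserving (plain decimal strings of negative numbers, for instance), the sort would have to be done on the encoded term text instead.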




[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-11-18 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779456#action_12779456
 ] 

Grant Ingersoll commented on MAHOUT-165:


All sounding pretty good.  If you don't mind, I'd like to review the legal bits 
before committing.  I have a deadline on Thursday, but can get to it after that.

 Using better primitives hash for sparse vector for performance gains
 

 Key: MAHOUT-165
 URL: https://issues.apache.org/jira/browse/MAHOUT-165
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: colt.jar, mahout-165-18nov-updated.patch, 
 mahout-165-18nov.patch, mahout-165-trove.patch, MAHOUT-165-updated.patch, 
 MAHOUT-165-with-colt-module.patch, MAHOUT-165-with-colt.patch, 
 mahout-165.patch, MAHOUT-165.patch, mahout-165.patch


 In SparseVector, we need a primitive hash map for indices and values. The 
 present implementation of this hash map is not as efficient as some of the 
 other implementations in non-Apache projects. 
 In an experiment, I found that, for get/set operations, the primitive hash of 
 Colt performs an order of magnitude better than OrderedIntDoubleMapping. 
 For iteration it is 2x slower, though. 
 Using Colt in SparseVector improved performance of canopy generation. For an 
 experimental dataset, the current implementation takes 50 minutes. Using 
 Colt reduces this duration to 19-20 minutes. That's roughly a 60% reduction 
 in running time. 
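To show what a primitive map buys over boxed keys and values, here is a minimal open-addressing int-to-double map with linear probing. It is a sketch only. It is not Colt's implementation and supports no removal. It reserves `Integer.MIN_VALUE` as the empty-slot marker and keeps the load factor at or below one half.

```java
import java.util.Arrays;

// Hypothetical sketch of an open-addressing int -> double map (linear probing).
public class IntDoubleHashSketch {
  private static final int FREE = Integer.MIN_VALUE; // empty-slot marker; cannot be used as a key
  private int[] keys;
  private double[] values;
  private int size;

  public IntDoubleHashSketch(int expected) {
    // Smallest power of two >= max(4, 2 * expected), keeping load <= 0.5.
    int cap = Integer.highestOneBit(Math.max(4, expected * 2) - 1) << 1;
    keys = new int[cap];
    values = new double[cap];
    Arrays.fill(keys, FREE);
  }

  // Returns the slot holding `key`, or the first free slot on its probe path.
  private int slot(int key) {
    int mask = keys.length - 1;
    int h = key * 0x9E3779B9;        // multiplicative mixing
    int i = (h ^ (h >>> 16)) & mask; // fold high bits into the index
    while (keys[i] != FREE && keys[i] != key) {
      i = (i + 1) & mask;            // linear probing
    }
    return i;
  }

  public double get(int key) {
    checkKey(key);
    int i = slot(key);
    return keys[i] == key ? values[i] : 0.0; // missing entries read as 0, like a sparse vector
  }

  public void set(int key, double value) {
    checkKey(key);
    if ((size + 1) * 2 > keys.length) {
      grow();
    }
    int i = slot(key);
    if (keys[i] != key) {
      keys[i] = key;
      size++;
    }
    values[i] = value;
  }

  public int size() {
    return size;
  }

  private static void checkKey(int key) {
    if (key == FREE) {
      throw new IllegalArgumentException("Integer.MIN_VALUE is reserved");
    }
  }

  // Doubles the table and re-inserts every live entry.
  private void grow() {
    int[] oldKeys = keys;
    double[] oldValues = values;
    keys = new int[oldKeys.length * 2];
    values = new double[oldKeys.length * 2];
    Arrays.fill(keys, FREE);
    size = 0;
    for (int i = 0; i < oldKeys.length; i++) {
      if (oldKeys[i] != FREE) {
        set(oldKeys[i], oldValues[i]);
      }
    }
  }

  public static void main(String[] args) {
    IntDoubleHashSketch v = new IntDoubleHashSketch(4);
    for (int i = 0; i < 100; i += 7) {
      v.set(i, i * 0.5);
    }
    System.out.println(v.size() + " " + v.get(14) + " " + v.get(15)); // 15 7.0 0.0
  }
}
```

Keys and values live in two parallel primitive arrays, so a get or set touches no Integer or Double objects.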




[jira] Updated: (SOLR-1568) Implement Cartesian Tier Filter

2009-11-18 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1568:
--

Attachment: CartesianTierQParserPlugin.java

Here's the start of a QParserPlugin for this functionality.  It's just the Java 
code and is not integrated with Solr yet.

It works using the Lucene spatial stuff, but it should not be committed at this 
point, b/c I want to make it work with Tier based field types so that the end 
user need not even think about what the field name structure is (i.e. tier_).

You can query with it using something like:  {!tier x=32 y=-79 dist=20 prefix=tier_}.  
If you did want to use it, you would need to add it to your solrconfig.xml.
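For illustration, here is a simplified, hypothetical parser showing how the parameters in that local-params string break down. Solr parses local params itself, including quoting and parameter dereferencing, which this sketch ignores. Recording the leading bare token under the key `type` is just how this sketch names the parser. Nothing here is the plugin's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: split "{!tier x=.. y=.. dist=.. prefix=..}" into key/value pairs.
// No quoting or $param support; Solr's own local-params parsing is more complete.
public class TierLocalParamsSketch {

  public static Map<String, String> parse(String q) {
    String s = q.trim();
    if (!s.startsWith("{!") || !s.endsWith("}")) {
      throw new IllegalArgumentException("not a local params string: " + q);
    }
    String[] tokens = s.substring(2, s.length() - 1).trim().split("\\s+");
    Map<String, String> params = new LinkedHashMap<>();
    params.put("type", tokens[0]); // first bare token names the query parser
    for (int i = 1; i < tokens.length; i++) {
      int eq = tokens[i].indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("bad param: " + tokens[i]);
      }
      params.put(tokens[i].substring(0, eq), tokens[i].substring(eq + 1));
    }
    return params;
  }

  public static void main(String[] args) {
    Map<String, String> p = parse("{!tier x=32 y=-79 dist=20 prefix=tier_}");
    double x = Double.parseDouble(p.get("x"));
    double y = Double.parseDouble(p.get("y"));
    double dist = Double.parseDouble(p.get("dist"));
    System.out.println(p.get("type") + " " + x + " " + y + " " + dist + " " + p.get("prefix"));
    // tier 32.0 -79.0 20.0 tier_
  }
}
```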

 Implement Cartesian Tier Filter
 ---

 Key: SOLR-1568
 URL: https://issues.apache.org/jira/browse/SOLR-1568
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: CartesianTierQParserPlugin.java


 Given an index with Cartesian tiers, we should be able to pass in a filter 
 query that takes in the field name, lat, lon and radius and produces an 
 appropriate Filter for use by Solr.  Note, contrib/spatial has such a filter, 
 so it may just be that we need to hook in a QParserPlugin to handle it. 
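A minimal sketch of the filter semantics described above (center lat/lon plus radius), assuming kilometres and a spherical Earth. RadiusFilter is a hypothetical name and this is not contrib/spatial's CartesianShapeFilter: it uses a lat/lon bounding box as the cheap first stage, where a tier-based filter would instead look up tier cells, and then applies an exact great-circle check.

```java
/**
 * Hypothetical sketch: accept points within radiusKm of a center.
 * Stage 1: bounding box (cheap). Stage 2: haversine distance (exact).
 */
final class RadiusFilter {
    static final double EARTH_RADIUS_KM = 6371.0; // assumption: mean radius, km units

    private final double centerLat, centerLon, radiusKm;
    private final double minLatDeg, maxLatDeg, dLonDeg;

    RadiusFilter(double centerLat, double centerLon, double radiusKm) {
        this.centerLat = centerLat;
        this.centerLon = centerLon;
        this.radiusKm = radiusKm;
        double r = radiusKm / EARTH_RADIUS_KM;     // angular radius in radians
        double lat = Math.toRadians(centerLat);
        double lo = lat - r, hi = lat + r;
        if (lo > -Math.PI / 2 && hi < Math.PI / 2) {
            // Widest longitude offset of a spherical circle at this latitude.
            dLonDeg = Math.toDegrees(Math.asin(Math.sin(r) / Math.cos(lat)));
        } else {
            lo = Math.max(lo, -Math.PI / 2);
            hi = Math.min(hi, Math.PI / 2);
            dLonDeg = 180.0;                       // circle contains a pole: any longitude
        }
        minLatDeg = Math.toDegrees(lo);
        maxLatDeg = Math.toDegrees(hi);
    }

    boolean accept(double lat, double lon) {
        return inBox(lat, lon) && haversineKm(centerLat, centerLon, lat, lon) <= radiusKm;
    }

    private boolean inBox(double lat, double lon) {
        if (lat < minLatDeg || lat > maxLatDeg) return false;
        double d = Math.abs(lon - centerLon) % 360.0;
        if (d > 180.0) d = 360.0 - d;              // wrap across the antimeridian
        return d <= dLonDeg;
    }

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double p1 = Math.toRadians(lat1), p2 = Math.toRadians(lat2);
        double dp = p2 - p1, dl = Math.toRadians(lon2 - lon1);
        double h = Math.sin(dp / 2) * Math.sin(dp / 2)
                 + Math.cos(p1) * Math.cos(p2) * Math.sin(dl / 2) * Math.sin(dl / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }
}
```

For example, new RadiusFilter(32, -79, 20) mirrors the x=32 y=-79 dist=20 query mentioned earlier, with the unit of dist assumed to be kilometres.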




[jira] Commented: (SOLR-1568) Implement Cartesian Tier Filter

2009-11-18 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779662#action_12779662
 ] 

Grant Ingersoll commented on SOLR-1568:
---

Honestly, it's a bit tricky and I'm not sure how best to resolve it (keep in 
mind that I'm not recommending the above piece be committed).  There are a few 
pieces of the CartesianShapeFilter that can be improved to perform better (I 
still need to fix them in Lucene), but if I go that route, then I need to update 
the Lucene JARs associated with that in Solr.  I can't really do that, because that 
likely means one of two things:

1. Patching the 2.9.1 branch and packaging up that JAR for Solr, or waiting for a 
2.9.2 release from Lucene, which isn't likely.
2. Patching trunk and including it.  This would be a huge undertaking for Solr.

So, in the end, my decision was based on the fact that the code for it was 
pretty simple and wouldn't be a big deal to fix.  Longer term, I will fix the 
issue in trunk of Lucene and then over time Solr can adapt to use that once we 
are on 3.x.

 Implement Cartesian Tier Filter
 ---

 Key: SOLR-1568
 URL: https://issues.apache.org/jira/browse/SOLR-1568
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: CartesianTierQParserPlugin.java


 Given an index with Cartesian tiers, we should be able to pass in a filter 
 query that takes in the field name, lat, lon and radius and produces an 
 appropriate Filter for use by Solr.  Note, contrib/spatial has such a filter, 
 so it may just be that we need to hook in a QParserPlugin to handle it. 




[jira] Created: (LUCENE-2078) Remove dependencies on specific field names or prefixes for field names (i.e. tierPrefix)

2009-11-17 Thread Grant Ingersoll (JIRA)
Remove dependencies on specific field names or prefixes for field names (i.e. 
tierPrefix)
-

 Key: LUCENE-2078
 URL: https://issues.apache.org/jira/browse/LUCENE-2078
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Affects Versions: 2.9
Reporter: Grant Ingersoll
 Fix For: 3.1


Currently, the spatial contrib hard-codes assumptions about field names (such 
as a fixed tier prefix) that are simply not needed.  By doing so, it prevents 
re-use in other applications that have set up their fields differently.
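One way to drop the hard-coded prefix is to have the caller supply it. In the sketch below, TierFieldScheme is a hypothetical name, and its grid is a plain equirectangular split into 2^tier cells per side. It does not reproduce the projection or box-id encoding of Lucene's CartesianTierPlotter.

```java
/**
 * Hypothetical sketch: tier field names come from a caller-supplied prefix
 * instead of a library default. The grid below is illustrative only.
 */
final class TierFieldScheme {
    private final String fieldPrefix;

    TierFieldScheme(String fieldPrefix) {
        if (fieldPrefix == null || fieldPrefix.isEmpty()) {
            throw new IllegalArgumentException("field prefix required");
        }
        this.fieldPrefix = fieldPrefix;
    }

    /** Name of the field holding cell ids for one tier, e.g. prefix + 9. */
    String fieldFor(int tier) {
        return fieldPrefix + tier;
    }

    /** Cell id of (lat, lon) at a tier: row * cellsPerSide + column. */
    static long cellId(int tier, double lat, double lon) {
        if (tier < 0 || tier > 30) throw new IllegalArgumentException("tier out of range: " + tier);
        long cells = 1L << tier;                   // cells per side at this tier
        long col = (long) Math.floor((lon + 180.0) / 360.0 * cells);
        long row = (long) Math.floor((lat + 90.0) / 180.0 * cells);
        col = Math.min(col, cells - 1);            // lon == 180 lands in the last column
        row = Math.min(row, cells - 1);            // lat == 90 lands in the last row
        return row * cells + col;
    }
}
```

With this shape, a value like the prefix=tier_ local param would simply be passed to the constructor rather than being a naming convention the library assumes.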



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-11-17 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778832#action_12778832
 ] 

Grant Ingersoll commented on MAHOUT-165:


Shashi, can you make sure the patch is up to date?

 Using better primitives hash for sparse vector for performance gains
 

 Key: MAHOUT-165
 URL: https://issues.apache.org/jira/browse/MAHOUT-165
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: colt.jar, mahout-165-trove.patch, 
 MAHOUT-165-updated.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch


 In SparseVector, we need a primitive hash map for indices and values. The 
 present implementation of this hash map is not as efficient as some of the 
 other implementations in non-Apache projects. 
 In an experiment, I found that, for get/set operations, the primitive hash map of 
 Colt performs an order of magnitude better than OrderedIntDoubleMapping. 
 For iteration it is 2x slower, though. 
 Using Colt in SparseVector improved the performance of canopy generation. For an 
 experimental dataset, the current implementation takes 50 minutes; using 
 Colt reduces this to 19-20 minutes, a 60% reduction. 




[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-11-17 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778831#action_12778831
 ] 

Grant Ingersoll commented on MAHOUT-165:


Yep, I think we are all agreed on Colt.  I'll get 0.2 out and then we can add 
it.

 Using better primitives hash for sparse vector for performance gains
 

 Key: MAHOUT-165
 URL: https://issues.apache.org/jira/browse/MAHOUT-165
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: colt.jar, mahout-165-trove.patch, 
 MAHOUT-165-updated.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch


 In SparseVector, we need a primitive hash map for indices and values. The 
 present implementation of this hash map is not as efficient as some of the 
 other implementations in non-Apache projects. 
 In an experiment, I found that, for get/set operations, the primitive hash map of 
 Colt performs an order of magnitude better than OrderedIntDoubleMapping. 
 For iteration it is 2x slower, though. 
 Using Colt in SparseVector improved the performance of canopy generation. For an 
 experimental dataset, the current implementation takes 50 minutes; using 
 Colt reduces this to 19-20 minutes, a 60% reduction. 




[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-11-17 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779111#action_12779111
 ] 

Grant Ingersoll commented on MAHOUT-165:


bq. So I found Wolfgang Hoschek, the author of Colt, and he confirms that it is 
no longer maintained, and wishes us the best of luck in taking it over for 
ourselves if we so desired.

I seem to recall him being a Lucene contributor in the past.  Perhaps he would 
be willing to donate Colt to Apache?  I don't think we can just bring in its 
source and claim it as ours.  Another option is we see if he would move it over 
to Google Code and make some of us committers on the project.  Perhaps Commons 
Math is interested in it, too.



 Using better primitives hash for sparse vector for performance gains
 

 Key: MAHOUT-165
 URL: https://issues.apache.org/jira/browse/MAHOUT-165
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Affects Versions: 0.2
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: colt.jar, mahout-165-trove.patch, 
 MAHOUT-165-updated.patch, mahout-165.patch, MAHOUT-165.patch, mahout-165.patch


 In SparseVector, we need a primitive hash map for indices and values. The 
 present implementation of this hash map is not as efficient as some of the 
 other implementations in non-Apache projects. 
 In an experiment, I found that, for get/set operations, the primitive hash map of 
 Colt performs an order of magnitude better than OrderedIntDoubleMapping. 
 For iteration it is 2x slower, though. 
 Using Colt in SparseVector improved the performance of canopy generation. For an 
 experimental dataset, the current implementation takes 50 minutes; using 
 Colt reduces this to 19-20 minutes, a 60% reduction. 




[jira] Created: (SOLR-1569) Allow literal Strings in functions

2009-11-17 Thread Grant Ingersoll (JIRA)
Allow literal Strings in functions
--

 Key: SOLR-1569
 URL: https://issues.apache.org/jira/browse/SOLR-1569
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


Some functions (for instance, those that take a geohash) may need to take 
literal strings as arguments.  This patch modifies the FunctionQParser to allow 
quoted strings in functions (either single- or double-quoted) to be passed 
through as a LiteralValueSource.  It also adds the LiteralValueSource class.
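A minimal sketch of reading one quoted argument while accepting either quote character. QuotedLiteral and its parse method are hypothetical, and the backslash-escape rule is an assumption of this sketch rather than something taken from the patch.

```java
/**
 * Hypothetical sketch: parse a '...' or "..." literal starting at `start`.
 * Backslash escaping is an assumption, not FunctionQParser's documented rule.
 */
final class QuotedLiteral {
    final String value;   // unescaped contents between the quotes
    final int end;        // index just past the closing quote

    private QuotedLiteral(String value, int end) {
        this.value = value;
        this.end = end;
    }

    static QuotedLiteral parse(String s, int start) {
        char quote = s.charAt(start);
        if (quote != '\'' && quote != '"') {
            throw new IllegalArgumentException("expected a quote at index " + start);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = start + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                sb.append(s.charAt(++i));           // keep the escaped character literally
            } else if (c == quote) {
                return new QuotedLiteral(sb.toString(), i + 1);
            } else {
                sb.append(c);                       // the other quote char is ordinary text
            }
        }
        throw new IllegalArgumentException("unterminated string literal at index " + start);
    }
}
```

A function parser would call parse when it sees a quote character where an argument is expected, wrap value in a literal value source, and continue reading at end.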




[jira] Commented: (SOLR-1569) Allow literal Strings in functions

2009-11-17 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778874#action_12778874
 ] 

Grant Ingersoll commented on SOLR-1569:
---

See 
http://www.lucidimagination.com/search/document/2ca306cbd392493f/passing_string_constants_to_functions

 Allow literal Strings in functions
 --

 Key: SOLR-1569
 URL: https://issues.apache.org/jira/browse/SOLR-1569
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 Some functions (for instance, those that take a geohash) may need to pass 
 literal strings.  This patch modifies the FunctionQParser to allow for quoted 
 strings in functions (either single quote or double quote) to be passed 
 through as a LiteralValueSource.  It also adds the LiteralValueSource.




[jira] Updated: (SOLR-1569) Allow literal Strings in functions

2009-11-17 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1569:
--

Attachment: SOLR-1569.patch

 Allow literal Strings in functions
 --

 Key: SOLR-1569
 URL: https://issues.apache.org/jira/browse/SOLR-1569
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1569.patch


 Some functions (for instance, those that take a geohash) may need to pass 
 literal strings.  This patch modifies the FunctionQParser to allow for quoted 
 strings in functions (either single quote or double quote) to be passed 
 through as a LiteralValueSource.  It also adds the LiteralValueSource.




[jira] Resolved: (SOLR-1569) Allow literal Strings in functions

2009-11-17 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-1569.
---

Resolution: Fixed

Committed revision 881319.

 Allow literal Strings in functions
 --

 Key: SOLR-1569
 URL: https://issues.apache.org/jira/browse/SOLR-1569
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1569.patch


 Some functions (for instance, those that take a geohash) may need to pass 
 literal strings.  This patch modifies the FunctionQParser to allow for quoted 
 strings in functions (either single quote or double quote) to be passed 
 through as a LiteralValueSource.  It also adds the LiteralValueSource.




[jira] Work started: (SOLR-1568) Implement Cartesian Tier Filter

2009-11-17 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on SOLR-1568 started by Grant Ingersoll.

 Implement Cartesian Tier Filter
 ---

 Key: SOLR-1568
 URL: https://issues.apache.org/jira/browse/SOLR-1568
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 Given an index with Cartesian tiers, we should be able to pass in a filter 
 query that takes in the field name, lat, lon and radius and produces an 
 appropriate Filter for use by Solr.  Note, contrib/spatial has such a filter, 
 so it may just be that we need to hook in a QParserPlugin to handle it. 




[jira] Resolved: (SOLR-1302) Fun with Distances - Add Distance functions for a variety of things

2009-11-17 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-1302.
---

Resolution: Fixed

I'm going to close this, as I've implemented a bunch of them, with the 
exception of cosine.  If someone wants to do that one, they can open a new 
issue.

 Fun with Distances - Add Distance functions for a variety of things
 ---

 Key: SOLR-1302
 URL: https://issues.apache.org/jira/browse/SOLR-1302
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1302.patch, SOLR-1302.patch, SOLR-1302.patch


 There are many distance functions that are useful to have:
 1. Great Circle (lat/lon) and other geo distances
 2. Euclidean (Vector)
 3. Manhattan (Vector)
 4. Cosine (Vector)
 For the vector ones, the idea is that the fields on a document can be used to 
 determine the vector.  
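A minimal sketch of the four distance functions listed above, written as plain static Java methods. The class and method names are hypothetical, and this is not Solr's ValueSource-based implementation; the vector fields of a document are assumed to have been read into double[] arrays already.

```java
/**
 * Hypothetical sketch of the listed distance functions over plain values.
 */
final class Distances {
    private Distances() {}

    /** Great-circle distance via haversine; result is in the units of radius. */
    static double greatCircle(double lat1, double lon1, double lat2, double lon2, double radius) {
        double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double h = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * radius * Math.asin(Math.min(1.0, Math.sqrt(h))); // min() guards rounding
    }

    /** Euclidean (L2) distance. */
    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Manhattan (L1) distance. */
    static double manhattan(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    /** Cosine similarity; 1 - similarity is a common way to use it as a distance. */
    static double cosineSimilarity(double[] a, double[] b) {
        checkSameLength(a, b);
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) throw new IllegalArgumentException("undefined for a zero vector");
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("vector lengths differ");
    }
}
```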




[jira] Commented: (SOLR-1302) Fun with Distances - Add Distance functions for a variety of things

2009-11-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778308#action_12778308
 ] 

Grant Ingersoll commented on SOLR-1302:
---

Still benchmarking.  I think there are two other things that are needed for 
performance: 
1. Filtering (see the frange stuff)
2. Caching of function results within a single search for use between filtering, 
scoring and sorting
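Item 2 can be sketched as a memo that lives for exactly one request. This assumes documents are identified by ints in [0, maxDoc) and uses a Java 8 IntToDoubleFunction as a stand-in for a Solr ValueSource. PerRequestFunctionCache is a hypothetical name, not an existing Solr class.

```java
import java.util.BitSet;
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical sketch: memoizes a function's per-document values for one
 * request so filtering, scoring and sorting compute each value only once.
 */
final class PerRequestFunctionCache implements IntToDoubleFunction {
    private final IntToDoubleFunction function; // e.g. distance from the query point
    private final double[] values;              // dense, indexed by doc id
    private final BitSet computed = new BitSet();
    private int evaluations;                    // times the wrapped function actually ran

    PerRequestFunctionCache(IntToDoubleFunction function, int maxDoc) {
        this.function = function;
        this.values = new double[maxDoc];
    }

    @Override
    public double applyAsDouble(int doc) {
        if (!computed.get(doc)) {
            values[doc] = function.applyAsDouble(doc);
            computed.set(doc);
            evaluations++;
        }
        return values[doc];
    }

    int evaluations() { return evaluations; }
}
```

One instance would be built per request and shared by the filter, scorer and sort comparator; it must not outlive the request, because the values depend on request parameters such as the query point. The dense array costs 8 bytes per document in the index, so a hash map keyed by doc id could be the better choice when only a small fraction of documents is ever touched.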

 Fun with Distances - Add Distance functions for a variety of things
 ---

 Key: SOLR-1302
 URL: https://issues.apache.org/jira/browse/SOLR-1302
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1302.patch, SOLR-1302.patch, SOLR-1302.patch


 There are many distance functions that are useful to have:
 1. Great Circle (lat/lon) and other geo distances
 2. Euclidean (Vector)
 3. Manhattan (Vector)
 4. Cosine (Vector)
 For the vector ones, the idea is that the fields on a document can be used to 
 determine the vector.  




[jira] Commented: (SOLR-1302) Fun with Distances - Add Distance functions for a variety of things

2009-11-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778312#action_12778312
 ] 

Grant Ingersoll commented on SOLR-1302:
---

That being said, my anecdotal tests show them to be pretty fast over ~68K docs.  
I'm going to load up the entire Open Street Map planet at some point this week 
and then I can run real tests.

 Fun with Distances - Add Distance functions for a variety of things
 ---

 Key: SOLR-1302
 URL: https://issues.apache.org/jira/browse/SOLR-1302
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1302.patch, SOLR-1302.patch, SOLR-1302.patch


 There are many distance functions that are useful to have:
 1. Great Circle (lat/lon) and other geo distances
 2. Euclidean (Vector)
 3. Manhattan (Vector)
 4. Cosine (Vector)
 For the vector ones, the idea is that the fields on a document can be used to 
 determine the vector.  




[jira] Created: (SOLR-1567) Upgrade to Tika 0.5

2009-11-16 Thread Grant Ingersoll (JIRA)
Upgrade to Tika 0.5
---

 Key: SOLR-1567
 URL: https://issues.apache.org/jira/browse/SOLR-1567
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


As the title says.




[jira] Created: (SOLR-1568) Implement Cartesian Tier Filter

2009-11-16 Thread Grant Ingersoll (JIRA)
Implement Cartesian Tier Filter
---

 Key: SOLR-1568
 URL: https://issues.apache.org/jira/browse/SOLR-1568
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


Given an index with Cartesian tiers, we should be able to pass in a filter 
query that takes in the field name, lat, lon and radius and produces an 
appropriate Filter for use by Solr.  Note, contrib/spatial has such a filter, 
so it may just be that we need to hook in a QParserPlugin to handle it. 



