[Lucene.Net] Re: Creating a ASF fork of Sharpen under a dOCL license
I sent an email to the db4o team to see what they think. When I get a response back from them we should have more answers. At that point it will either be a no on their end or we will have specific items to discuss.

Scott

On Friday, March 25, 2011, Stefan Bodewig bode...@apache.org wrote:

On 2011-03-25, Prescott Nasser wrote:

> Stefan, how do you read their licensing:
> http://www.db4o.com/about/company/legalpolicies/docl.aspx
> By your reading is it possible to include this in our repo to keep
> everything together? Or would this have to be outside the ASF?

The usual IANAL disclaimer applies, and we could ask for legal clearance if we absolutely think we need it. From a cursory glance I don't think the policy applies to our use-case at all:

| 1. Subject
|
| Software means the current version of the db4o database engine
| software and all patches, bug fixes, error corrections and future
| versions.

AFAIU Sharpen is not part of the database engine, and in addition I'm not sure that a fork of the codebase is in line with what they'd consider a derivative work.

Even if it did apply, the license to the original code base is non-transferable, and you'd only get the right to sublicense the original code base under the rules of the GPL (section 2b; in addition, there is no software at all prior to accepting the agreement). I don't see how this could work.

If you really feel that forking Sharpen is the best way to move forward - not my call to make - then forking the GPLed sources into a project that uses the GPL itself seems to be the only legally sane choice.

Stefan
Re: add an afterFilter to IndexSearcher.search()
On Fri, Mar 25, 2011 at 2:33 PM, Yonik Seeley yo...@lucidimagination.com wrote:

>>> Currently, supplying a filter to IndexSearcher.search() assumes that
>>> it's cheaper to run than the main query.
>>
>> Wait, where do we assume that? After a match, we always skip on the filter first.
>
> Well, we next() on the filter, and advance() on the scorer.

>> Also, why stop at 2 filters? Ie I may have 3 filters plus a query to
>> AND, and I want to control their order.
>
> Multiple filters could be combined into a single one via ChainedFilter, etc.

True.

>> What's the use case behind this...?
>
> Optimizing cases where filters might be more expensive than the main query ;-)

That much I understood ;) But you must have a real use case that inspired this idea? Where are apps/Solr typically using such expensive filters?

Mike

http://blog.mikemccandless.com

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
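[Editor's note] The next()/advance() discussion above is about leapfrogging two sorted doc-ID streams. The sketch below is a toy symmetric leapfrog over int arrays, not Lucene's actual Scorer/DocIdSetIterator classes; the class and method names here are illustrative only. The key cost observation from the thread is visible in it: whichever stream is advance()d to the other's position does the skipping work.

```java
// Toy leapfrog intersection of two sorted doc-ID streams (not Lucene code).
import java.util.ArrayList;
import java.util.List;

public class Leapfrog {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Minimal doc-ID iterator over a sorted int array.
    static class Docs {
        final int[] docs;
        int pos = -1;
        Docs(int... docs) { this.docs = docs; }
        int next() { pos++; return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        int advance(int target) {           // skip to first doc >= target
            int d;
            do { d = next(); } while (d < target);
            return d;
        }
    }

    // AND of filter and scorer: whichever stream is behind gets advance()d
    // to the other's current doc; equal docs are hits.
    static List<Integer> and(Docs filter, Docs scorer) {
        List<Integer> hits = new ArrayList<>();
        int f = filter.next();
        int s = scorer.next();
        while (f != NO_MORE_DOCS && s != NO_MORE_DOCS) {
            if (f == s) {                   // both land on the same doc: a hit
                hits.add(f);
                f = filter.next();
                s = scorer.next();
            } else if (f < s) {
                f = filter.advance(s);
            } else {
                s = scorer.advance(f);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Docs filter = new Docs(1, 3, 5, 7, 9);
        Docs scorer = new Docs(2, 3, 4, 9, 11);
        System.out.println(and(filter, scorer)); // prints [3, 9]
    }
}
```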
[jira] [Commented] (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011603#comment-13011603 ] Grant Ingersoll commented on SOLR-236: -- Keep in mind an alternative approach that scales, but loses some attributes of this patch (total groups for instance) is committed on trunk and will likely be backported to 3.2. Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: Next Attachments: DocSetScoreCollector.java, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-branch_3x.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, solr-236.patch

This patch includes a new feature called field collapsing, used in order to collapse a group of results with a similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also duplicate detection: http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field to choose the field used to group results
- collapse.type: normal (default value) or adjacent
- collapse.max to select how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for the current development version
- field_collapsing_1.1.0.patch for Solr 1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
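[Editor's note] The collapse.type and collapse.max semantics described above can be sketched in a few lines. This is a toy illustration over a list of field values, not the patch's collector-based implementation; method names are invented for the example.

```java
// Toy illustration of "adjacent" vs "normal" field collapsing (not the patch's code).
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldCollapse {
    // "adjacent" collapsing: only runs of consecutive equal values are
    // collapsed; up to max entries of each run are kept (collapse.max).
    static List<String> collapseAdjacent(List<String> values, int max) {
        List<String> out = new ArrayList<>();
        String prev = null;
        int run = 0;
        for (String v : values) {
            run = v.equals(prev) ? run + 1 : 1;
            if (run <= max) out.add(v);
            prev = v;
        }
        return out;
    }

    // "normal" collapsing: at most max entries per value over the whole
    // result list, regardless of adjacency.
    static List<String> collapseNormal(List<String> values, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (seen.merge(v, 1, Integer::sum) <= max) out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sites = List.of("a.com", "a.com", "a.com", "b.com", "a.com");
        System.out.println(collapseAdjacent(sites, 2)); // prints [a.com, a.com, b.com, a.com]
        System.out.println(collapseNormal(sites, 2));   // prints [a.com, a.com, b.com]
    }
}
```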
RC Status
I ran into a few kinks w/ signing artifacts (it wasn't finding the maven artifacts) in Solr and am fixing them. Once that goes through, I will upload an RC.
[jira] [Resolved] (LUCENE-2990) Improve ArrayUtil/CollectionUtil.*Sort() methods to early-return on empty or one-element lists/arrays
[ https://issues.apache.org/jira/browse/LUCENE-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-2990.
---
Resolution: Fixed
Lucene Fields: [New, Patch Available] (was: [New])

Renamed local variable from l to size.
Committed trunk revision: 1085689
Committed 3.x revision: 1085691

Improve ArrayUtil/CollectionUtil.*Sort() methods to early-return on empty or one-element lists/arrays
--
Key: LUCENE-2990
URL: https://issues.apache.org/jira/browse/LUCENE-2990
Project: Lucene - Java
Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Trivial
Fix For: 3.2, 4.0
Attachments: LUCENE-2990.patch, LUCENE-2990.patch, LUCENE-2990.patch

It might be a good idea to make CollectionUtil or ArrayUtil return early if the passed-in list or array's length <= 1, because sorting is unneeded then. This may help automaton and other places, as no SorterTemplate is created for empty or one-element lists.
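[Editor's note] The early-return guard the issue describes is just a length check before any sorter machinery is set up. A minimal sketch (this mirrors the idea only, not ArrayUtil's actual code):

```java
// Early-return guard for sorting: 0- and 1-element inputs are already
// sorted, so no sorter object or comparator calls are needed.
import java.util.Arrays;

public class EarlySort {
    static void sort(int[] a) {
        if (a == null || a.length <= 1) return; // sorted by definition
        Arrays.sort(a);
    }

    public static void main(String[] args) {
        int[] many = {3, 1, 2};
        sort(many);
        sort(new int[]{});   // no-op, no allocation
        sort(null);          // no-op, no NullPointerException
        System.out.println(Arrays.toString(many)); // prints [1, 2, 3]
    }
}
```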
[jira] [Created] (LUCENE-2994) When 3.1 is released, update backwards tests in 3.x branch
When 3.1 is released, update backwards tests in 3.x branch
--
Key: LUCENE-2994
URL: https://issues.apache.org/jira/browse/LUCENE-2994
Project: Lucene - Java
Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler

When we have released the official artifacts of Lucene 3.1 (the final ones!!!), we need to do the following:
- svn rm backwards/src/test
- svn cp https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/lucene/src/test backwards/src/test
- Copy the lucene-core-3.1.0.jar from the last release tarball to lucene/backwards/lib and delete the old one.
- Check that everything is correct: the backwards folder should contain a src/ folder that now contains test. The files should be the ones from the branch.
- Run ant test-backwards

Uwe will take care of this!
[jira] [Updated] (LUCENE-2994) When 3.1 is released, update backwards tests in 3.x branch
[ https://issues.apache.org/jira/browse/LUCENE-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2994:
--
Affects Version/s: 3.2
Fix Version/s: 3.2
[jira] [Commented] (LUCENE-2994) When 3.1 is released, update backwards tests in 3.x branch
[ https://issues.apache.org/jira/browse/LUCENE-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011610#comment-13011610 ]

Uwe Schindler commented on LUCENE-2994:
---
We also have to clone the 3.1 test-framework, so it's a little bit more work, but it should be easy to do.
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011613#comment-13011613 ]

Robert Muir commented on SOLR-2155:
---
I don't really think things like this (queries etc) should go into just Solr, while we leave the lucene-contrib spatial package broken. Let's put things in the right places?

Geospatial search using geohash prefixes
--
Key: SOLR-2155
URL: https://issues.apache.org/jira/browse/SOLR-2155
Project: Solr
Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch

There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazetteer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document, and users want to limit their search results to those occurring in a user-specified area.

I've implemented this by furthering the GeoHash-based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4, depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details.

This work was presented at LuceneRevolution in Boston on October 8th.
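[Editor's note] The property the filter exploits is that standard geohash encoding makes a shorter geohash of a point a prefix of every longer geohash of the same point, so a term prefix corresponds to a containing grid cell. A minimal sketch of the standard encoding (not the patch's GeoHashUtils code):

```java
// Standard geohash encoding: interleave longitude/latitude range halvings,
// emitting one base-32 character per 5 bits. Shorter hashes are prefixes
// of longer ones for the same point.
public class GeohashSketch {
    static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int precision) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder sb = new StringBuilder();
        boolean lonTurn = true;                  // bits alternate, longitude first
        int bits = 0, ch = 0;
        while (sb.length() < precision) {
            if (lonTurn) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonLo = mid; }
                else            { ch <<= 1;           lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latLo = mid; }
                else            { ch <<= 1;           latHi = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) { sb.append(BASE32.charAt(ch)); bits = 0; ch = 0; }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String full = encode(57.64911, 10.40744, 11);
        // The prefix property that makes TermsEnum.seek() on prefixes work:
        System.out.println(full.startsWith(encode(57.64911, 10.40744, 4))); // prints true
    }
}
```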
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011615#comment-13011615 ]

Grant Ingersoll commented on SOLR-2155:
---
Yeah, I agree. I haven't looked at the patch yet. It was my understanding that Chris Male was going to move lucene/contrib/spatial to modules and gut the broken stuff in it. I think there is a separate issue open for that one. Presumably, once spatial and function queries are moved to modules, then we will have a properly working spatial package. I obviously can move it, but I don't have time to do the gutting (we really should have deprecated the tier stuff for this release).
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011616#comment-13011616 ]

Robert Muir commented on SOLR-2155:
---
well what would the deprecation have suggested as an alternative?
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011619#comment-13011619 ]

Chris Male commented on SOLR-2155:
--
In LUCENE-2599 I deprecated the spatial contrib. The problem is, as Robert raises, deprecating the code without providing an alternative isn't that user friendly.

I think as part of this issue we should start up the spatial module and work towards moving what we can there. Moving function queries is going to take some time since they are very coupled to Solr. But that shouldn't preclude us from putting into the module what we can. Once we have a module that provides a reasonable set of functionality, then we can deprecate/gut/remove the spatial contrib.
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
Not really related to this issue, so moving to dev@...

On Mar 26, 2011, at 7:52 AM, Robert Muir (JIRA) wrote:

> [ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011616#comment-13011616 ]
>
> Robert Muir commented on SOLR-2155:
> ---
> well what would the deprecation have suggested as an alternative?

It's a good question. The tier stuff, IMO (and confirmed by others), is broken for most of the world. I sunk a good week into fixing it and was so entangled in the spaghetti that I gave up.

What we laid out on another issue (I forget the number, but I think C Male owns it and says he has a rewrite) is to move to modules, keep what we can (geohash and some of the utils) and gut the rest. That, combined w/ moving function queries to modules, would make all of spatial a good solution for the large majority of users.

The only thing that would remain to get back to our current state (at least in terms of features) would be to implement a tier approach. I've proposed the Military Grid System (there is an open JIRA issue for it) as something that looks to be a good candidate. It's well documented on the web, uses a metric for all distances, and has the benefit that all of NATO uses it, albeit for different purposes. It also addresses the poles and the meridians as first-class citizens. It just needs an implementer.

Having said that, I'm not 100% certain. I also don't know that the tier stuff is absolutely necessary. The combination of what we have in function queries plus trie fields makes for a very fast spatial lookup at this point. I'm totally open to other suggestions, however.

Longer term, I've got a lot of ideas for spatial, but that's a different thread.

-Grant
[jira] [Assigned] (SOLR-1298) FunctionQuery results as pseudo-fields
[ https://issues.apache.org/jira/browse/SOLR-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley reassigned SOLR-1298:
--
Assignee: Yonik Seeley (was: Grant Ingersoll)

FunctionQuery results as pseudo-fields
--
Key: SOLR-1298
URL: https://issues.apache.org/jira/browse/SOLR-1298
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
Priority: Minor
Fix For: Next
Attachments: SOLR-1298-FieldValues.patch, SOLR-1298.patch

It would be helpful if the results of FunctionQueries could be added as fields to a document. A couple of options here:
1. Run the FunctionQuery as part of the relevance score and add that piece to the document
2. Run the function (not really a query) during Document/Field retrieval
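[Editor's note] Option 2 above amounts to evaluating a per-document function at retrieval time and attaching the value to the returned stored fields. A hypothetical sketch of that idea (all names here are invented for illustration; this is not Solr's API):

```java
// Hypothetical illustration of "pseudo-fields": evaluate a per-document
// value source during retrieval and attach it to the stored fields.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

public class PseudoFields {
    static Map<String, Object> withPseudoField(Map<String, Object> storedFields,
                                               int docId,
                                               String name,
                                               IntToDoubleFunction valueSource) {
        // Copy the stored fields and add one computed entry.
        Map<String, Object> out = new LinkedHashMap<>(storedFields);
        out.put(name, valueSource.applyAsDouble(docId));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("id", "doc7", "title", "hello");
        // Toy value source standing in for e.g. a distance function.
        Map<String, Object> enriched = withPseudoField(doc, 7, "dist", d -> d * 0.5);
        System.out.println(enriched.get("dist")); // prints 3.5
    }
}
```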
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:

> I don't really think things like this (queries etc) should go into just Solr

I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for Lucene need not be exported to Solr immediately. Everything developed in/for Solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Mar 26, 2011, at 8:24 AM, Robert Muir wrote:

> On Sat, Mar 26, 2011 at 8:06 AM, Grant Ingersoll gsing...@apache.org wrote:
>> Not really related to this issue, so moving to dev@...
>
> I guess the reason I asked my question is more high-level: on one hand there are suggestions that Lucene's spatial package should have been deprecated in 3.1, but on the other hand the very first feature on Solr 3.1's new feature list is 'improved geospatial support'.

It really should say "Added Geospatial Support", as it was non-existent in Solr before. Most of the work for adding spatial to Solr consisted of improving things in Solr to make it easy to leverage the one spatial feature we really added: distance-based functions and parsing support. Everything else was generally useful things: sorting by function, poly fields, etc.

I started on tier support, but dropped it when I realized it was broken beyond repair. The Solr stuff uses, IMO, the stuff in Lucene that works and ignores the rest.

I seem to recall Chris had said that once I got done w/ the Solr stuff he would do the modules work, but it hasn't happened yet. I'd say in 3.2, since it sounds like Chris did at least deprecate contrib/spatial, that we work to get all of this resolved: spatial -> modules, function queries -> modules. Naturally we should do it on trunk, too.
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Mar 26, 2011, at 9:48 AM, Yonik Seeley wrote:

> On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
>> I don't really think things like this (queries etc) should go into just Solr
>
> I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for lucene need not be exported to Solr immediately. Everything developed in/for solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period.

I agree it's enough for the contributor to do that, but as committers we need to look at the bigger picture in this particular case, which is the move of spatial to modules.
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 9:57 AM, Grant Ingersoll gsing...@apache.org wrote:

> On Mar 26, 2011, at 9:48 AM, Yonik Seeley wrote:
>> On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
>>> I don't really think things like this (queries etc) should go into just Solr
>>
>> I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for lucene need not be exported to Solr immediately. Everything developed in/for solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period.
>
> I agree it's enough for the contributor to do that, but as committers we need to look at the bigger picture in this particular case, which is the move of spatial to modules.

That's a separate asynchronous issue. Progress should not be blocked in Solr in the meantime.

-Yonik
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 9:48 AM, Yonik Seeley yo...@lucidimagination.com wrote:

> On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote:
>> I don't really think things like this (queries etc) should go into just Solr
>
> I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for lucene need not be exported to Solr immediately. Everything developed in/for solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period.

It's not enough for me: you can expect me to start raising questions and objections when things are committed to the wrong place in the codebase; it's totally appropriate. We merged development, all committers can commit to the correct places, there are no excuses.
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
I started on tier support, but dropped it when I realized it was broken beyond repair. I did not know one could break code beyond repair. Nicolas
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 10:05 AM, Robert Muir rcm...@gmail.com wrote: On Sat, Mar 26, 2011 at 9:48 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote: I don't really think things like this (queries etc) should go into just Solr I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for lucene need not be exported to Solr immediately. Everything developed in/for solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period. It's not enough for me: you can expect me to start raising questions and objections when things are committed to the wrong place in the codebase; it's totally appropriate. We merged development, all committers can commit to the correct places, there are no excuses. If you're saying Queries don't belong in Solr, I'm a huge -1 on that. There's no correct place for queries in general - it's all in the context. If there's a better place for the query that can be achieved with a mv, then fine. But there's often much more work involved, dependencies on other solr features, or fleshing out a real Java API rather than treating something as a simple implementation. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-2396) add [ICU]CollationField
[ https://issues.apache.org/jira/browse/SOLR-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned SOLR-2396: - Assignee: Robert Muir add [ICU]CollationField --- Key: SOLR-2396 URL: https://issues.apache.org/jira/browse/SOLR-2396 Project: Solr Issue Type: Improvement Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.0 Attachments: SOLR-2396.patch, SOLR-2396.patch, SOLR-2396.patch, SOLR-2396.patch In LUCENE-2551 collation support was changed to use byte[] keys. Previously it encoded sort keys with IndexableBinaryString into char[], but this is wasteful with regards to RAM and disk when terms can be byte[]. A better solution would be [ICU]CollationFieldTypes, as this would also allow locale-sensitive range queries. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2396) add [ICU]CollationField
[ https://issues.apache.org/jira/browse/SOLR-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011646#comment-13011646 ] Robert Muir commented on SOLR-2396: --- I'd like to commit this in a few days if no one objects. The existing encoding is wasteful and I would like to cut solr over to this more efficient one (and enable locale-sensitive range queries). We could open future issues for any additional features such as specifying the icu locale as BCP47, etc, etc. (this just implements the lucene 3.1 functionality more efficiently) add [ICU]CollationField --- Key: SOLR-2396 URL: https://issues.apache.org/jira/browse/SOLR-2396 Project: Solr Issue Type: Improvement Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.0 Attachments: SOLR-2396.patch, SOLR-2396.patch, SOLR-2396.patch, SOLR-2396.patch In LUCENE-2551 collation support was changed to use byte[] keys. Previously it encoded sort keys with IndexableBinaryString into char[], but this is wasteful with regards to RAM and disk when terms can be byte. A better solution would be [ICU]CollationFieldTypes, as this would also allow locale-sensitive range queries. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
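The win from byte[] collation keys can be illustrated without ICU at all. Below is a standalone sketch using the JDK's own java.text.Collator as a stand-in for the ICU collator the issue actually uses; the helper names (CollationKeyDemo, compareUnsigned, key) are made up for this example, not from the patch:

```java
import java.text.Collator;
import java.util.Locale;

public class CollationKeyDemo {
    // Compare two keys as unsigned bytes, the way index terms are ordered.
    // CollationKey.toByteArray() is documented to preserve compareTo() order
    // under exactly this comparison.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    static byte[] key(Collator c, String s) {
        return c.getCollationKey(s).toByteArray();
    }

    public static void main(String[] args) {
        Collator german = Collator.getInstance(Locale.GERMAN);
        // Binary (code point) order puts "Äpfel" after "Zebra"...
        assert "Äpfel".compareTo("Zebra") > 0;
        // ...but the German collation key orders it before "Zebra", so a
        // range query over the keyed terms sees the locale-correct order.
        assert compareUnsigned(key(german, "Äpfel"), key(german, "Zebra")) < 0;
    }
}
```

Since the keys are opaque byte[] values that sort correctly under plain unsigned comparison, they can be indexed directly as terms, which is what makes locale-sensitive range queries fall out for free.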
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011653#comment-13011653 ] David Smiley commented on SOLR-2155: I plan to finish a couple improvements to this patch within 2 weeks' time: distance function queries to work with multi-value, and polygon queries that span the date line. I've been delayed by some life events (new baby). Furthermore, I'll try and ensure that the work here is applicable to pure Lucene users (i.e. sans Solr). One thing I'm unsure of is how to integrate (or not integrate) existing Lucene/Solr spatial code with this patch. In this patch I chose to re-use some basic shape classes in Lucene's spatial contrib simply because they were already there, but I could just as easily have not. My preference going forward would be to outright replace Lucene's spatial contrib with this patch. I also think LatLonType and PointType could become deprecated since this patch is not only more capable (multiValue support) but faster too. Well, for filtering; sorting is TBD. I'm also inclined to name the field type LatLonGeohashType to reinforce the fact that it works with lat lon; geohash is an implementation detail. In the future it might even not be geohash, strictly speaking, once we optimize the encoding. Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Assignee: Grant Ingersoll Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazetteer) occurring on free text.
None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
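The grid-subdivision scheme the issue describes is easy to sketch in isolation: each geohash character packs five alternating longitude/latitude bisections, so a longer hash is a strict refinement of the box named by any of its prefixes, and that prefix property is what lets a filter skip through the term dictionary. This is a from-scratch illustration of standard geohash encoding, not the patch's GeoHashUtils:

```java
public class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode a lat/lon point to a geohash of the given length. Each output
    // character encodes five bisections, alternating longitude and latitude,
    // so chopping characters off the end only widens the box.
    public static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder(precision);
        boolean bisectLon = true; // geohash starts with a longitude bit
        int bits = 0, ch = 0;
        while (hash.length() < precision) {
            if (bisectLon) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            bisectLon = !bisectLon;
            if (++bits == 5) { hash.append(BASE32.charAt(ch)); bits = 0; ch = 0; }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // A longer hash always starts with the shorter hash of the same
        // point; this is the property GeoHashPrefixFilter exploits when it
        // seeks through grid squares in term order.
        assert Geohash.encode(57.64911, 10.40744, 8)
                .startsWith(Geohash.encode(57.64911, 10.40744, 3));
        assert Geohash.encode(57.64911, 10.40744, 3).equals("u4p");
        // Distant points (here roughly Boston vs Sydney) diverge at the
        // very first character.
        assert Geohash.encode(42.36, -71.06, 6).charAt(0)
                != Geohash.encode(-33.87, 151.21, 6).charAt(0);
    }
}
```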
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
FYI, I'm working on revamping lucene spatial in general https://lucene-spatial-playground.googlecode.com/svn/trunk/ http://code.google.com/p/lucene-spatial-playground/ These are just sketch APIs for now, but i hope to get them cleaned up and contributed soon. The proposal will be for 3 packages in /modules:
1. spatial stuff w/o lucene dependencies -- shapes, distances, etc
2. lucene support for these types
3. solr support for the lucene stuff
(4) demo, probably keep this as an external project since UI and demo stuff is much easier on the outside.
I hope to migrate the existing spatial stuff to this structure and remove the not-really-working stuff. I'll post more when things are closer to committable. ryan On Sat, Mar 26, 2011 at 11:12 AM, Robert Muir rcm...@gmail.com wrote: On Sat, Mar 26, 2011 at 11:03 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Mar 26, 2011 at 9:48 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote: I don't really think things like this (queries etc) should go into just Solr I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for lucene need not be exported to Solr immediately. Everything developed in/for solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period. This is an important enough point that I'm going to follow it up with a quote from Mike: The combined dev community would have no requirement/expectation that if someone adds something cool to Lucene they must also expose it in Solr. There will still be devs that wear mostly Solr vs most Lucene hats. There will also be devs that comfortably wear both. There will be devs that focus on analyzers and do amazing things ;) We merged to *enable* moving code around easier, not to mandate it.
It is wrong to object to a patch because someone hasn't done extra work with their solr hat on to enable its use in solr. It is wrong to object to a patch because someone hasn't done extra work with their lucene hat on to enable its use in lucene. With that out of the way, let's get more specific: what Query in this patch should be moved, and to where? No, the question is: what justification is there for adding spatial support to solr-only, leaving lucene with a broken contrib module, versus adding it where it belongs and exposing it to solr? - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011661#comment-13011661 ] Ryan McKinley commented on SOLR-2155: - Congratulations on the new baby! Thinking about spatial support in general, I think we should settle on some basic APIs and approaches that can be used across many indexing strategies. In http://code.google.com/p/lucene-spatial-playground/ I'm messing with how we can use a standard API to index Shapes with various strategies. As always, each strategy has its tradeoffs, but if we can keep the high level APIs similar, that makes choosing the right approach easier. In this project I'm looking at indexing shapes as:
* bounding box -- 4 fields xmin/xmax/ymin/ymax
* prefix grids -- like geohash or [csquares|http://www.marine.csiro.au/csquares/about-csquares.htm]
* in memory spatial index (rtree/quadtree)
* raw WKB geometry tokens
* points -- x,y fields
* etc
To keep things coherent, I'm proposing a high level interface like: https://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-lucene/src/main/java/org/apache/lucene/spatial/search/SpatialQueryBuilder.java And then each implementation fills it in: https://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-lucene/src/main/java/org/apache/lucene/spatial/search/prefix/PrefixGridQueryBuilder.java The solr layer just handles setup and configuration: http://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-solr/src/main/java/org/apache/solr/spatial/prefix/SpatialPrefixGridFieldType.java In my view geohash is a subset of 'spatial prefix grid' (is there a real name for this?)
-- the interface i'm proposing is: http://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-base/src/main/java/org/apache/lucene/spatial/base/prefix/SpatialPrefixGrid.java essentially: {code} public List<CharSequence> readCells(Shape geo); {code} Geohash for a point would just be a list of one token -- for a polygon, it would be a collection of tokens that fill the space like csquares. I aim to get this basic structure in a lucene branch and maybe into trunk in the next few weeks Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Assignee: Grant Ingersoll Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazetteer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches.
I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
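The readCells contract Ryan proposes generalizes beyond geohash to any prefix grid. A toy quad grid makes it concrete: a point maps to a single full-depth token, a shape maps to the set of cell tokens that cover it, and every prefix of a token names an enclosing cell. All names here (ToyPrefixGrid, pointCell) are invented for illustration; this is not the playground code:

```java
import java.util.*;

// Toy "spatial prefix grid": the unit square [0,1)x[0,1), recursively split
// into 2x2 quads labeled a(SW) b(SE) c(NW) d(NE). A cell token is the path
// from the root, so every prefix of a token names an enclosing cell.
public class ToyPrefixGrid {
    final int maxLevels;
    public ToyPrefixGrid(int maxLevels) { this.maxLevels = maxLevels; }

    // A point maps to exactly one token at full resolution.
    public String pointCell(double x, double y) {
        StringBuilder sb = new StringBuilder();
        double x0 = 0, y0 = 0, size = 1;
        for (int i = 0; i < maxLevels; i++) {
            size /= 2;
            boolean east = x >= x0 + size, north = y >= y0 + size;
            sb.append(east ? (north ? 'd' : 'b') : (north ? 'c' : 'a'));
            if (east) x0 += size;
            if (north) y0 += size;
        }
        return sb.toString();
    }

    // readCells for a box: all cells at a fixed level that intersect it.
    public List<CharSequence> readCells(double xmin, double ymin,
                                        double xmax, double ymax, int level) {
        List<CharSequence> out = new ArrayList<>();
        collect("", 0, 0, 1, level, xmin, ymin, xmax, ymax, out);
        return out;
    }

    private void collect(String prefix, double x0, double y0, double size, int level,
                         double xmin, double ymin, double xmax, double ymax,
                         List<CharSequence> out) {
        // prune subtrees whose cell does not overlap the query box
        if (x0 >= xmax || y0 >= ymax || x0 + size <= xmin || y0 + size <= ymin) return;
        if (level == 0) { out.add(prefix); return; }
        double h = size / 2;
        collect(prefix + 'a', x0,     y0,     h, level - 1, xmin, ymin, xmax, ymax, out);
        collect(prefix + 'b', x0 + h, y0,     h, level - 1, xmin, ymin, xmax, ymax, out);
        collect(prefix + 'c', x0,     y0 + h, h, level - 1, xmin, ymin, xmax, ymax, out);
        collect(prefix + 'd', x0 + h, y0 + h, h, level - 1, xmin, ymin, xmax, ymax, out);
    }

    public static void main(String[] args) {
        ToyPrefixGrid grid = new ToyPrefixGrid(3);
        assert grid.pointCell(0.1, 0.1).equals("aaa");    // point -> one token
        assert grid.readCells(0, 0, 1, 1, 1).size() == 4; // whole space -> all level-1 cells
        List<CharSequence> sw = grid.readCells(0, 0, 0.5, 0.5, 1);
        assert sw.size() == 1 && sw.get(0).equals("a");   // SW quadrant -> single token
    }
}
```

Swap the quad labels for geohash's base32 subdivision and the same contract describes the proposed SpatialPrefixGrid: one token for a point, a space-filling token set for a polygon.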
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 11:12 AM, Robert Muir rcm...@gmail.com wrote: On Sat, Mar 26, 2011 at 11:03 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Mar 26, 2011 at 9:48 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA) j...@apache.org wrote: I don't really think things like this (queries etc) should go into just Solr I disagree strongly with the sentiment that queries don't belong in Solr. Everything developed in/for lucene need not be exported to Solr immediately. Everything developed in/for solr need not be exported to Lucene immediately. If the work has been done, and the patch works for Solr, that should be enough. Period. This is an important enough point that I'm going to follow it up with a quote from Mike: The combined dev community would have no requirement/expectation that if someone adds something cool to Lucene they must also expose it in Solr. There will still be devs that wear mostly Solr vs most Lucene hats. There will also be devs that comfortably wear both. There will be devs that focus on analyzers and do amazing things ;) We merged to *enable* moving code around easier, not to mandate it. It is wrong to object to a patch because someone hasn't done extra work with their solr hat on to enable it's use in solr. It is wrong to object to a patch because someone hasn't done extra work with their lucene hat on enable it's use in lucene. With that out of the way, let's get more specific: what Query in this patch should be moved, and to where? No, the question is: what justification is there for adding spatial support to solr-only, leaving lucene with a broken contrib module, versus adding it where it belongs and exposing it to solr? There need not be any linkage to lucene to improve a Solr feature. If you disagree, we should vote to clarify - this is too important (and too much of a negative for Solr). 
-Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
No, the question is: what justification is there for adding spatial support to solr-only, leaving lucene with a broken contrib module, versus adding it where it belongs and exposing it to solr? There need not be any linkage to lucene to improve a Solr feature. If you disagree, we should vote to clarify - this is too important (and too much of a negative for Solr). I don't think there is a *requirement* to move the core spatial stuff to lucene, but I think there is huge benefit to both communities if things have as few dependencies as possible. To be frank, the spatial support in solr is pretty hairy -- it works for some use cases, but is not extendable and quite basic. Calling it 'distance' seems more appropriate than 'spatial'. For good spatial support, I think we want to organize things with as few dependencies/assumptions as possible. This will let:
* only basic math/geometry -- anything complex should use existing well tested solid frameworks (JTS/proj4/geotools/etc); we should not be reinventing/retesting this stuff. We need basic APIs that will work well with these external tools
* lucene focus on fields and queries
* solr focus on configuration and external interface
This structure and constraints would be a big win for everyone. As always this stuff is hard to talk about in the abstract w/o a real proposal -- of course fixing/improving solr features does not *require* working in lucene-core. But I think we get better solutions when we aim for modular designs with minimum dependencies. ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 2:17 PM, Ryan McKinley ryan...@gmail.com wrote: No, the question is: what justification is there for adding spatial support to solr-only, leaving lucene with a broken contrib module, versus adding it where it belongs and exposing it to solr? There need not be any linkage to lucene to improve a Solr feature. If you disagree, we should vote to clarify - this is too important (and too much of a negative for Solr). I don't think there is *requirement* to move the core spatial stuff to lucene, but I think there is huge benefit to both communities if things have as few dependencies as possible. To be frank, the spatial support in solr is pretty hairy -- it works for some use cases, but is not extendable and quite basic. Calling it 'distance' seems more appropriate then 'spatial' Having something basic that works (and has a clean enough high level HTTP interface) was clearly a win for Solr users. The For good spatial support, I think we want to organize things with as few dependencies/assumptions as possible. This will let: * only basic math/geometry -- anything complex should use existing well tested solid frameworks (JTS/proj4/geotools/etc) we should not be reinventing/retesting this stuff. We need basic APIs that will work well with these external tools * lucene focus on fields and queries * solr focus on configuration and external interface This structure and constraints would be a big win for everyone. As always this stuff is hard to talk about in the abstract w/o a real proposal -- of course fixing/improving solr features does not *require* working in lucene-core. But I think we get better solutions when we aim for modular designs with minimum dependencies. ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sat, Mar 26, 2011 at 2:17 PM, Ryan McKinley ryan...@gmail.com wrote: No, the question is: what justification is there for adding spatial support to solr-only, leaving lucene with a broken contrib module, versus adding it where it belongs and exposing it to solr? There need not be any linkage to lucene to improve a Solr feature. If you disagree, we should vote to clarify - this is too important (and too much of a negative for Solr). I don't think there is *requirement* to move the core spatial stuff to lucene, but I think there is huge benefit to both communities if things have as few dependencies as possible. To be frank, the spatial support in solr is pretty hairy -- it works for some use cases, but is not extendable and quite basic. Calling it 'distance' seems more appropriate then 'spatial' Having something basic that works (and has a clean enough high level HTTP interface) was clearly a win for Solr users. Of course a more fully featured spatial module would be a win for everyone, but that's ignoring the more generic issue at hand here: a patch that improves Solr's spatial should not be blocked on the grounds that it does not improve Lucene's spatial enough. Likewise, the ridiculous notion that Queries don't belong in Solr needs to be put to rest. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [solr] DataSource for HBase Tables?
Yes, if you are going to use the Data Import Handler, I would say that is the route to go. You might also look at using an abstraction like Gora instead of having a dependency directly on HBase. On Mar 25, 2011, at 4:32 PM, Sterk, Paul (Contractor) wrote: Hi, I have a requirement to use Solr to import data from an HBase table and index the contents – similar to importing data from a RDBMS. It looks like I will need to create an org.apache.solr.handler.dataimport.DataSource<T> implementation for HBase to be used by the Data Import Handler. Is this the correct approach? If it is, has someone created a DataSource implementation for HBase? Paul This message, including any attachments, is the property of Sears Holdings Corporation and/or one of its subsidiaries. It is confidential and may contain proprietary or legally privileged information. If you are not the intended recipient, please delete it without reading the contents. Thank you. -- Grant Ingersoll http://www.lucidimagination.com
Re: Interested in GSOC
Thanks for the tips. I'm going through the code and javadocs right now, I will let you know when I have any doubts. I'm not sure to which part of Lucene I'm intending to write a proposal yet, but search/query and query parsing sound interesting. On Fri, Mar 25, 2011 at 7:15 PM, Adriano Crestani adrianocrest...@gmail.com wrote: Hi Vinicius, Welcome to Lucene! I think a good place to look for internal design documentation is the javadoc package summary. Here is an example: [1], each package usually has its own detailed summary. I hope it helps ;) [1] - http://lucene.apache.org/java/3_0_3/api/contrib-queryparser/org/apache/lucene/queryParser/core/package-summary.html On Fri, Mar 25, 2011 at 4:21 AM, Simon Willnauer simon.willna...@googlemail.com wrote: Hey there, welcome to Lucene :), good to hear you are interested in Lucene and GSoC! On Fri, Mar 25, 2011 at 4:49 AM, Vinicius Paes de barros viniciuspaesdebar...@yahoo.com.br wrote: Hi there, I heard about GSOC from a friend of mine at college and I decided I want to participate this year. I already used Lucene before, so Lucene sounds like a good place to start. I went through the JIRA projects, but I couldn't find something I feel like writing a proposal to, maybe I don't have enough knowledge yet about how Lucene is implemented internally. So I started looking at the wiki, but I'm not sure whether it contains all the info I need. Is there any other place I should be looking at to learn more about Lucene's internal design? We don't have a lot of design documents and if there are any they might be most likely outdated. I think the best documentation is the code and the people who have written it. If you wanna dive into lucene you should ask as many questions you need to ask and get all the info out of us.
We are usually around every day depending on the timezones though so you either go and write emails or you join our IRC channel #lucene on freenode (http://lucene.apache.org/java/docs/irc.html) Is there anything particular that you are interested in like indexing, search, analysis etc? simon Thanks in advance, Vinicius Barros - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-2995) factor out a shared spellchecking module
factor out a shared spellchecking module Key: LUCENE-2995 URL: https://issues.apache.org/jira/browse/LUCENE-2995 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 In lucene's contrib we have spellchecking support (index-based spellchecker, directspellchecker, etc). we also have some things like pluggable comparators. In solr we have auto-suggest support (with two implementations it looks like), some good utilities like HighFrequencyDictionary, etc. I think spellchecking is really important... google has upped the ante to what users expect. So I propose we combine all this stuff into a shared modules/spellchecker, which will make it easier to refactor and improve the quality. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2995) factor out a shared spellchecking module
[ https://issues.apache.org/jira/browse/LUCENE-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2995: Attachment: LUCENE-2995.patch Just a quick shot at this (all tests pass). Really any serious 'refactoring' e.g. perf improvements should be on followup issues I think. before applying the patch, run this: {noformat} svn move lucene/contrib/spellchecker modules svn move solr/src/java/org/apache/solr/util/HighFrequencyDictionary.java modules/spellchecker/src/java/org/apache/lucene/search/spell svn move solr/src/java/org/apache/solr/util/TermFreqIterator.java modules/spellchecker/src/java/org/apache/lucene/search/spell svn move solr/src/java/org/apache/solr/util/SortedIterator.java modules/spellchecker/src/java/org/apache/lucene/search/spell svn move solr/src/java/org/apache/solr/spelling/suggest/Suggester.java solr/src/java/org/apache/solr/spelling svn move solr/src/java/org/apache/solr/spelling/suggest modules/spellchecker/src/java/org/apache/lucene/search/spell {noformat} factor out a shared spellchecking module Key: LUCENE-2995 URL: https://issues.apache.org/jira/browse/LUCENE-2995 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-2995.patch In lucene's contrib we have spellchecking support (index-based spellchecker, directspellchecker, etc). we also have some things like pluggable comparators. In solr we have auto-suggest support (with two implementations it looks like), some good utilities like HighFrequencyDictionary, etc. I think spellchecking is really important... google has upped the ante to what users expect. So I propose we combine all this stuff into a shared modules/spellchecker, which will make it easier to refactor and improve the quality. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011700#comment-13011700 ] Lance Norskog commented on SOLR-2382: - Have you tested this under threading? DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch Functionality: 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application. 2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor). 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches. 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel. Use Cases: 1. We needed a flexible scalable way to temporarily cache child-entity data prior to joining to parent entities. - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem. - CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale. - There is no way to cache non-SQL inputs (ex: flat files, xml, etc). 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process. 3. We wanted the ability to do a delta import of only the entities that changed. - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed. 
- Our data comes from 50+ complex sql queries and/or flat files. - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed. - Persistent DIH caches solve this problem. 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter). 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards. Implementation Details: 1. De-couple EntityProcessorBase from caching. - Created a new interface, DIHCache, with two implementations: - SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated). - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage. - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase DIHCacheProperties). 3. Partially De-couple SolrWriter from DocBuilder - Created a new interface, DIHWriter, with two implementations: - SolrWriter (refactored) - DIHCacheWriter (allows DIH to write ultimately to a Cache). 4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input. 5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data. 6. Change the semantics of entity.destroy() - Previously, it was being called on each iteration of DocBuilder.buildDocument(). - Now it does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed.
- The only out-of-the-box entity processor that previously implemented destroy() was LineEntityProcessor, so this is not a very invasive change. General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility. In order to get our data into Solr, I created these caching enhancements. Because I believe this has broad application, and because we would like this feature to be supported by the Community, I have front-ported this, enhanced, to Trunk. I have also added unit tests and verified that all existing test cases pass. I believe this
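The pluggable cache framework described in point 1 of the implementation details can be sketched as a small interface plus an in-memory implementation. This is an illustrative sketch only — the interface name and method signatures below are simplified assumptions, not the actual DIHCache API from the patch:

```java
import java.util.*;

// Hypothetical, simplified take on a pluggable DIH cache: callers add
// rows keyed by an entity's primary key and look up matches later.
interface SimpleDihCache {
    void add(Object key, Map<String, Object> row);
    List<Map<String, Object>> lookup(Object key);
    void destroy();   // one-time cleanup, per the new entity.destroy() semantics
}

// In-memory implementation, analogous in spirit to SortedMapBackedCache.
class SortedMapCache implements SimpleDihCache {
    private final SortedMap<Object, List<Map<String, Object>>> data = new TreeMap<>();

    public void add(Object key, Map<String, Object> row) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }

    public List<Map<String, Object>> lookup(Object key) {
        return data.getOrDefault(key, Collections.emptyList());
    }

    public void destroy() {
        data.clear();   // a disk-backed cache would close/delete its files here
    }
}
```

A disk-backed implementation (the role BerkleyBackedCache plays in the patch) would implement the same interface against bdb-je, letting DIH swap cache implementations via the cacheImpl parameter.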
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
Hi, It really should say: Added Geospatial Support, as it was non-existent in Solr before. Most of the work for adding spatial to Solr consisted of improving things in Solr to make it easy to leverage the one spatial feature we really added: distance based functions and parsing support. Everything else was generally useful stuff: sorting by function, poly fields, etc. I started on tier support, but dropped it when I realized it was broken beyond repair. The Solr stuff uses, IMO, the stuff in Lucene that works and ignores the rest. I seem to recall Chris had said that once I got done w/ the Solr stuff he would do the modules work, but it hasn't happened yet. I'd say in 3.2, since it sounds like Chris did at least deprecate contrib/spatial, that we work to get all of this resolved: spatial - modules, function queries - modules. Naturally we should do it on trunk, too. Just note that I didn't skip it out of laziness. Actually pushing stuff into the module isn't easy since there isn't much that can be saved from contrib, and Solr's spatial code is predominantly bound to function queries, which themselves are very coupled to Solr, and there wasn't anything like a consensus that they should be moved. -- Chris Male | Software Developer | JTeam BV.| www.jteam.nl
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Sun, Mar 27, 2011 at 7:30 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Mar 26, 2011 at 2:17 PM, Ryan McKinley ryan...@gmail.com wrote: No, the question is: what justification is there for adding spatial support to solr-only, leaving lucene with a broken contrib module, versus adding it where it belongs and exposing it to solr? There need not be any linkage to lucene to improve a Solr feature. If you disagree, we should vote to clarify - this is too important (and too much of a negative for Solr). I don't think there is a *requirement* to move the core spatial stuff to lucene, but I think there is huge benefit to both communities if things have as few dependencies as possible. To be frank, the spatial support in solr is pretty hairy -- it works for some use cases, but is not extensible and quite basic. Calling it 'distance' seems more appropriate than 'spatial'. Having something basic that works (and has a clean enough high level HTTP interface) was clearly a win for Solr users. Of course a more fully featured spatial module would be a win for everyone, but that's ignoring the more generic issue at hand here: a patch that improves Solr's spatial should not be blocked on the grounds that it does not improve Lucene's spatial enough. I don't think we need to see it that way; we want to improve both Solr and Lucene's spatial support, not block either. As you say, having a module is a win for everyone, Solr and Lucene alike, so it seems obvious that we should go down that path and the code in SOLR-2155 would make a great first addition. Likewise, the ridiculous notion that Queries don't belong in Solr needs to be put to rest. Issues in and around this seem to be coming up a lot these days (I'm thinking FunctionQuerys too). Sounds like something that really does need to be openly discussed. 
-Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Chris Male | Software Developer | JTeam BV.| www.jteam.nl
[jira] [Commented] (LUCENE-2995) factor out a shared spellchecking module
[ https://issues.apache.org/jira/browse/LUCENE-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011711#comment-13011711 ] Chris Male commented on LUCENE-2995: +1 factor out a shared spellchecking module Key: LUCENE-2995 URL: https://issues.apache.org/jira/browse/LUCENE-2995 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-2995.patch In lucene's contrib we have spellchecking support (index-based spellchecker, directspellchecker, etc). We also have some things like pluggable comparators. In solr we have auto-suggest support (with two implementations it looks like), and some good utilities like HighFrequencyDictionary. I think spellchecking is really important... Google has upped the ante on what users expect. So I propose we combine all this stuff into a shared modules/spellchecker, which will make it easier to refactor and improve the quality. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
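One of the Solr utilities mentioned, HighFrequencyDictionary, feeds the spellchecker only terms whose document frequency crosses a threshold, so rare (likely misspelled) index terms never become suggestions. A self-contained sketch of that idea — class name, signature, and threshold handling are illustrative assumptions, not the actual Lucene/Solr API:

```java
import java.util.*;

// Illustrative only: keep terms whose docFreq / numDocs >= threshold,
// mimicking the idea behind Solr's HighFrequencyDictionary.
class HighFreqDictionaryDemo {
    static List<String> highFreqTerms(Map<String, Integer> docFreqs,
                                      int numDocs, float threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if ((float) e.getValue() / numDocs >= threshold) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);   // deterministic order for the dictionary
        return result;
    }
}
```

With a real index the document frequencies would come from the terms dictionary rather than a Map, but the filtering logic is the essence of it.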
[VOTE] Lucene 3.1.0 RC3
Artifacts are at http://people.apache.org/~gsingers/staging_area/rc3/. Please vote as you see appropriate. Vote closes on March 29th. I've also updated the Release To Do for both Lucene and Solr and it is hopefully a lot easier now to produce the artifacts as more of it is automated (including uploading to staging area). - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
3.1.0 Proposed Release Announcement(s)
Proposed Release Announcement (edits welcome). Also note we can have ASF Marketing put out a press release if we want. snip March 2011, Lucene 3.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 3.1 and Apache Solr 3.1. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The releases are available for immediate download at http://www.apache.org/dyn/closer.cgi/lucene/java and http://www.apache.org/dyn/closer.cgi/lucene/solr. See the respective CHANGES.txt file included with each release for a full list of details. Lucene 3.1 Release Highlights * Improved Unicode support, including Unicode 4 * ReusableAnalyzerBase makes it easier to reuse TokenStreams correctly * Protected words in stemming via KeywordAttribute * ConstantScoreQuery now allows directly wrapping a Query * Support for custom ExecutorService in ParallelMultiSearcher * IndexWriterConfig.setMaxThreadStates to control the number of IndexWriter thread states * Numerous performance improvements: faster exact PhraseQuery; natural segment merging favors segments with deletions; primary key lookup is faster; IndexWriter.addIndexes(Directory[]) uses file copy instead of merging; BufferedIndexInput does fewer bounds checks; compound file is dynamically turned off for large segments; fully deleted segments are dropped on commit; faster snowball analyzers (in contrib); ConcurrentMergeScheduler is more careful about setting priority of merge threads. * IndexWriter is now configured with a new separate builder API (IndexWriterConfig). * IndexWriter.getReader is replaced by IndexReader.open(IndexWriter). In addition you can now specify whether deletes should be resolved when you open an NRT reader. 
* MultiSearcher is deprecated; ParallelMultiSearcher has been absorbed directly into IndexSearcher * CharTermAttribute replaces TermAttribute in the analysis process * On 64-bit Windows and Solaris JVMs, MMapDirectory is now the default implementation (returned by FSDirectory.open). MMapDirectory also enables unmapping if the JVM supports it. * New TotalHitCountCollector just counts total number of hits * ReaderFinishedListener API enables external caches to evict entries once a segment is finished Solr 3.1 Release Highlights * Added spatial filtering, boosting and sorting capabilities * Added the extended dismax (edismax) query parser, which addresses some missing features in the dismax query parser along with some extensions * Several more components now support distributed mode: TermsComponent, SpellCheckComponent * Added an Auto Suggest component * Ability to sort by functions * Support for adding documents in JSON format * Leverages Lucene 3.1 and its inherent optimizations and bug fixes as well as new analysis capabilities * Numerous bug fixes and optimizations. /snip - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
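The separate builder-style configuration highlighted in the announcement (IndexWriterConfig) decouples a writer's settings from its construction: all options are collected on a config object, which is then handed to the constructor in one shot. A generic, self-contained sketch of the pattern — the class and setter names below are illustrative stand-ins, not the actual Lucene 3.1 classes:

```java
// Illustrative builder-config pattern in the style of IndexWriterConfig:
// chainable setters collect settings, and the component takes the
// finished config at construction time.
class WriterConfig {
    private int maxThreadStates = 8;
    private double ramBufferMB = 16.0;

    WriterConfig setMaxThreadStates(int n) { this.maxThreadStates = n; return this; }
    WriterConfig setRamBufferMB(double mb) { this.ramBufferMB = mb; return this; }
    int getMaxThreadStates() { return maxThreadStates; }
    double getRamBufferMB() { return ramBufferMB; }
}

class Writer {
    private final WriterConfig config;
    Writer(WriterConfig config) { this.config = config; }
    WriterConfig getConfig() { return config; }
}
```

With the real API the shape is similar: build an IndexWriterConfig, chain setters such as setMaxThreadStates, and pass the config to the IndexWriter constructor instead of the old pile of constructor overloads.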
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
On Mar 26, 2011, at 9:03 PM, Chris Male wrote: [full message quoted above] Agreed, it's not a small task. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: 3.1.0 Proposed Release Announcement(s)
a couple quick suggestions inline: On Sat, Mar 26, 2011 at 10:07 PM, Grant Ingersoll gsing...@apache.org wrote: Proposed Release Announcement (edits welcome). Also note we can have ASF Marketing put out a press release if we want. snip March 2011, Lucene 3.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 3.1 and Apache Solr 3.1. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at http://www.apache.org/dyn/closer.cgi/lucene/java and http://www.apache.org/dyn/closer.cgi/lucene/java. See the respective CHANGES.txt file included with the release for a full list of details. Lucene 3.1 Release Highlights * Improved Unicode support, including Unicode 4 * ReusableAnalyzerBase make it easier to reuse TokenStreams correctly * Protected words in stemming via KeywordAttribute I might combine these into 'analysis improvements': improved unicode support, more friendly term handling (CharTermAttribute), easier object reuse (ReusableAnalyzerBase), protected words in stemming (KeywordAttribute) * ConstantScoreQuery now allows directly wrapping a Query * Support for custom ExecutorService in ParallelMultiSearcher I think we should drop this from the release notes, especially given a couple notes down we mention how PMS is deprecated (instead pass the executorservice to indexsearcher). * IndexWriterConfig.setMaxThreadStates for controls of IndexWriter threads * Numerous performance improvements: faster exact PhraseQuery; natural segment merging favors segments with deletions; primary key lookup is faster; IndexWriter.addIndexes(Directory[]) uses file copy instead of merging; BufferedIndexInput does fewer bounds checks; compound file is dynamically turned off for large segments; fully deleted segments are dropped on commit; faster snowball analyzers (in contrib); ConcurrentMergeScheduler is more careful about setting priority of merge threads. 
we had speedups to mmapdirectory too, but only for large indexes. maybe drop the bufferedindexinput stuff and just say the Directories are faster? I also think we should list the performance improvements as #1 in the list of features (it will encourage users to check out the new release) * IndexWriter is now configured with a new separate builder API (IndexWriterConfig). * IndexWriter.getReader is replaced by IndexReader.open(IndexWriter). In addition you can now specify whether deletes should be resolved when you open an NRT reader. * MultiSearcher is deprecated; ParallelMultiSearcher has been absorbed directly into IndexSearcher I think we should re-order the statement somehow, to not emphasize the deprecation first... IndexSearcher gets PMS's capabilities, but without its bugs, and then secondly that PMS is deprecated. * CharTermAttribute replaces TermAttribute in the Analysis process I moved this one into the 'analysis improvements' above. * On 64bit Windows and Solaris JVMs, MMapDirectory is now the default implementation (returned by FSDirectory.open). MMapDirectory also enables unmapping if the JVM supports it. * New TotalHitCountCollector just counts total number of hits * ReaderFinishedListener API enables external caches to evict entries once a segment is finished Solr 3.1 Release Highlights * Added spatial filtering, boosting and sorting capabilities * Added extend dismax (edismax) query parser which addresses some missing features in the dismax query parser along with some extensions * Several more components now support distributed mode: TermsComponent, SpellCheckComponent * Added an Auto Suggest component * Ability to sort by functions * Support for adding documents using JSON format * Leverages Lucene 3.1 and it's inherent optimizations and bug fixes as well as new analysis capabilities * Numerous bug fixes and optimizations. 
/snip - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-trunk - Build # 1511 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1511/ 1 tests failed. FAILED: org.apache.lucene.index.TestNRTThreads.testNRTThreads Error Message: Some threads threw uncaught exceptions! Stack Trace: junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1215) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1147) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:519) Build Log (for compile errors): [...truncated 11939 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2979) Simplify configuration API of contrib Query Parser
[ https://issues.apache.org/jira/browse/LUCENE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011721#comment-13011721 ] Adriano Crestani commented on LUCENE-2979: -- Hi Phillip, I like your idea, similar to the one I had. I was planning to use enums; however, after spending some time thinking, I can't see how I can use generics the way you described using only enums. So go ahead with your idea and create a proposal ;) Don't forget to describe how you plan to make the old and new APIs work together. Simplify configuration API of contrib Query Parser -- Key: LUCENE-2979 URL: https://issues.apache.org/jira/browse/LUCENE-2979 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Affects Versions: 2.9, 3.0 Reporter: Adriano Crestani Assignee: Adriano Crestani Labels: api-change, gsoc, gsoc2011, lucene-gsoc-11, mentor Fix For: 3.2, 4.0 The current configuration API is very complicated and inherits the concept used by the Attribute API to store token information in token streams. However, the requirements for the two (QP config and token stream) are not the same, so they shouldn't be using the same thing. I propose to simplify the QP config and make it less scary for people intending to use the contrib QP. The task is not difficult, but it will require a lot of code changes and figuring out the best way to do it. That's why it's a good candidate for a GSoC project. I would like to hear good proposals about how to make the API more friendly and less scary :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
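The generics-versus-enums tension in the comment above can be illustrated with typed configuration keys: each key carries its value type, so gets and sets are checked at compile time. All names here are hypothetical, not the contrib query parser's actual classes:

```java
import java.util.*;

// Hypothetical typed-key configuration: the key's type parameter makes
// set() and get() type-safe without casts at the call site.
final class ConfigKey<T> {
    private final String name;
    ConfigKey(String name) { this.name = name; }
    public String toString() { return name; }
}

class QueryParserConfig {
    private final Map<ConfigKey<?>, Object> values = new HashMap<>();

    // Only a T can be stored under a ConfigKey<T>...
    <T> void set(ConfigKey<T> key, T value) { values.put(key, value); }

    // ...so this cast is safe by construction.
    @SuppressWarnings("unchecked")
    <T> T get(ConfigKey<T> key) { return (T) values.get(key); }
}
```

This also shows why a pure-enum design hits a wall: Java enums cannot declare a per-constant type parameter (enums are not generic), whereas a final class with a type parameter per key gets the compile-time checking for free.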
Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
Maybe I am too close to this issue and not looking at global implications like you are. SOLR-2155 seems fairly close to good to go. There are a couple of open issues that David Smiley has been asking for input on. I would recommend we answer those questions and commit it. Then we can look at modules, etc. If a rewrite was in the works, a committer should have said something a LONG time ago. Like back in October or something. Are we talking redesign or refactoring? I think core spatial things should remain in Lucene. Even though Spatial support in 3.1 is basic, it is stable, and VERY fast. We ran regression tests and the performance was 10-100x faster than the plugin solution of Solr Spatial from Patrick. Whatever fancy support for polygons, etc. we add, it needs to be even faster than what we have with 3.1. What I like about this patch more than anything is the support for multiple lat/longs per document. I have several clients who need this feature. For example, one doctor with multiple offices. It would be nice if a committer would work with David Smiley to get this done. 1. We would like to know if the pole issue can be solved and how. 2. We would like to know the best way to support the multi lat/long (without the copy happening) and get the values from multigeodist(). I have pushed up a good example, and I would like someone to please comment and maybe even show me some code to do that. There has been some discussion on this issue - my solution uses VS and it is fast. There might be faster and simpler ways to handle the N number of points. On another note, it is frustrating when David and I put in some time on this, and it sits out there with us begging for a committer to assist, and then when Grant starts discussions, it is summarily discarded with a new design without any input from the original contributors. Is this how we want to do things here? 
We should have Grant or Yonik work with David to get this patch done. Then we can discuss Spatial V2 and the design of it. Bill On Sat, Mar 26, 2011 at 7:19 PM, Chris Male gento...@gmail.com wrote: [full message quoted above] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011730#comment-13011730 ] James Dyer commented on SOLR-2382: -- There is a multi-threaded unit test in TestDihCacheWriterAndProcessor.java. However, I have not used the threads param in a real-world setting. DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch