[jira] Updated: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-21 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated LUCENE-2649:
--

Attachment: LUCENE-2649-FieldCacheWithBitSet.patch

Here is a (hopefully) final patch that adds a bunch of tests to exercise the 
'valid' bits (and check that MatchAll is used when appropriate).

> FieldCache should include a BitSet for matching docs
> 
>
> Key: LUCENE-2649
> URL: https://issues.apache.org/jira/browse/LUCENE-2649
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Ryan McKinley
> Fix For: 4.0
>
> Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
> LUCENE-2649-FieldCacheWithBitSet.patch, 
> LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch
>
>
> The FieldCache returns an array representing the values for each doc.  
> However there is no way to know if the doc actually has a value.
> This should be changed to return an object representing the values *and* a 
> BitSet for all valid docs.
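
A minimal sketch of the idea in the description above, assuming a holder object that pairs the per-document values with a bit set of documents that really have a value. The class and method names here (IntValuesWithValidBits, hasValue, allValid) are hypothetical and are not taken from the attached patch.

```java
import java.util.BitSet;

// Hypothetical holder; illustrative only, not the API introduced by the patch.
final class IntValuesWithValidBits {
    final int[] values;  // one slot per document; stays 0 where no value exists
    final BitSet valid;  // bit is set only for documents that actually have a value

    IntValuesWithValidBits(int maxDoc) {
        this.values = new int[maxDoc];
        this.valid = new BitSet(maxDoc);
    }

    void set(int docId, int value) {
        values[docId] = value;
        valid.set(docId);
    }

    // Distinguishes "value is 0" from "document has no value".
    boolean hasValue(int docId) {
        return valid.get(docId);
    }

    // True when every document has a value; a caller could then swap the
    // bit set for a cheaper "match all" structure.
    boolean allValid() {
        return valid.cardinality() == values.length;
    }
}
```

With a plain int[] a stored 0 and a missing value look the same; the bit set is what lets a caller tell them apart.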

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2662) BytesHash

2010-09-21 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2662:
-

Attachment: LUCENE-2662.patch

We need unit tests and a base implementation as BytesHash is abstract...

> BytesHash
> -
>
> Key: LUCENE-2662
> URL: https://issues.apache.org/jira/browse/LUCENE-2662
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2662.patch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186




[jira] Created: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA

2010-09-21 Thread Tommaso Teofili (JIRA)
Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
---

 Key: SOLR-2129
 URL: https://issues.apache.org/jira/browse/SOLR-2129
 Project: Solr
  Issue Type: New Feature
Reporter: Tommaso Teofili


Provide components to enable Apache UIMA automatic metadata extraction to be 
exploited when indexing documents.
The purpose of this is to get unstructured information "inside" a document and 
create structured metadata (as fields) to enrich each document.

Basically this can be done with a custom UpdateRequestProcessor which triggers 
UIMA while indexing documents.
The basic UIMA implementation of UpdateRequestProcessor extracts sentences 
(with a tokenizer and a hidden Markov model tagger), named entities, language, 
suggested category, keywords and concepts (exploiting external services from 
OpenCalais and AlchemyAPI). Such an implementation can be easily extended by 
adding or selecting different UIMA analysis engines, either from UIMA 
repositories on the web or by creating new ones from scratch.




[jira] Resolved: (LUCENE-2630) make the build more friendly to apache harmony

2010-09-21 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2630.
-

 Assignee: Robert Muir
Fix Version/s: 3.1
   4.0
   Resolution: Fixed

happy to mark this one resolved: harmony devs have quickly fixed issues we 
found.

with r999725 of harmony, all core, contrib, modules tests pass for both trunk 
and 3x on harmony (with the exception of contrib/remote due to rmic problems)


> make the build more friendly to apache harmony
> --
>
> Key: LUCENE-2630
> URL: https://issues.apache.org/jira/browse/LUCENE-2630
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Build, Tests
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2630.patch, LUCENE-2630.patch, LUCENE-2630.patch, 
> LUCENE-2630_charutils.patch, LUCENE-2630_intl.patch
>
>
> as part of improved testing, i thought it would be a good idea to make the 
> build (ant test) more friendly to working under apache harmony.
> i'm not suggesting we de-optimize code for sun jvms or anything crazy like 
> that, only use it as a tool.
> for example:
> * bugs in tests/code: for example i found a test that expected ArrayIOOBE 
>   when really the javadoc contract for the method is just IOOBE... it just 
>   happens to always pass on sun jvm because that's the exception its 
>   implementation always throws.
> * better reproduction of bugs: for example [2 months out of the 
>   year|http://en.wikipedia.org/wiki/Unusual_software_bug#Phase_of_the_Moon_bug]
>   it seems TestQueryParser fails with thai locale in a difficult-to-reproduce 
>   way. but i *always* get similar failures like this with harmony for this 
>   test class.
> * better stability and portability: we should try (if reasonable) to avoid 
>   depending upon internal details. the same kinds of things that fail in 
>   harmony might suddenly fail in a future sun jdk. because it's such a 
>   different impl, it brings out a lot of interesting stuff.
> at the moment there are a lot of failures, I think a lot might be 
> caused by this: http://permalink.gmane.org/gmane.comp.java.harmony.devel/39484
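
The first bullet in the description (a test expecting ArrayIOOBE where the contract only promises IOOBE) can be illustrated with a small sketch. The helper name BoundsContractCheck is made up for this example; the point is only that a portable test should catch the documented supertype, since the concrete subclass is an implementation detail that may differ between JVM class libraries.

```java
// Hypothetical test helper; illustrative only.
final class BoundsContractCheck {
    // Returns true if the action throws any IndexOutOfBoundsException.
    // ArrayIndexOutOfBoundsException and StringIndexOutOfBoundsException are
    // both subclasses, so either satisfies a contract documented as IOOBE.
    static boolean throwsIndexOutOfBounds(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }
}
```

A test written this way passes whichever IndexOutOfBoundsException subclass a given runtime happens to throw.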




[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

2010-09-21 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913403#action_12913403
 ] 

Jason Rutherglen commented on LUCENE-2575:
--

A further question for this issue, regarding copy-on-write
of the 1st dimension of the byte[][] array: will we want to keep
a count of references to each byte array, in the case of, let's
say, multiple readers keeping references to each individual byte
array (the one with the bytes data)? Assuming we will want to
continue to pool the byte[]s, I think we'll need to use
reference counting, or simply not pool the byte[]s after
flushing, in order to avoid overwriting arrays that readers still
reference.
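
A rough sketch of the reference-counting option raised above, assuming a block goes back to the pool only once the writer and every reader have released it. RefCountedBlock and BlockPool are hypothetical names, not classes from the LUCENE-2575 patches, and the sketch does not guard against a late incRef racing with the final decRef.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not code from the patch.
final class RefCountedBlock {
    final byte[] bytes;
    private final AtomicInteger refCount = new AtomicInteger(1); // writer holds the first reference
    private final BlockPool pool;

    RefCountedBlock(int size, BlockPool pool) {
        this.bytes = new byte[size];
        this.pool = pool;
    }

    void incRef() {
        refCount.incrementAndGet();
    }

    // The block is recycled only when the last holder lets go, so a pooled
    // array is never handed out while a reader can still see it.
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            pool.recycle(this);
        } else if (remaining < 0) {
            throw new IllegalStateException("decRef called too many times");
        }
    }

    int refCount() {
        return refCount.get();
    }

    void resetForReuse() {
        refCount.set(1);
    }
}

final class BlockPool {
    private final ArrayDeque<RefCountedBlock> free = new ArrayDeque<>();
    private final int blockSize;

    BlockPool(int blockSize) {
        this.blockSize = blockSize;
    }

    synchronized RefCountedBlock acquire() {
        RefCountedBlock block = free.poll();
        if (block == null) {
            return new RefCountedBlock(blockSize, this);
        }
        block.resetForReuse();
        return block;
    }

    synchronized void recycle(RefCountedBlock block) {
        free.push(block);
    }

    synchronized int freeCount() {
        return free.size();
    }
}
```

The alternative mentioned in the comment, not pooling after flush, avoids this bookkeeping at the cost of more garbage for the collector.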

> Concurrent byte and int block implementations
> -
>
> Key: LUCENE-2575
> URL: https://issues.apache.org/jira/browse/LUCENE-2575
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
> LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?
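
To make the "seekable random access file like API" suggestion above concrete, here is a minimal single-threaded sketch, assuming fixed-size blocks addressed by one logical position. SeekableBlocks is a hypothetical name, and the locking flush the description asks for is deliberately left out.

```java
import java.util.Arrays;

// Hypothetical sketch; hides the byte[][] behind seek/read/write.
final class SeekableBlocks {
    private final int blockSize;
    private byte[][] blocks = new byte[0][];
    private long position;

    SeekableBlocks(int blockSize) {
        this.blockSize = blockSize;
    }

    void seek(long pos) {
        if (pos < 0) {
            throw new IllegalArgumentException("negative position: " + pos);
        }
        position = pos;
    }

    long position() {
        return position;
    }

    void writeByte(byte b) {
        int block = (int) (position / blockSize);
        int offset = (int) (position % blockSize);
        ensureBlock(block);
        blocks[block][offset] = b;
        position++;
    }

    byte readByte() {
        int block = (int) (position / blockSize);
        int offset = (int) (position % blockSize);
        if (block >= blocks.length) {
            throw new IndexOutOfBoundsException("position " + position + " is past the allocated blocks");
        }
        position++;
        return blocks[block][offset];
    }

    // Grows the first dimension and allocates any missing blocks.
    private void ensureBlock(int block) {
        if (block < blocks.length) {
            return;
        }
        byte[][] grown = Arrays.copyOf(blocks, block + 1);
        for (int i = blocks.length; i < grown.length; i++) {
            grown[i] = new byte[blockSize];
        }
        blocks = grown;
    }
}
```

Callers only ever see positions; which block a byte lives in stays an internal detail.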




[jira] Updated: (LUCENE-2662) BytesHash

2010-09-21 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2662:
-

Priority: Minor  (was: Major)

> BytesHash
> -
>
> Key: LUCENE-2662
> URL: https://issues.apache.org/jira/browse/LUCENE-2662
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Realtime Branch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186




[jira] Created: (LUCENE-2662) BytesHash

2010-09-21 Thread Jason Rutherglen (JIRA)
BytesHash
-

 Key: LUCENE-2662
 URL: https://issues.apache.org/jira/browse/LUCENE-2662
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch


This issue will have the BytesHash separated out from LUCENE-2186




[jira] Commented: (LUCENE-2558) Use sequence ids for deleted docs

2010-09-21 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913400#action_12913400
 ] 

Jason Rutherglen commented on LUCENE-2558:
--

For the deleted docs sequence id array, perhaps I'm a little bit
confused, but how will we signify in the sequence id array if a
document is deleted? I believe we need a secondary sequence id
array for deleted docs that is init'd to -1. When a document is
deleted, the sequence id is set for that doc in the
del-docs-seq-arr. When the deleted docs Bits is being accessed,
for a given doc, we'll compare the IR's seq-id-up-to with the
del-docs-seq-id, and if the IR seq-id is greater than or equal
to it, the Bits.get method will return true, meaning the document
is deleted. 

I am forgetting how concurrency will work in this case, i.e.,
ensuring multi-threaded visibility under the JMM. Actually,
because we're pausing the writes/deletes when get reader is
called on the DWPT, JMM concurrency should be OK.
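
The comparison described above might look roughly like this. It is only a sketch under the assumptions stated in the comment: a secondary int[] initialised to -1, non-negative sequence ids, and a reader that carries a "sequence id up to" value. DeletedDocsBySequenceId is a hypothetical name and the sketch ignores the wrap-around short[] variant and any thread-safety concerns.

```java
import java.util.Arrays;

// Hypothetical sketch of the del-docs-seq-arr idea; not code from the branch.
final class DeletedDocsBySequenceId {
    private static final int NOT_DELETED = -1;
    private final int[] delSeqIds; // sequence id at which each doc was deleted, or -1

    DeletedDocsBySequenceId(int maxDoc) {
        delSeqIds = new int[maxDoc];
        Arrays.fill(delSeqIds, NOT_DELETED);
    }

    // Records the delete; a repeated delete keeps the earliest sequence id.
    void delete(int docId, int sequenceId) {
        if (delSeqIds[docId] == NOT_DELETED) {
            delSeqIds[docId] = sequenceId;
        }
    }

    // A reader sees the doc as deleted only if the delete happened at or
    // before the sequence id the reader was opened at.
    boolean isDeleted(int docId, int readerSeqIdUpTo) {
        int delSeq = delSeqIds[docId];
        return delSeq != NOT_DELETED && readerSeqIdUpTo >= delSeq;
    }
}
```

Two readers opened at different sequence ids can share the same array and still disagree, correctly, about whether a document is deleted.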

> Use sequence ids for deleted docs
> -
>
> Key: LUCENE-2558
> URL: https://issues.apache.org/jira/browse/LUCENE-2558
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: Realtime Branch
>
>
> Utilizing the sequence ids created via the update document
> methods, we will enable IndexReader deleted docs over a sequence
> id array. 
> One of the decisions is what primitive type to use. We can start
> off with an int[], then possibly move to a short[] (for lower
> memory consumption) that wraps around.




[jira] Updated: (SOLR-2128) full parameter dereferencing for function queries

2010-09-21 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-2128:
---

Attachment: SOLR-2128.patch

Here's the patch that implements this feature via nested queries.
As an example, you can now do stuff like this:

http://localhost:8983/solr/select?defType=func&fl=id,score&q=add($v1,$v2)&v1=mul(2,3)&v2=10
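
To show what the dereferencing in the URL above amounts to, here is a toy evaluator. It is not Solr's function parser and uses no Solr classes: FuncDeref is a made-up name, it understands only numeric constants, add(a,b), mul(a,b) and $name, it does not skip whitespace, and it does not detect parameters that refer to each other in a cycle.

```java
import java.util.Map;

// Toy illustration of $param dereferencing; hypothetical, not Solr code.
final class FuncDeref {
    private final String src;
    private final Map<String, String> params;
    private int pos = 0;

    private FuncDeref(String src, Map<String, String> params) {
        this.src = src;
        this.params = params;
    }

    static double eval(String expr, Map<String, String> params) {
        FuncDeref parser = new FuncDeref(expr.trim(), params);
        double value = parser.parseExpr();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("trailing input in: " + expr);
        }
        return value;
    }

    private double parseExpr() {
        if (pos >= src.length()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        char c = src.charAt(pos);
        if (c == '$') {
            // Dereference: the parameter value is itself parsed as a full function.
            pos++;
            String name = readIdentifier();
            String value = params.get(name);
            if (value == null) {
                throw new IllegalArgumentException("missing parameter: " + name);
            }
            return eval(value, params);
        }
        if (Character.isLetter(c)) {
            String fn = readIdentifier();
            expect('(');
            double a = parseExpr();
            expect(',');
            double b = parseExpr();
            expect(')');
            switch (fn) {
                case "add": return a + b;
                case "mul": return a * b;
                default: throw new IllegalArgumentException("unknown function: " + fn);
            }
        }
        return Double.parseDouble(readNumber());
    }

    private String readIdentifier() {
        int start = pos;
        while (pos < src.length()
                && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) {
            pos++;
        }
        return src.substring(start, pos);
    }

    private String readNumber() {
        int start = pos;
        while (pos < src.length() && "0123456789.-".indexOf(src.charAt(pos)) >= 0) {
            pos++;
        }
        return src.substring(start, pos);
    }

    private void expect(char expected) {
        if (pos >= src.length() || src.charAt(pos) != expected) {
            throw new IllegalArgumentException("expected '" + expected + "' at position " + pos);
        }
        pos++;
    }
}
```

With the parameters from the example request (q=add($v1,$v2), v1=mul(2,3), v2=10), this toy resolves $v1 by parsing mul(2,3) as a nested function and then adds the constant from $v2.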

> full parameter dereferencing for function queries
> -
>
> Key: SOLR-2128
> URL: https://issues.apache.org/jira/browse/SOLR-2128
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 4.0
>
> Attachments: SOLR-2128.patch
>
>
> We should be able to specify function parameters as $foo (where foo is 
> another request parameter).
> Ideally the parameter could itself be a full function.




[jira] Commented: (LUCENE-2657) Auto-generate POM templates from Ant builds

2010-09-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913393#action_12913393
 ] 

Robert Muir commented on LUCENE-2657:
-

bq. I stand corrected. Build simplification is certainly a goal I support; 
without the flexibility afforded by a coherent build system, issues like this 
one would definitely be more difficult to implement.

Exactly, because if you were to write a patch for this issue today, you would 
have to implement it in 4 or 5 different places at the moment (or at least 
employ dirty ant hacks).

The same problem prevents us from really improving our build in other ways: for 
example we should probably support findbugs/pmd integration and things like 
that, and it should be simple.


> Auto-generate POM templates from Ant builds
> ---
>
> Key: LUCENE-2657
> URL: https://issues.apache.org/jira/browse/LUCENE-2657
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Steven Rowe
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> Lucene and Solr modules' POM templates are manually maintained, and so are 
> not always in sync with the dependencies used by the Ant build. 
> It should be possible to auto-generate POM templates using build tools 
> extending Ant's 
> [BuildListener|http://api.dpml.net/ant/1.6.5/org/apache/tools/ant/BuildListener.html]
>  interface, similarly to how the 
> [ant2ide|http://gleamynode.net/articles/2234/] project generates eclipse 
> project files.




[jira] Updated: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2010-09-21 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2312:
-

Affects Version/s: Realtime Branch
   (was: 3.0.1)

> Search on IndexWriter's RAM Buffer
> --
>
> Key: LUCENE-2312
> URL: https://issues.apache.org/jira/browse/LUCENE-2312
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
>Assignee: Michael Busch
> Fix For: Realtime Branch
>
>
> In order to offer users near realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915




[jira] Commented: (LUCENE-2657) Auto-generate POM templates from Ant builds

2010-09-21 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913391#action_12913391
 ] 

Steven Rowe commented on LUCENE-2657:
-

{quote}
bq. It is not the build process that requires simplification (is anybody really 
calling for that?), but rather developers' interaction with it?

I am trying to simplify the build, with the prototype in SOLR-2002, where 
most things are in a single common-build and we invoke them recursively (things 
like rat, dist, test, javadocs).
{quote}

I stand corrected.  Build simplification is certainly a goal I support; without 
the flexibility afforded by a coherent build system, issues like this one would 
definitely be more difficult to implement.

> Auto-generate POM templates from Ant builds
> ---
>
> Key: LUCENE-2657
> URL: https://issues.apache.org/jira/browse/LUCENE-2657
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Steven Rowe
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> Lucene and Solr modules' POM templates are manually maintained, and so are 
> not always in sync with the dependencies used by the Ant build. 
> It should be possible to auto-generate POM templates using build tools 
> extending Ant's 
> [BuildListener|http://api.dpml.net/ant/1.6.5/org/apache/tools/ant/BuildListener.html]
>  interface, similarly to how the 
> [ant2ide|http://gleamynode.net/articles/2234/] project generates eclipse 
> project files.




[jira] Issue Comment Edited: (LUCENE-2567) RT Terms Dictionary

2010-09-21 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913385#action_12913385
 ] 

Jason Rutherglen edited comment on LUCENE-2567 at 9/22/10 12:34 AM:


The RT terms dict has been introduced in the LUCENE-2575 patches.  I may end up 
closing this issue, or if needed moving the terms dict code from LUCENE-2575.

  was (Author: jasonrutherglen):
The RT terms dict has been introduced in the LUCENE-2575 patches.  I may 
end up closing this issue, or if needed moving the terms dict code from 
LUCENE-2575 if needed.
  
> RT Terms Dictionary
> ---
>
> Key: LUCENE-2567
> URL: https://issues.apache.org/jira/browse/LUCENE-2567
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
>
> Implement an in RAM terms dictionary for realtime search.




[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2010-09-21 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913389#action_12913389
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

The patches I've been submitting to LUCENE-2575 probably should go here.  Once 
the new byte block pool that records the slice level at the beginning of the 
slice is finished, the skip list can be completed, and then the basic 
functionality for searching on the RAM buffer will be done.  At that point the 
concurrency and memory efficiency may be focused on and tested.  In addition 
the deletes must be implemented.

> Search on IndexWriter's RAM Buffer
> --
>
> Key: LUCENE-2312
> URL: https://issues.apache.org/jira/browse/LUCENE-2312
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: 3.0.1
>Reporter: Jason Rutherglen
>Assignee: Michael Busch
> Fix For: Realtime Branch
>
>
> In order to offer users near realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915




[jira] Commented: (LUCENE-2657) Auto-generate POM templates from Ant builds

2010-09-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913387#action_12913387
 ] 

Robert Muir commented on LUCENE-2657:
-

bq. It is not the build process that requires simplification (is anybody really 
calling for that?), but rather developers' interaction with it?

I am trying to simplify the build, with the prototype in SOLR-2002, where 
most things are in a single common-build and we invoke them recursively (things 
like rat, dist, test, javadocs).

Right now it is complicated mostly because there are actually many build setups 
that are essentially standalone, and then have been linked to depend on each 
other.

> Auto-generate POM templates from Ant builds
> ---
>
> Key: LUCENE-2657
> URL: https://issues.apache.org/jira/browse/LUCENE-2657
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Steven Rowe
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> Lucene and Solr modules' POM templates are manually maintained, and so are 
> not always in sync with the dependencies used by the Ant build. 
> It should be possible to auto-generate POM templates using build tools 
> extending Ant's 
> [BuildListener|http://api.dpml.net/ant/1.6.5/org/apache/tools/ant/BuildListener.html]
>  interface, similarly to how the 
> [ant2ide|http://gleamynode.net/articles/2234/] project generates eclipse 
> project files.




[jira] Commented: (LUCENE-2657) Auto-generate POM templates from Ant builds

2010-09-21 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913386#action_12913386
 ] 

Steven Rowe commented on LUCENE-2657:
-

I don't see this issue as providing a complete Ant->Maven conversion system.  
The poms I envision will look just like the POM templates after they are 
interpolated - they will provide Maven coordinates for artifacts and their 
dependencies, nothing more.  And the process to produce them can be carried out 
as part of generating Maven artifacts.  I don't think they will need to be 
checked in, but they certainly could be.

In other words, I don't plan on building a complete Maven build configuration 
here.  Just the bare-bones POMs required to upload artifacts, like the manually 
synched system we have now.   Only without the manual synching part. 

Vive la simplification!

> Auto-generate POM templates from Ant builds
> ---
>
> Key: LUCENE-2657
> URL: https://issues.apache.org/jira/browse/LUCENE-2657
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Steven Rowe
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> Lucene and Solr modules' POM templates are manually maintained, and so are 
> not always in sync with the dependencies used by the Ant build. 
> It should be possible to auto-generate POM templates using build tools 
> extending Ant's 
> [BuildListener|http://api.dpml.net/ant/1.6.5/org/apache/tools/ant/BuildListener.html]
>  interface, similarly to how the 
> [ant2ide|http://gleamynode.net/articles/2234/] project generates eclipse 
> project files.




[jira] Commented: (LUCENE-2567) RT Terms Dictionary

2010-09-21 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913385#action_12913385
 ] 

Jason Rutherglen commented on LUCENE-2567:
--

The RT terms dict has been introduced in the LUCENE-2575 patches.  I may end up 
closing this issue, or if needed moving the terms dict code from LUCENE-2575.

> RT Terms Dictionary
> ---
>
> Key: LUCENE-2567
> URL: https://issues.apache.org/jira/browse/LUCENE-2567
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
>
> Implement an in RAM terms dictionary for realtime search.




[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

2010-09-21 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913383#action_12913383
 ] 

Jason Rutherglen commented on LUCENE-2575:
--

This issue is blocked because the change made to ByteBlockPool, which adds the 
level of the slice to the beginning of the slice, moves all of the positions 
forward by one.  This has caused TestByteSlices to fail an assertion.  I'm not 
sure if the test needs to be changed, or if there's a bug in the new BBP 
implementation.  Either way it's a bit of a challenge to debug.

> Concurrent byte and int block implementations
> -
>
> Key: LUCENE-2575
> URL: https://issues.apache.org/jira/browse/LUCENE-2575
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
> LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?




[jira] Commented: (LUCENE-2657) Auto-generate POM templates from Ant builds

2010-09-21 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913379#action_12913379
 ] 

Steven Rowe commented on LUCENE-2657:
-

bq. My concern here is this process is going in the opposite direction by 
making it vastly more complicated. What you're proposing sounds like a great app 
independent of Lucene - a way to convert from an ant build system to a maven 
one, but I'm unsure it's something that should be part of a build process 
instead of a one-off thing. Especially when we are trying to simplify said 
build process.

It is not the build process that requires simplification (is anybody really 
calling for that?), but rather developers' interaction with it?

> Auto-generate POM templates from Ant builds
> ---
>
> Key: LUCENE-2657
> URL: https://issues.apache.org/jira/browse/LUCENE-2657
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Steven Rowe
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> Lucene and Solr modules' POM templates are manually maintained, and so are 
> not always in sync with the dependencies used by the Ant build. 
> It should be possible to auto-generate POM templates using build tools 
> extending Ant's 
> [BuildListener|http://api.dpml.net/ant/1.6.5/org/apache/tools/ant/BuildListener.html]
>  interface, similarly to how the 
> [ant2ide|http://gleamynode.net/articles/2234/] project generates eclipse 
> project files.




Re: [jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Marvin Humphrey

> Bill Bell edited comment on SOLR-2125 at 9/21/10 11:47 PM:

Bill, can you please refrain from editing your JIRA comments repeatedly?  Each
time you make an edit, it generates an email to the d...@lucene.a.o list, so
we're getting an avalanche of email here.

Jeez, I wish it were possible to turn off JIRA editing.

Marvin Humphrey





[jira] Commented: (LUCENE-2657) Auto-generate POM templates from Ant builds

2010-09-21 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913373#action_12913373
 ] 

Chris Male commented on LUCENE-2657:


bq. If nothing else, it is abundantly clear that there are more than a couple 
of Lucene/Solr developers that don't want to put in any effort for Maven's sake.

If developers are willing to add the dependency jar and lines to the ant file 
but not add the 5 lines to the pom, then I'm not sure any of these attempts are 
really going to succeed.  Sure many developers don't like maven, but I haven't 
seen any straight out refuse to use it yet (unless I misread the large email 
thread), they are just demanding ease of use.

My concern here is that this process is going in the opposite direction by 
making it vastly more complicated.  What you're proposing sounds like a great 
app independent of Lucene, a way to convert from an ant build system to a maven 
one, but I'm unsure it's something that should be part of a build process 
rather than a one-off thing, especially when we are trying to simplify said 
build process.

> Auto-generate POM templates from Ant builds
> ---
>
> Key: LUCENE-2657
> URL: https://issues.apache.org/jira/browse/LUCENE-2657
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Steven Rowe
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> Lucene and Solr modules' POM templates are manually maintained, and so are 
> not always in sync with the dependencies used by the Ant build. 
> It should be possible to auto-generate POM templates using build tools 
> extending Ant's 
> [BuildListener|http://api.dpml.net/ant/1.6.5/org/apache/tools/ant/BuildListener.html]
>  interface, similarly to how the 
> [ant2ide|http://gleamynode.net/articles/2234/] project generates eclipse 
> project files.




[jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913368#action_12913368
 ] 

Bill Bell edited comment on SOLR-2125 at 9/21/10 11:47 PM:
---

Fixed the distance calc for box. We still need to take the results and do a 
hsin()

{code}

this now returns 8 on example data:

http://localhost:8983/solr/select/??fl=*,score&start=0&rows=10&q={!sfilt%20fl=store}&qt=standard&pt=44.9369054,-91.3929348&d=196&sort=hsin%286371,true,store,vector%2844.9369054,-91.3929348%29%29%20asc

Note: 196 km is right

{code}
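
To sanity-check the 196 km figure, the haversine distance can be computed directly. The following is a minimal sketch and not Solr's hsin() implementation: the class and method names are made up for illustration, the radius is the 6371.009 km quoted in the issue description, and the two points are the example coordinates from that description (it is an assumption that they are the store and query point behind the result above).

```java
// Hypothetical helper for illustration only; not Solr's hsin() code.
final class HaversineSketch {
    // Mean Earth radius in kilometers, as quoted in the issue description.
    static final double EARTH_MEAN_RADIUS_KM = 6371.009;

    /** Great-circle distance between two points given in degrees. */
    static double haversine(double radius, double lat1, double lon1,
                            double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinHalfDLat = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinHalfDLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        // a is the squared half-chord length between the two points.
        double a = sinHalfDLat * sinHalfDLat
                 + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLon * sinHalfDLon;
        // The result is in the same unit as the radius.
        return 2 * radius * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        double km = haversine(EARTH_MEAN_RADIUS_KM,
                45.17614, -93.87341, 44.9369054, -91.3929348);
        System.out.printf("%.1f km%n", km);
    }
}
```

Under these assumptions the result should land close to the 196 km mentioned above. Passing a radius in miles instead would give miles, which is why the unit of the default radius matters.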


  was (Author: billnbell):
Fixed the distance calc for box. We still need to take the results and do a 
hsin()
  
> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: Distance.diff, solrspatial.xlsx
>
>
> The calculations of distance appear to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913368#action_12913368
 ] 

Bill Bell edited comment on SOLR-2125 at 9/21/10 11:43 PM:
---

Fixed the distance calc for box. We still need to take the results and do a 
hsin()

  was (Author: billnbell):
Fixed old version
  
> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: Distance.diff, solrspatial.xlsx
>
>
> The calculations of distance appear to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Updated: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2125:


Comment: was deleted

(was: svn diff of changes to support better distance at 45 and 225 degrees

I am reuploading a new copy.)

> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: Distance.diff, solrspatial.xlsx
>
>
> The calculations of distance appear to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Updated: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2125:


Attachment: (was: Distance.diff)

> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: Distance.diff, solrspatial.xlsx
>
>
> The calculations of distance appear to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913365#action_12913365
 ] 

Bill Bell edited comment on SOLR-2125 at 9/21/10 11:41 PM:
---

svn diff of changes to support better distance at 45 and 225 degrees

I am reuploading a new copy.

  was (Author: billnbell):
svn diff of changes to support better distance at 45 and 225 degrees
  
> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: Distance.diff, Distance.diff, solrspatial.xlsx
>
>
> The calculations of distance appear to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Updated: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2125:


Attachment: Distance.diff

Fixed old version

> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: Distance.diff, Distance.diff, solrspatial.xlsx
>
>
> The calculations of distance appear to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Updated: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2125:


Attachment: Distance.diff

svn diff of changes to support better distance at 45 and 225 degrees

> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: Distance.diff, solrspatial.xlsx
>
>
> The calculations of distance appear to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913363#action_12913363
 ] 

Bill Bell edited comment on SOLR-2125 at 9/21/10 10:56 PM:
---

There is another example...

http://www.movable-type.co.uk/scripts/latlong-db.html - not sure, but this is 
also a good first cut. Then, if you want more accurate results, calculate the 
distance for the results that come back and throw out the ones that are not 
within range. Cache the results. The link above 
(http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates) appears a little 
more accurate, but this is simpler.

{code}

// first-cut bounding box (in degrees)
$maxLat = $lat + rad2deg($rad/$R);
$minLat = $lat - rad2deg($rad/$R);
// compensate for degrees longitude getting smaller with increasing latitude
$maxLon = $lon + rad2deg($rad/$R/cos(deg2rad($lat)));
$minLon = $lon - rad2deg($rad/$R/cos(deg2rad($lat)));
 
$sql = "Select ID, Postcode, Lat, Lon
From MyTable
Where Lat > $minLat And Lat < $maxLat
  And Lon > $minLon And Lon < $maxLon";
{code}
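
The two-step idea described above (a cheap bounding-box cut first, then an exact distance check that throws out the corners of the box) can be sketched in Java as follows. This is only an illustration under stated assumptions: the class, method names and the 6371.009 km radius constant are chosen for the example and are not taken from Solr or from the linked pages, and the sketch ignores the poles and the 180-degree meridian.

```java
// Hypothetical sketch; not Solr code and not the code from the linked articles.
final class BoundingBoxSketch {
    static final double EARTH_RADIUS_KM = 6371.009;

    /** Returns {minLat, maxLat, minLon, maxLon} in degrees for a first-cut box. */
    static double[] boundingBox(double lat, double lon, double radiusKm) {
        double dLat = Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        // Degrees of longitude get smaller with increasing latitude.
        double dLon = Math.toDegrees(
                radiusKm / EARTH_RADIUS_KM / Math.cos(Math.toRadians(lat)));
        return new double[] {lat - dLat, lat + dLat, lon - dLon, lon + dLon};
    }

    /** Cheap range check, equivalent to the SQL WHERE clause above. */
    static boolean inBox(double[] box, double lat, double lon) {
        return lat > box[0] && lat < box[1] && lon > box[2] && lon < box[3];
    }

    /** Haversine distance in kilometers between two points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double sinHalfDLat = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinHalfDLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinHalfDLat * sinHalfDLat
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * sinHalfDLon * sinHalfDLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Box test first, exact distance second. */
    static boolean withinRadius(double cLat, double cLon, double radiusKm,
                                double lat, double lon) {
        double[] box = boundingBox(cLat, cLon, radiusKm);
        return inBox(box, lat, lon)
                && haversineKm(cLat, cLon, lat, lon) <= radiusKm;
    }

    public static void main(String[] args) {
        // A made-up point near a corner of the box: inside the box, outside the circle.
        double cLat = 44.9369054, cLon = -91.3929348;
        double[] box = boundingBox(cLat, cLon, 100);
        System.out.println(inBox(box, cLat + 0.85, cLon + 1.2));
        System.out.println(withinRadius(cLat, cLon, 100, cLat + 0.85, cLon + 1.2));
    }
}
```

The box alone over-selects near its corners, which is why the second pass is needed when the filter has to agree with a haversine sort.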



  was (Author: billnbell):
There is another example...

http://www.movable-type.co.uk/scripts/latlong-db.html - not sure, but this is 
also a good first cut. Then, if you want more accurate results, calculate the 
distance for the results that come back and throw out the ones that are not 
within range. Cache the results. The link above 
(http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates) appears a little 
more accurate, but this is simpler.

{code}

// first-cut bounding box (in degrees)
$maxLat = $lat + rad2deg($rad/$R);
$minLat = $lat - rad2deg($rad/$R);
// compensate for degrees longitude getting smaller with increasing latitude
$maxLon = $lon + rad2deg($rad/$R/cos(deg2rad($lat)));
$minLon = $lon - rad2deg($rad/$R/cos(deg2rad($lat)));
 
$sql = "Select ID, Postcode, Lat, Lon
From MyTable
Where Lat>$minLat And Lat<$maxLat
  And Lon>$minLon And Lon<$maxLon";
{code}


  
> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: solrspatial.xlsx
>
>
> The calculations of distance appear to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913363#action_12913363
 ] 

Bill Bell commented on SOLR-2125:
-

There is another example...

http://www.movable-type.co.uk/scripts/latlong-db.html - not sure, but this is 
also a good first cut. Then, if you want more accurate results, calculate the 
distance for the results that come back and throw out the ones that are not 
within range. Cache the results. The link above 
(http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates) appears a little 
more accurate, but this is simpler.

{code}

// first-cut bounding box (in degrees)
$maxLat = $lat + rad2deg($rad/$R);
$minLat = $lat - rad2deg($rad/$R);
// compensate for degrees longitude getting smaller with increasing latitude
$maxLon = $lon + rad2deg($rad/$R/cos(deg2rad($lat)));
$minLon = $lon - rad2deg($rad/$R/cos(deg2rad($lat)));
 
$sql = "Select ID, Postcode, Lat, Lon
From MyTable
Where Lat>$minLat And Lat<$maxLat
  And Lon>$minLon And Lon<$maxLon";
{code}



> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: solrspatial.xlsx
>
>
> The calculations of distance appear to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Commented: (LUCENE-2657) Auto-generate POM templates from Ant builds

2010-09-21 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913361#action_12913361
 ] 

Steven Rowe commented on LUCENE-2657:
-

bq. Assuming we have well-structured poms in the first place (huge assumption, 
different issue), are developers unprepared to add the appropriate dependency 
to the necessary pom? 

If nothing else, it is abundantly clear that there are more than a couple of 
Lucene/Solr developers that don't want to put in *any* effort for Maven's sake.

bq. Also, I'm not sure all jars are going to have the appropriate manifest. If 
the pom is generated (like Artifactory can do btw), I'm not sure it is packaged 
in the jar?

I've looked into 10-15 jars, and very few have the required info.  Some even 
have the wrong information in the manifest...

I think we can use .jar digest (SHA-1) as identifiers to look up the 
appropriate Maven info to automatically set up POM dependencies.  I wrote a 
script to crawl the central Maven repo for SHA-1 keyed Maven coordinates 
(groupId:artifactId:version), and it seems to work, but at the rate it's 
progressing, it will take 4 days to finish.  Anyway, it seems like overkill to 
gather info on all of the ~250k artifacts in the Maven central repo, just to 
handle the ~80 .jar files in the Lucene/Solr source tree.  
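
The digest step of that idea can be sketched as below, assuming the identifier is the lowercase hex SHA-1 of the raw .jar bytes. The class and method names are hypothetical, and the lookup of Maven coordinates from the digest is not shown, since it depends on whichever index or service is used.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; only the JDK's MessageDigest API is used.
final class JarDigestSketch {
    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java JarDigestSketch path/to/some.jar
        byte[] jarBytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(sha1Hex(jarBytes));
    }
}
```

The printed digest would then serve as the key for finding groupId:artifactId:version, whether in a locally crawled table or through a search service.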

I found a free service that provides Maven central repo search, including by 
artifact digest: http://www.jarvana.com/jarvana/search - it uses Lucene to 
power the search :).  I plan on using this as the basis for automatic Maven 
coordinate lookup for arbitrary .jars.


> Auto-generate POM templates from Ant builds
> ---
>
> Key: LUCENE-2657
> URL: https://issues.apache.org/jira/browse/LUCENE-2657
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Steven Rowe
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> Lucene and Solr modules' POM templates are manually maintained, and so are 
> not always in sync with the dependencies used by the Ant build. 
> It should be possible to auto-generate POM templates using build tools 
> extending Ant's 
> [BuildListener|http://api.dpml.net/ant/1.6.5/org/apache/tools/ant/BuildListener.html]
>  interface, similarly to how the 
> [ant2ide|http://gleamynode.net/articles/2234/] project generates eclipse 
> project files.




[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913353#action_12913353
 ] 

Uwe Schindler commented on LUCENE-2649:
---

Also set the Lucene default to true, as I want to improve sorting and FCRF.

> FieldCache should include a BitSet for matching docs
> 
>
> Key: LUCENE-2649
> URL: https://issues.apache.org/jira/browse/LUCENE-2649
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Ryan McKinley
> Fix For: 4.0
>
> Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
> LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch
>
>
> The FieldCache returns an array representing the values for each doc.  
> However there is no way to know if the doc actually has a value.
> This should be changed to return an object representing the values *and* a 
> BitSet for all valid docs.




[jira] Commented: (LUCENE-2657) Auto-generate POM templates from Ant builds

2010-09-21 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913350#action_12913350
 ] 

Chris Male commented on LUCENE-2657:


I'm wondering whether generating poms is just going to make them more 
mysterious.  While they are templates I can at least see them and they can be 
validated by XML validators (including IDEs).  Suddenly they are created 
through a more complex process which we also have to maintain.  

Assuming we have well-structured poms in the first place (huge assumption, 
different issue), are developers unprepared to add the appropriate dependency 
to the necessary pom? It doesn't seem like it'll happen very often, most of our 
artifacts have very few dependencies and we resist increasing them.  We can 
document the process, make it very clear exactly what needs to be added and 
where.

Also, I'm not sure all jars are going to have the appropriate manifest.  If the 
pom is generated (like Artifactory can do btw), I'm not sure it is packaged in 
the jar?

I think Robert was onto the right idea with this original proposal: If we 
consider our modules section and if we add a pom at the modules level and then 
one for each module, running a build from the modules level will recursively 
build each module.  If there is something wrong with the poms themselves, maven 
will fail.  If a dependency is omitted meaning the code cannot be built, maven 
will fail.  If either of these two issues occurs, the build fails and an 
issue can be created for those more comfortable with maven to fix.

> Auto-generate POM templates from Ant builds
> ---
>
> Key: LUCENE-2657
> URL: https://issues.apache.org/jira/browse/LUCENE-2657
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Steven Rowe
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> Lucene and Solr modules' POM templates are manually maintained, and so are 
> not always in sync with the dependencies used by the Ant build. 
> It should be possible to auto-generate POM templates using build tools 
> extending Ant's 
> [BuildListener|http://api.dpml.net/ant/1.6.5/org/apache/tools/ant/BuildListener.html]
>  interface, similarly to how the 
> [ant2ide|http://gleamynode.net/articles/2234/] project generates eclipse 
> project files.




[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-21 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913347#action_12913347
 ] 

Ryan McKinley commented on LUCENE-2649:
---

Any thoughts on this?

I think the best move forward is to:
a. optimize as much as possible
b. drop the no-parser function option
c. optionally store the bitset via static config (ugly, but lesser of many ugly 
options)
d. set lucene default=false (actually I don't care)
e. set solr default=true

Unless there are objections, I will clean up the patch, fix javadoc, tests, etc
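
For readers who have not opened the patch, the general shape under discussion (a values array plus a bitset of documents that really have a value) might look roughly like this. It is a hypothetical sketch using java.util.BitSet. The class name, fields and methods are invented for illustration and may differ from what the attached patch does.

```java
import java.util.BitSet;

// Hypothetical shape for illustration; not the API from the attached patch.
final class CachedIntsSketch {
    final int[] values;  // one slot per document; stays 0 when there is no value
    final BitSet valid;  // bit is set for each document that actually has a value

    CachedIntsSketch(int maxDoc) {
        values = new int[maxDoc];
        valid = new BitSet(maxDoc);
    }

    /** Records a value for a document and marks the document as valid. */
    void set(int docId, int value) {
        values[docId] = value;
        valid.set(docId);
    }

    /** Distinguishes "has the value 0" from "has no value at all". */
    boolean hasValue(int docId) {
        return valid.get(docId);
    }

    /** True when every document has a value, so the bitset check could be skipped. */
    boolean matchesAll() {
        return valid.cardinality() == values.length;
    }

    public static void main(String[] args) {
        CachedIntsSketch cache = new CachedIntsSketch(3);
        cache.set(0, 0);
        cache.set(2, 7);
        System.out.println(cache.hasValue(0) + " " + cache.hasValue(1));
    }
}
```

The extra bitset costs one bit per document, which is presumably why option (c) makes storing it optional.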

> FieldCache should include a BitSet for matching docs
> 
>
> Key: LUCENE-2649
> URL: https://issues.apache.org/jira/browse/LUCENE-2649
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Ryan McKinley
> Fix For: 4.0
>
> Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
> LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch
>
>
> The FieldCache returns an array representing the values for each doc.  
> However there is no way to know if the doc actually has a value.
> This should be changed to return an object representing the values *and* a 
> BitSet for all valid docs.




[jira] Commented: (SOLR-2128) full parameter dereferencing for function queries

2010-09-21 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913344#action_12913344
 ] 

Yonik Seeley commented on SOLR-2128:


Some of the motivation for this is to make spatial search easier by enabling 
users to reference other parameters in the request such as the point or 
distance.

> full parameter dereferencing for function queries
> -
>
> Key: SOLR-2128
> URL: https://issues.apache.org/jira/browse/SOLR-2128
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Fix For: 4.0
>
>
> We should be able to specify function parameters as $foo (where foo is 
> another request parameter).
> Ideally the parameter could itself be a full function.




[jira] Assigned: (SOLR-2128) full parameter dereferencing for function queries

2010-09-21 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reassigned SOLR-2128:
--

Assignee: Yonik Seeley

> full parameter dereferencing for function queries
> -
>
> Key: SOLR-2128
> URL: https://issues.apache.org/jira/browse/SOLR-2128
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 4.0
>
>
> We should be able to specify function parameters as $foo (where foo is 
> another request parameter).
> Ideally the parameter could itself be a full function.




[jira] Created: (SOLR-2128) full parameter dereferencing for function queries

2010-09-21 Thread Yonik Seeley (JIRA)
full parameter dereferencing for function queries
-

 Key: SOLR-2128
 URL: https://issues.apache.org/jira/browse/SOLR-2128
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
 Fix For: 4.0


We should be able to specify function parameters as $foo (where foo is another 
request parameter).
Ideally the parameter could itself be a full function.
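
As a rough illustration of what such dereferencing could mean, the sketch below expands $name references in a function string from a map of request parameters, and expands recursively so that a parameter can itself be a function containing further references. Everything here is an assumption made for the example (the class name, the regular expression for parameter names, the depth limit); it is not Solr's parser and says nothing about how the feature would actually be implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical textual expansion; a real implementation would live in the function parser.
final class ParamDerefSketch {
    private static final Pattern REF = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");
    private static final int MAX_DEPTH = 10; // guards against self-referencing parameters

    static String resolve(String expr, Map<String, String> params) {
        return resolve(expr, params, 0);
    }

    private static String resolve(String expr, Map<String, String> params, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalArgumentException("parameter references nested too deeply");
        }
        Matcher m = REF.matcher(expr);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("undefined parameter: " + m.group(1));
            }
            // The referenced value may contain further $refs, so expand it first.
            String expanded = resolve(value, params, depth + 1);
            m.appendReplacement(out, Matcher.quoteReplacement(expanded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("r", "6371");
        params.put("pt", "vector($lat,$lon)");
        params.put("lat", "44.9369054");
        params.put("lon", "-91.3929348");
        System.out.println(resolve("hsin($r,true,store,$pt)", params));
    }
}
```

With something along these lines, a spatial request could state the point and distance once and refer to them from both the filter and the sort.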




Re: Document links

2010-09-21 Thread Ryan McKinley
> Maybe the user API would look something like this...
>
>    indexWriter.addLink(fromDocId, toDocId);
>    DocIdSet reader.getInboundLinks(docId);
>    DocIdSet reader.getOutboundLinks(docId);
>

To support 'typed' links, it would be nice to have:

indexWriter.addLink("link", fromDocId, toDocId);
DocIdSet reader.getInboundLinks(docId, "link");
DocIdSet reader.getOutboundLinks(docId, "link");

In my app, I build this graph that has many types of links that need
to be handled differently.
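
A minimal in-memory sketch of the typed-link idea, assuming links are simply stored in two maps keyed by (link type, doc id). The `LinkGraph` class and its method names are hypothetical stand-ins for the proposed calls, not an existing Lucene API, and the link types in the example are made up.

```python
from collections import defaultdict

class LinkGraph:
    """Hypothetical in-memory stand-in for the proposed typed-link API."""

    def __init__(self):
        # (link_type, doc_id) -> set of doc ids at the other end of the link
        self._outbound = defaultdict(set)
        self._inbound = defaultdict(set)

    def add_link(self, link_type, from_doc_id, to_doc_id):
        self._outbound[(link_type, from_doc_id)].add(to_doc_id)
        self._inbound[(link_type, to_doc_id)].add(from_doc_id)

    def get_outbound_links(self, doc_id, link_type):
        return sorted(self._outbound.get((link_type, doc_id), ()))

    def get_inbound_links(self, doc_id, link_type):
        return sorted(self._inbound.get((link_type, doc_id), ()))

graph = LinkGraph()
graph.add_link("cites", 1, 2)
graph.add_link("cites", 3, 2)
graph.add_link("authored_by", 1, 7)
```

Keying on the link type keeps each kind of link in its own adjacency set, so different types can be queried and handled separately.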

> Anyone else think this might be an area worth exploring?
>

for sure!

ryan




[jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913126#action_12913126
 ] 

Bill Bell edited comment on SOLR-2125 at 9/21/10 9:04 PM:
--

Grant: Can you give a good example using the same measurements for sorting and 
sfilt?

Will this work? The Spatial Wiki http://wiki.apache.org/solr/SpatialSearch 
shows:

3. Using the "frange" QParser, as in fq={!frange l=0 u=400}hsin(0.57, -1.3, 
lat_rad, lon_rad, 3963.205) 

The Function Wiki Shows:

hsin(radius, true|false, x1,y1,x2,y2)

Here is the correct solution: (UPDATED)

{code}

http://localhost:8983/solr/select/??fl=*,score&start=0&rows=10&q={!sfilt%20fl=store}&qt=standard&pt=44.9369054,-91.3929348&d=285&sort=hsin(6371,true,store,vector(44.9369054,-91.3929348))%20asc

{code}

The Local Solr from Patrick projected the upper-right (ur) and lower-left (ll) 
corners using the distance and did a range check to see if the points are in 
the box. It is a close approximation.
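
For cross-checking distances like the ones in this issue, the haversine formula can be written out directly. This is a minimal sketch, assuming points given in degrees and the 6371.009 km mean radius quoted in the issue description; `haversine_km` is a hypothetical helper, not Solr's `hsin()` implementation.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.009  # value quoted in the issue description

def haversine_km(lat1, lon1, lat2, lon2, radius=EARTH_MEAN_RADIUS_KM):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

# The two example points from the issue description.
distance_km = haversine_km(45.17614, -93.87341, 44.9369054, -91.3929348)
# Convert with the same km/mile radius pair quoted in the issue.
distance_miles = distance_km * 3958.761 / 6371.009
```

Passing a radius in miles instead of kilometers changes only the unit of the result, which is why mixing the two radius constants shifts every distance by the same factor.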


  was (Author: billnbell):
Grant: Can you give a good example using the same measurements for sorting 
and sfilt?

Will this work? The Spatial Wiki http://wiki.apache.org/solr/SpatialSearch 
shows:

3. Using the "frange" QParser, as in fq={!frange l=0 u=400}hsin(0.57, -1.3, 
lat_rad, lon_rad, 3963.205) 

The Function Wiki Shows:

hsin(radius, true|false, x1,y1,x2,y2)

Here is the correct solution: (UPDATED)

http://localhost:8983/solr/select/??fl=*,score&start=0&rows=10&q={!sfilt%20fl=store}&qt=standard&pt=44.9369054,-91.3929348&d=285&sort=hsin%286371,true,store,vector%2844.9369054,-91.3929348%29%29%20asc

The Local Solr from Patrick projected the upper-right (ur) and lower-left (ll) 
corners using the distance and did a range check to see if the points are in 
the box. It is a close approximation.

  
> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: solrspatial.xlsx
>
>
> The calculation of distance appears to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




Re: discussion about release frequency.

2010-09-21 Thread Robert Muir
On Tue, Sep 21, 2010 at 7:40 PM, Chris Hostetter
wrote:

> For example: we probably shouldn't bother having a release if the only
> things committed to that branch since the previous release are fixes for
> typos in javadocs, or new tests -- those changes are
> good, and worth having, but too much proliferation of minor versions for
> things that don't impact the users can be distracting and confusing, and
> makes it hard to recognize when a release is worth upgrading to (it's a
> girl who cried wolf thing).
>

I completely agree with you. I didn't mean to give the impression by "every
month or two" that we should actually have anything remotely resembling a
schedule driven by arbitrary dates. I meant here to suggest a very rough
idea of the sort of frequency that I think might actually work, and to bring
up the point that what we might consider minor features can be viewed by
users as major.

-- 
Robert Muir
rcm...@gmail.com


[jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913126#action_12913126
 ] 

Bill Bell edited comment on SOLR-2125 at 9/21/10 9:02 PM:
--

Grant: Can you give a good example using the same measurements for sorting and 
sfilt?

Will this work? The Spatial Wiki http://wiki.apache.org/solr/SpatialSearch 
shows:

3. Using the "frange" QParser, as in fq={!frange l=0 u=400}hsin(0.57, -1.3, 
lat_rad, lon_rad, 3963.205) 

The Function Wiki Shows:

hsin(radius, true|false, x1,y1,x2,y2)

Here is the correct solution: (UPDATED)

http://localhost:8983/solr/select/??fl=*,score&start=0&rows=10&q={!sfilt%20fl=store}&qt=standard&pt=44.9369054,-91.3929348&d=285&sort=hsin%286371,true,store,vector%2844.9369054,-91.3929348%29%29%20asc

The Local Solr from Patrick projected the upper-right (ur) and lower-left (ll) 
corners using the distance and did a range check to see if the points are in 
the box. It is a close approximation.


  was (Author: billnbell):
Grant: Can you give a good example using the same measurements for sorting 
and sfilt?

Will this work? The Spatial Wiki http://wiki.apache.org/solr/SpatialSearch 
shows:

3. Using the "frange" QParser, as in fq={!frange l=0 u=400}hsin(0.57, -1.3, 
lat_rad, lon_rad, 3963.205) 

The Function Wiki Shows:

hsin(radius, true|false, x1,y1,x2,y2)

Would the correct solution be the following - can I use a vector?

http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=hsin(6371,true,store,vector(44.9369054,-91.3929348))
 asc 

The Local Solr from Patrick projected the upper-right (ur) and lower-left (ll) 
corners using the distance and did a range check to see if the points are in 
the box. It is a close approximation. An ellipse would be closer. A circle is 
not close enough.

  
> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: solrspatial.xlsx
>
>
> The calculation of distance appears to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




Re: [VOTE] rename -dev.jar files to -SNAPSHOT.jar for better maven support

2010-09-21 Thread Ryan McKinley
>
> [x] Rename 3.x and 4.x to -SNAPSHOT.jar

Easier maven integration is worth the aesthetic change (though I think
-dev looks better!)

ryan




[VOTE] rename -dev.jar files to -SNAPSHOT.jar for better maven support

2010-09-21 Thread Ryan McKinley
In an effort to have better maven support, I think it is worth making
our dev builds easily compatible with maven snapshot builds.  To get
maven to know a file is an unreleased snapshot, it must include
SNAPSHOT in the name; currently the dev jars end with -dev.jar.

If we change the default name to -SNAPSHOT.jar, then the files created
with 'ant generate-maven-artifacts' can be easily pushed to the apache
snapshot repository.  The alternative is to run our build twice, once
generating the "-dev.jar" files and again using -Dversion=4.0-SNAPSHOT.

Check: https://issues.apache.org/jira/browse/LUCENE-2493

[ ] Rename 3.x and 4.x to -SNAPSHOT.jar
[ ] Rename 4.x to -SNAPSHOT.jar
[ ] Keep -dev.jar, support maven by running the build twice, once with
-dev and once with -Dversion=4.0-SNAPSHOT

This is not a vote about maven support in general or about how it
should eventually fit into our release process.

Since this is a highly visible change, I think an official VOTE is worthwhile.




Re: [Lucene-java Wiki] Trivial Update of "ReleaseTodo" by YonikSeeley

2010-09-21 Thread Chris Hostetter

: correct me if I'm wrong) to presume that you are saying that it is 
: alright for the RM to decide what artifacts should be released.  So, if 
: that's not the case, then fine, I agree, but if it is, then no, I don't 
: think this is the right message to put on the page.  And it certainly 

I haven't seen anyone even remotely claim that.  The RM most certainly 
gets to decide what *they* think should be released -- but it is the PMC 
that gets to decide what *will* be released, and the PMC decides by voting 
on it.

RM is just a label for someone who posts a file online somewhere, signs 
it, and calls a vote -- they don't have to be a committer, they don't have 
to be a PMC member, they don't have to be building off of any specific 
branch, they don't have to organize their files in any particular way -- 
all that matters is that they say "here's some stuff, i think we should 
release it as Apache Foo 4.5.6.3.121_a" and it's up to the PMC to say 
"we're not going to vote in favor of releasing that, it's just a zip file 
containing a foo.java file that doesn't compile because it's just an 
ascii art picture of a monkey."

It's largely meaningless to try and argue that there should be a hard and 
fast formally voted on process for how we do a release, because a 
volunteer release manager can't be legally bound to follow that process -- 
they can just say "screw you guys, this is a pain in the ass i'm going 
home to do something fun with my time".  

It's likewise largely meaningless to try and argue that there should be a 
hard and fast formally voted on set of requirements for what must 
constitute a release candidate -- because no matter what the PMC might 
vote on as far as what those rules should be, they would be irrelevant 
once an actual release vote was called (ie: if PMC feels that all releases 
need to include a picture of a monkey, you don't need to vote on that as a 
formal rule - you just need to vote against any release that doesn't 
include a picture of a monkey)

Instead of spending a lot of time arguing over what type of formal process 
should be involved, and what is mandatory or not, and how to make 
mandatory things formally mandatory, etc., it would probably be more 
productive if folks who have strong opinions about what is important to 
produce as part of a release just focused on making it easier to do 
releases that produce those things.

For example...

Robert feels strongly that releases should always be well tested for many 
languages/locales/jvms (+1), so he's been working his ass off to 
contribute patches that make sure we have an automated way to test these 
things so we don't have to think hard about them at release time. (++1)

If other people feel like releases should always have rock solid support 
for maven users to consume release artifacts that have accurate poms, then 
they should contribute patches that make sure we have an automated way to 
generate/publish those artifacts so we don't have to think hard about them 
at release time.

Likewise: If i feel like releases should always include a picture of a 
monkey holding the New York times on the day the release is made, then i 
should contribute a patch that causes the build to generate a picture of a 
monkey holding the New York Times automatically when the build is run, so we 
don't have to remember to do it at release time.

If any of the various goals conflict (ie: if grant doesn't want to vote in 
favor of a release because my picture of a monkey doesn't have a pom so 
it's not useable for maven users; or if robert doesn't want to vote in 
favor of a release that includes pom's because there are no tests 
verifying that those poms are usable) then let's argue about those 
specific, individual, separate, points in a way that leads towards patches 
and automation and simplification of process -- Let's try to avoid arguing 
about formal rules and regulations that just lead to us having formal 
rules and regulations with no actual implementation or technical solution 
to make those rules/regulations a reality.


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!





Re: discussion about release frequency.

2010-09-21 Thread Ryan McKinley
> I'm not buying the authoritative argument, it seems like any old joker can
> take our signed jars and put them in maven themselves, without us having to
> do any work.

The "standard" places (apache / ibiblio / sonatype) require a project
representative to post artifacts to the repository.  If the artifacts
are hosted at "hotsteamylucene.ru" yes, anyone could post anything.  I
hope we agree it is worth making it easy for people to use lucene via
maven, ivy, or whatever.  We need to make sure this is not a pain, and
maybe split out parts so that the RM does not necessarily need to do
everything.

My understanding is that you would feel OK about this if there was
some automated test that used the artifacts, and that the RM need not
know anything about maven internals -- just copy a directory to
somewhere in apache infrastructure. I don't think this is too far off.

ryan




Re: [Solr Wiki] Update of "FrontPage" by HossMan

2010-09-21 Thread Chris Hostetter

: The "FrontPage" page has been changed by HossMan.
: The comment on this change is: remove possible link spam - url may have been 
: intended for PublicServers page ?.
: http://wiki.apache.org/solr/FrontPage?action=diff&rev1=232&rev2=233

This was largely a guess on my part; at best that link should be on 
"PublicServers", not the front page of the Solr wiki.

: -  * [[http://www.juziku.com/|聚资库]]




-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: discussion about release frequency.

2010-09-21 Thread Chris Hostetter

: > to be seen whether we should *need* to release that often -- it will
: > depend on whether anything new gets committed there -- but it would
...
: And as far as whether we *need* to release often, i'd like to start looking
: at whether we *should*.
: For a realistic example, Solr 3.x got autosuggest functionality. This isn't
: really a disruptive thing, it's not going to introduce a lot of deprecations,
: etc.
: do we *need* to release because of that? no. *should* we release? I think
: yes.

Agreed, my point was merely that even if we get to the point where we are 
capable of releasing off the 3x branch every month (because we have the 
process/tools down pat) that doesn't mean we should.  We should release 
when we have features that we think are worth releasing, and are stable 
enough to support moving forward along that major branch.

For example: we probably shouldn't bother having a release if the only 
things committed to that branch since the previous release are fixes for 
typos in javadocs, or new tests -- those changes are 
good, and worth having, but too much proliferation of minor versions for 
things that don't impact the users can be distracting and confusing, and 
makes it hard to recognize when a release is worth upgrading to (it's a 
girl who cried wolf thing).

The other situation to consider is when a feature is committed which works 
correctly, and has good tests and documentation, but people have 
questions/reservations about whether the API is really what we want -- just 
because the calendar says it's time for the quarterly release, and the 
code works, doesn't mean we should just blindly release.  (and yes, i 
realize that for the 3x branch, these API decisions should probably be 
hashed out on trunk before the feature is backported to 3x ... it's not 
the best example)

In any case, i'm really just trying to bring things back to the point i 
was getting at in my last mail: the decision to release should be made 
based on state of the code, not the calendar; but it would be awesome if 
the process was automated enough that we could release as often as we 
thought the state of the code warranted it.

: I think we should try to avoid massive yearly releases with a ton of changes
: at once.

No argument there ... in the past i think this was really a combination of 
(a) "concerns about API/back-compat" and (b) "the release process is such 
a nightmare let's get a few more features in before we release so we don't 
have to do it again too soon" ... the 3x branch makes (a) a smaller 
concern, if we can make (b) go away we can pretty much release whenever we 
want.


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!





[jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913244#action_12913244
 ] 

Bill Bell edited comment on SOLR-2125 at 9/21/10 6:44 PM:
--

OK so that makes sense. You are using distance at 45 degrees. So the east-west 
would not extend far enough.

Using http://en.wikipedia.org/wiki/Pythagorean_theorem would help on the 
east-west case, but a circle or ellipse is MUCH better.

That said, extending out to the 45 degree distance would be a conservative 
estimate. And since we usually sort by distance asc, those extra points would 
be in the result set but at the end of the list. (This is an improvement - 
again, not as good as an ellipse.)

You need a quick function that tells you "is this lat,long in the circle / 
ellipse or not". A range [X to Y] will not get you that. You need to use 
hsin().

One potential approach:

1. Do a range select using bounding points 
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates 
(Lat >= 1.2393 AND Lat <= 1.5532) AND (Lon >= -1.8184 AND Lon <= 0.4221)
2. Check those points for distance "in the ellipse". 
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
 acos(sin(1.3963) * sin(Lat) + cos(1.3963) * cos(Lat) * cos(Lon - (-0.6981))) 
<= 0.1570;

That should make it fast and simplify the calculations.
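
The two steps can be sketched as a cheap box test followed by an exact spherical check. This is only an illustration: the constants are copied from the example values in the comment (radians), while the function names and the sample candidate points are assumptions made for this sketch.

```python
import math

# Constants copied from the example values in the comment (all in radians).
CENTER_LAT, CENTER_LON = 1.3963, -0.6981
ANGULAR_RADIUS = 0.1570            # distance divided by sphere radius
LAT_MIN, LAT_MAX = 1.2393, 1.5532
LON_MIN, LON_MAX = -1.8184, 0.4221

def in_bounding_box(lat, lon):
    """Step 1: cheap range check, the kind an index range query can answer."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX

def within_distance(lat, lon):
    """Step 2: exact great-circle check (spherical law of cosines)."""
    cos_angle = (math.sin(CENTER_LAT) * math.sin(lat)
                 + math.cos(CENTER_LAT) * math.cos(lat)
                 * math.cos(lon - CENTER_LON))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle))) <= ANGULAR_RADIUS

def matches(lat, lon):
    # The box test runs first so the trigonometry is skipped for most points.
    return in_bounding_box(lat, lon) and within_distance(lat, lon)

# Hypothetical candidate points (lat, lon) in radians.
candidates = [
    (1.3963, -0.6981),   # the center itself
    (1.2500, -1.8000),   # inside the box, but outside the circle
    (1.0000, -0.6981),   # outside the box
]
hits = [p for p in candidates if matches(*p)]
```

The second candidate shows why the box alone over-selects: it passes the range check but fails the distance check.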

UPDATE - NOTE:

Plugging all this into the web site proves that Pythagorean is a good 
approximation... 

See Excel attached.

hsin = 309 km from pt to max
hsin = 314 km from pt to min
Estimate using Pythagorean is 311 km using sqrt((220 km)^2 + (220 km)^2)

41.42% is the difference from west-east to 45 degrees: sqrt(1^2+1^2) = 1.4142

Yonik: sqrt(2) is right - but the spreadsheet is a bit better based on spheres.

Step #2 will then subselect the points to limit within that result set.

Therefore, a user could take the search distance d, compute sqrt(d^2+d^2), and 
use that to get a list - it is not exact but better than nothing.
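
The Pythagorean estimate is simple enough to check by hand. A small sketch of the arithmetic, assuming a flat square box that extends the same distance d east-west and north-south; `diagonal_distance` is a hypothetical helper name.

```python
import math

def diagonal_distance(d):
    """Corner distance of a square box extending d east-west and d north-south."""
    return math.sqrt(d ** 2 + d ** 2)    # equals d * sqrt(2)

corner_km = diagonal_distance(220)       # about 311 km
extra_fraction = math.sqrt(2) - 1        # about 0.4142, i.e. 41.42%
```

The flat-plane figure falls between the two hsin values listed above (309 km and 314 km), which is the sense in which it is a good approximation at this scale.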


  was (Author: billnbell):
OK so that makes sense. You are using distance at 45 degrees. So the east-west 
would not extend far enough.

Using http://en.wikipedia.org/wiki/Pythagorean_theorem would help on the 
east-west case, but a circle or ellipse is MUCH better.

That said, extending out to the 45 degree distance would be a conservative 
estimate. And since we usually sort by distance asc, those extra points would 
be in the result set but at the end of the list. (This is an improvement - 
again, not as good as an ellipse.)

You need a quick function that tells you "is this lat,long in the circle / 
ellipse or not". A range [X to Y] will not get you that. You need to use 
hsin().

One potential approach:

1. Do a range select using bounding points 
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates 
(Lat >= 1.2393 AND Lat <= 1.5532) AND (Lon >= -1.8184 AND Lon <= 0.4221)
2. Check those points for distance "in the ellipse". 
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
 acos(sin(1.3963) * sin(Lat) + cos(1.3963) * cos(Lat) * cos(Lon - (-0.6981))) 
<= 0.1570;

That should make it fast and simplify the calculations.

UPDATE - NOTE:

Plugging all this into the web site proves that Pythagorean is a good 
approximation... 

See Excel attached.

hsin = 309 km from pt to max
hsin = 314 km from pt to min
Estimate using Pythagorean is 311 km using sqrt((220 km)^2 + (220 km)^2)

41.42% is the difference from west-east to 45 degrees: sqrt(1^2+1^2) = 1.4142

Step #2 will then subselect the points to limit within that result set.

Therefore, a user could take the search distance d, compute sqrt(d^2+d^2), and 
use that to get a list - it is not exact but better than nothing.

  
> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: solrspatial.xlsx
>
>
> The calculation of distance appears to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill


[jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913244#action_12913244
 ] 

Bill Bell edited comment on SOLR-2125 at 9/21/10 6:39 PM:
--

OK so that makes sense. You are using distance at 45 degrees. So the east-west 
would not extend far enough.

Using http://en.wikipedia.org/wiki/Pythagorean_theorem would help on the 
east-west case, but a circle or ellipse is MUCH better.

That said, extending out to the 45 degree distance would be a conservative 
estimate. And since we usually sort by distance asc, those extra points would 
be in the result set but at the end of the list. (This is an improvement - 
again, not as good as an ellipse.)

You need a quick function that tells you "is this lat,long in the circle / 
ellipse or not". A range [X to Y] will not get you that. You need to use 
hsin().

One potential approach:

1. Do a range select using bounding points 
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates 
(Lat >= 1.2393 AND Lat <= 1.5532) AND (Lon >= -1.8184 AND Lon <= 0.4221)
2. Check those points for distance "in the ellipse". 
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
 acos(sin(1.3963) * sin(Lat) + cos(1.3963) * cos(Lat) * cos(Lon - (-0.6981))) 
<= 0.1570;

That should make it fast and simplify the calculations.

UPDATE - NOTE:

Plugging all this into the web site proves that Pythagorean is a good 
approximation... 

See Excel attached.

hsin = 309 km from pt to max
hsin = 314 km from pt to min
Estimate using Pythagorean is 311 km using sqrt((220 km)^2 + (220 km)^2)

41.42% is the difference from west-east to 45 degrees: sqrt(1^2+1^2) = 1.4142

Step #2 will then subselect the points to limit within that result set.

Therefore, a user could take the search distance d, compute sqrt(d^2+d^2), and 
use that to get a list - it is not exact but better than nothing.


  was (Author: billnbell):
OK so that makes sense. You are using distance at 45 degrees. So the east-west 
would not extend far enough.

Using http://en.wikipedia.org/wiki/Pythagorean_theorem would help on the 
east-west case, but a circle or ellipse is MUCH better.

That said, extending out to the 45 degree distance would be a conservative 
estimate. And since we usually sort by distance asc, those extra points would 
be in the result set but at the end of the list. (This is an improvement - 
again, not as good as an ellipse.)

You need a quick function that tells you "is this lat,long in the circle / 
ellipse or not". A range [X to Y] will not get you that. You need to use 
hsin().

One potential approach:

1. Do a range select using bounding points 
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates 
(Lat >= 1.2393 AND Lat <= 1.5532) AND (Lon >= -1.8184 AND Lon <= 0.4221)
2. Check those points for distance "in the ellipse". 
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
 acos(sin(1.3963) * sin(Lat) + cos(1.3963) * cos(Lat) * cos(Lon - (-0.6981))) 
<= 0.1570;

That should make it fast and simplify the calculations.

UPDATE - NOTE:

Plugging all this into the web site proves that Pythagorean is a good 
approximation... 

See Excel attached.

hsin = 309 km from pt to max
hsin = 314 km from pt to min
Estimate using Pythagorean is 311 km using sqrt((220 km)^2 + (220 km)^2)

41.42% is the difference from west-east to 45 degrees: sqrt(1^2+1^2) = 1.4142

Step #2 will then subselect the points to limit within that result set.



  
> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: solrspatial.xlsx
>
>
> The calculation of distance appears to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill


[jira] Updated: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2125:


Attachment: solrspatial.xlsx

Example calcs to get lat/long for distance of 220km from 44.93691,-91.3929

> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
> Attachments: solrspatial.xlsx
>
>
> The calculation of distance appears to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913244#action_12913244
 ] 

Bill Bell edited comment on SOLR-2125 at 9/21/10 6:35 PM:
--

OK so that makes sense. You are using distance at 45 degrees. So the east-west 
would not extend far enough.

Using http://en.wikipedia.org/wiki/Pythagorean_theorem would help on the 
east-west case, but a circle or ellipse is MUCH better.

That said, extending out to the 45 degree distance would be a conservative 
estimate. And since we usually sort by distance asc, those extra points would 
be in the result set but at the end of the list. (This is an improvement - 
again, not as good as an ellipse.)

You need a quick function that tells you "is this lat,long in the circle / 
ellipse or not". A range [X to Y] will not get you that. You need to use 
hsin().

One potential approach:

1. Do a range select using bounding points 
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates 
(Lat >= 1.2393 AND Lat <= 1.5532) AND (Lon >= -1.8184 AND Lon <= 0.4221)
2. Check those points for distance "in the ellipse". 
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
 acos(sin(1.3963) * sin(Lat) + cos(1.3963) * cos(Lat) * cos(Lon - (-0.6981))) 
<= 0.1570;

That should make it fast and simplify the calculations.

UPDATE - NOTE:

Plugging all this into the web site proves that Pythagorean is a good 
approximation... 

See Excel attached.

hsin = 309 km from pt to max
hsin = 314 km from pt to min
Estimate using Pythagorean is 311 km using sqrt((220 km)^2 + (220 km)^2)

41.42% is the difference from west-east to 45 degrees: sqrt(1^2+1^2) = 1.4142

Step #2 will then subselect the points to limit within that result set.




  was (Author: billnbell):
OK, so that makes sense. You are using the distance at 45 degrees, so the 
east-west extent would not reach far enough.

Using http://en.wikipedia.org/wiki/Pythagorean_theorem would help on the 
east-west case, but circle or ellipses is MUCH better.

Extending the 45 degree distance out would be a conservative estimate, though. 
And since we usually sort by distance asc, those extra points would be in the 
result set but at the end of the list. (This is an improvement, though again 
not as good as ellipses.)

You need a quick function that tells you "is this lat,long in the circle / 
ellipses or not". A range [X to Y] will not get you that. You need to use 
hsin().

One potential approach:

1. Do range select using points 
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates 
(Lat >= 1.2393 AND Lat <= 1.5532) AND (Lon >= -1.8184 AND Lon <= 0.4221)
2. Check those points for distance  "in the ellipses".  
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
 acos(sin(1.3963) * sin(Lat) + cos(1.3963) * cos(Lat) * cos(Lon - (-0.6981))) 
<= 0.1570;

That should make it fast and simplify the calculations.

  
> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
>
> The calculations of distance appears to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




Re: Open Relevance Requirements Questions

2010-09-21 Thread Itamar Syn-Hershko

On 21/9/2010 10:24 PM, Grant Ingersoll wrote:

> I'm not sure it needs to be text/index-ready.  That can mean diff. things to 
> diff. engines.  Our first requirement is that we have a corpus that has a 
> known revision/signature so that all people are using the same raw content.  
> What the engine chooses to do with it is up to the engine.  Can we provide 
> tools that help it be text/index ready?  Of course, but that is not a 
> requirement, in my view.
   
I hear what you say. I suggest we use Orev as the starting point 
for discussions, assuming my point of view from the last e-mail is 
acceptable, and build up from there.



> You should supply a patch and build instructions, etc.
   


Where to? All sources are available directly from github through git or 
HTTP (tarball: http://github.com/synhershko/Orev/tarball/master). It 
isn't complete yet, so I think it better not be forked at this point.


As far as build instructions go, you'll need a C# compiler - either a 
full blown IDE (MS DevEnv, and there is a free express version too) or a 
command line compiler (which I think comes with the .NET framework 
itself). I haven't tested on Mono, but it might work too.


Itamar.


[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913244#action_12913244
 ] 

Bill Bell commented on SOLR-2125:
-

OK, so that makes sense. You are using the distance at 45 degrees, so the 
east-west extent would not reach far enough.

Using http://en.wikipedia.org/wiki/Pythagorean_theorem would help on the 
east-west case, but circle or ellipses is MUCH better.

Extending the 45 degree distance out would be a conservative estimate, though. 
And since we usually sort by distance asc, those extra points would be in the 
result set but at the end of the list. (This is an improvement, though again 
not as good as ellipses.)

You need a quick function that tells you "is this lat,long in the circle / 
ellipses or not". A range [X to Y] will not get you that. You need to use 
hsin().

One potential approach:

1. Do range select using points 
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates 
(Lat >= 1.2393 AND Lat <= 1.5532) AND (Lon >= -1.8184 AND Lon <= 0.4221)
2. Check those points for distance  "in the ellipses".  
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
 acos(sin(1.3963) * sin(Lat) + cos(1.3963) * cos(Lat) * cos(Lon - (-0.6981))) 
<= 0.1570;

That should make it fast and simplify the calculations.


> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
>
> The calculations of distance appears to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

2010-09-21 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2010:
-

Attachment: SOLR-2010_shardSearchHandler_999521.patch
SOLR-2010_shardRecombineCollations_999521.patch

Both patch versions sync'ed to Trunk version 999521. (sorry about the many 
filename variants)

> Improvements to SpellCheckComponent Collate functionality
> -
>
> Key: SOLR-2010
> URL: https://issues.apache.org/jira/browse/SOLR-2010
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, spellchecker
>Affects Versions: 1.4.1
> Environment: Tested against trunk revision 966633
>Reporter: James Dyer
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, 
> SOLR-2010.patch, SOLR-2010.txt, 
> SOLR-2010_shardRecombineCollations_993538.patch, 
> SOLR-2010_shardRecombineCollations_999521.patch, 
> SOLR-2010_shardSearchHandler_993538.patch, 
> SOLR-2010_shardSearchHandler_999521.patch
>
>
> Improvements to SpellCheckComponent Collate functionality
> Our project requires a better Spell Check Collator.  I'm contributing this as 
> a patch to get suggestions for improvements and in case there is a broader 
> need for these features.
> 1. Only return collations that are guaranteed to result in hits if re-queried 
> (applying original fq params also).  This is especially helpful when there is 
> more than one correction per query.  The 1.4 behavior does not verify that a 
> particular combination will actually return hits.
> 2. Provide the option to get multiple collation suggestions
> 3. Provide extended collation results including the # of hits re-querying 
> will return and a breakdown of each misspelled word and its correction.
> This patch is similar to what is described in SOLR-507 item #1.  Also, this 
> patch provides a viable workaround for the problem discussed in SOLR-1074.  A 
> dictionary could be created that combines the terms from the multiple fields. 
>  The collator then would prune out any spurious suggestions this would cause.
> This patch adds the following spellcheck parameters:
> 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try 
> before giving up.  Lower values ensure better performance.  Higher values may 
> be necessary to find a collation that can return results.  Default is 0, 
> which maintains backwards-compatible behavior (do not check collations).
> 2. spellcheck.maxCollations - maximum # of collations to return.  Default is 
> 1, which maintains backwards-compatible behavior.
> 3. spellcheck.collateExtendedResult - if true, returns an expanded response 
> format detailing collations found.  default is false, which maintains 
> backwards-compatible behavior.  When true, output is like this (in context):
> 
>   
>   
>   94
>   7
>   11
>   
>   hope
>   how
>   hope
>   chops
>   hoped
>   etc
>   
>   
>   100
>   16
>   21
>   
>   fall
>   fails
>   fail
>   fill
>   faith
>   all
>   etc
>   
>   
>   
>   Title:(how AND fails)
>   2
>   
>   how
>   fails
>   
>   
>   
>   Title:(hope AND faith)
>   2
>   
>   hope
>   faith
>   
>   
>   
>   Title:(chops AND all)
>   1
>   
>   chops
>   all
>   
>   
>   
> 
> In addition, SOLRJ is updated to include 
> SpellCheckResponse.getCollatedResults(), which will return the expanded 
> Collation format.  getCollatedResult(), which returns a single String, is 
> retained for backwards-compatibility.  Other APIs were not changed but will 
> still work provided that spellcheck.collateExtendedResult is false.
> This likely will not return valid results if using Shards.  Rather, a more 
> robust inter
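
As a rough illustration of how the three parameters described above might be passed on a request, here is a small query-string builder. It is a sketch only: the class and method names are made up, it uses plain JDK classes rather than SolrJ, and it assumes that spellcheck=true and spellcheck.collate=true are sent alongside the new parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; builds the parameter part of a select URL.
final class CollationQueryExample {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String buildQueryString(String userQuery, int maxCollationTries,
                                   int maxCollations, boolean extendedResult) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("spellcheck", "true");          // assumed to be required
        params.put("spellcheck.collate", "true");  // assumed to be required
        // 0 keeps the old behavior: collations are not checked for hits.
        params.put("spellcheck.maxCollationTries", Integer.toString(maxCollationTries));
        // 1 keeps the old behavior: a single collation is returned.
        params.put("spellcheck.maxCollations", Integer.toString(maxCollations));
        // false keeps the old, plain-string collation format.
        params.put("spellcheck.collateExtendedResult", Boolean.toString(extendedResult));

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }
}
```

Calling buildQueryString with 0, 1 and false should correspond to the backwards-compatible defaults listed above.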

Re: Open Relevance Requirements Questions

2010-09-21 Thread Grant Ingersoll

On Sep 21, 2010, at 2:54 PM, Itamar Syn-Hershko wrote:

> Addressing most of the recent discussion below...
> 
> On 16/9/2010 4:24 AM, Dan Cardin wrote:
>>1. Should the Open Relevance viewer be capable of importing text and
>>images?
>>   
> Corpora, IMO, should be text only and index-ready (e.g. no special parsing 
> required). This is what I assumed in Orev, as well (see below).

I'm not sure it needs to be text/index-ready.  That can mean diff. things to 
diff. engines.  Our first requirement is that we have a corpus that has a 
known revision/signature so that all people are using the same raw content.  
What the engine chooses to do with it is up to the engine.  Can we provide 
tools that help it be text/index ready?  Of course, but that is not a 
requirement, in my view.

> 
>> Is the objective of the Open Relevance Viewer to provide a crowd sourcing
>> tool that can have its data annotated and then to use the annotated data for
>> determining the performance of machine learning techniques/algorithms? Or,
> >> is it to provide a generic crowd sourcing tool for academics, government, and
>> industry to annotate data with? Or am I missing the point?
>>   
> This tool should be, as Grant and Mark mentioned, engine agnostic. It should 
> provide those interested with tools to be able to judge effectiveness of 
> different engines, and also different methods with the same engine.
> 
> Hence, the most basic implementation should know to handle many corpora and 
> topics for more than one (natural) language, and the crowd-sourcing portion 
> of it is where a user can create judgments - e.g. view a document from a 
> corpus side by side with a topic, and mark "Relevant", "Non-relevant" (or 
> "Skip this").
> 
> This banal implementation after several hundreds of human-hours will result 
> in packages containing corpora, topics and judgments for several languages. 
> This can then be used as basis for more sophisticated parts of the project, 
> where relevance ranking of actual query results, TREC-like testing, MAP/MRR 
> and user behavior tracking are just examples. In other words, IMHO Grant's 
> view is a bit too far-reaching for this stage, where there's still a lot of 
> fundamental work to do.
> 
> Robert, from the discussion we had a while ago I gathered you are thinking 
> the same?
> 
> Once such data exists in a central system, importing corpora and topics, and 
> exporting them back with judgments in various formats (TREC, CLEF, FIRE) can 
> be done fairly easily. We just need to make sure that system stores all data 
> correctly.
> 
> Sorry for bringing this up again, but I think I pretty much did most of that 
> work already, so no need for redundant efforts. In Orev I have already spec'd 
> and implemented all the above. What is missing is some better GUI and user 
> management. I suggest you have a look at it and at its DB scheme: 
> http://github.com/synhershko/Orev/blob/master/Orev.png

You should supply a patch and build instructions, etc.

> 
> >> How are annotations used for judgments obtained? Separate file, specified by 
>> the user?
>>   
> If a tool like Orev will be used, then this data can be pulled directly from 
> its DB by the actual test tools (if separate).
> 
>> Can you provide me with a direct link to the TREC format?
> 
> http://trec.nist.gov/pubs/trec1/papers/01.txt
> 
> But if we are not going to base data storage on the FS, there's no need to 
> stick to a particular format, only when exporting judgments...

Right

[jira] Updated: (LUCENE-2661) fold in test cases from Lucene in Action 2nd edition

2010-09-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2661:
---

Attachment: LUCENE-2661-books.patch

Forgot to include the books data (used to create the test index used by many 
tests).

> fold in test cases from Lucene in Action 2nd edition
> 
>
> Key: LUCENE-2661
> URL: https://issues.apache.org/jira/browse/LUCENE-2661
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2661-books.patch, LUCENE-2661.patch
>
>
> Manning Publications Co., publisher of Lucene in Action, 2nd edition
> (http://manning.com/lucene) wishes to donate all of the book's source
> code, to fold into Lucene's tests!
> It'll take some iterating to get the tests folded in... I'll attach
> the initial patch.




Re: Document links

2010-09-21 Thread Mark Harwood

> It should be possible to randomly add and delete such relationships after
> indexWriter.addDocument(), is that the idea?


Yes. A "like" action may, for example allow me to tag an existing document by 
connecting 2 documents - my personal "like" document and a document with 
content of interest.
   doc 1   = [user:mark   tag:like]
   doc 56 = [title:Lucene   body:Lucene is a search library...]

I then call:
   indexWriter.addLink(1,56)

If this was my first "Like" then I may need to contemplate using a variation of 
the above API that allows a yet-to-be-committed "Document" object in place of 
the doc ids.


> Adding such relationships by docId would need the addition of
> a separate (from the segments) index structure


Yes, I need to think about the detail of file structures next. For now I'm 
sticking with thinking about user API and functionality and assuming we can 
maintain cross-segment docid references that get updated somehow at merge time.

> 
> 
> Would each link also have an attribute (think payload)?

I was thinking that if attributes are needed (e.g. a star rating on my document 
"like" example) then this could be catered for with a document: rather than 
linking the single doc [user:mark tag:like] to all my liked docs, I could create 
specific doc instances of [user:mark rating:5 tag:like] and link via those. 


> Would such relationships be named (sth like foreign key field names)?

For now I was thinking of storing simple docid->docid links.

Once we have these links we could do some funky things:
{pseudo code:}
   //My fave docs from last week
   int myLikesDocId = searchForLuceneDocWithUserNameAndTag("mark", "like");
   DocIdSet myLikedDocs = indexReader.getOutboundLinks(myLikesDocId);
   searcher.search(lastWeekRangeQuery, new Filter(myLikedDocs));

   //Other users who share my interests
   DocIdSet usersWhoLikeWhatILike = indexReader.getInboundLinks(myLikedDocs);
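
To make the intended semantics concrete, here is a toy in-memory version of the docid->docid link store. Everything in it is hypothetical: DocLinks is not a Lucene class, java.util.BitSet stands in for DocIdSet, and it ignores the hard part (keeping doc ids valid across segments and merges).

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of simple docid->docid links.
final class DocLinks {
    private final Map<Integer, BitSet> outbound = new HashMap<>();
    private final Map<Integer, BitSet> inbound = new HashMap<>();

    /** Records a link and its reverse so both directions are cheap to read. */
    void addLink(int fromDoc, int toDoc) {
        outbound.computeIfAbsent(fromDoc, k -> new BitSet()).set(toDoc);
        inbound.computeIfAbsent(toDoc, k -> new BitSet()).set(fromDoc);
    }

    /** Docs that the given doc links to. */
    BitSet getOutboundLinks(int docId) {
        BitSet links = outbound.get(docId);
        return links == null ? new BitSet() : (BitSet) links.clone();
    }

    /** Docs that link to at least one doc in the given set. */
    BitSet getInboundLinks(BitSet targets) {
        BitSet result = new BitSet();
        for (int d = targets.nextSetBit(0); d >= 0; d = targets.nextSetBit(d + 1)) {
            BitSet links = inbound.get(d);
            if (links != null) result.or(links);
        }
        return result;
    }
}
```

With addLink(1, 56) for my "like" doc and addLink(2, 56) for another user's, getInboundLinks(getOutboundLinks(1)) contains both 1 and 2, so the caller would still need to exclude its own "like" doc.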


Cheers
Mark



[jira] Commented: (LUCENE-2661) fold in test cases from Lucene in Action 2nd edition

2010-09-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913185#action_12913185
 ] 

Paul Elschot commented on LUCENE-2661:
--

The surround test cases have already been done in LUCENE-1563

> fold in test cases from Lucene in Action 2nd edition
> 
>
> Key: LUCENE-2661
> URL: https://issues.apache.org/jira/browse/LUCENE-2661
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2661.patch
>
>
> Manning Publications Co., publisher of Lucene in Action, 2nd edition
> (http://manning.com/lucene) wishes to donate all of the book's source
> code, to fold into Lucene's tests!
> It'll take some iterating to get the tests folded in... I'll attach
> the initial patch.




[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913182#action_12913182
 ] 

Grant Ingersoll commented on SOLR-2125:
---

bq. The Local Solr from Patrick projected the ur, ll using the distance and did 
a range check to see if the points are in the box. It is a close approximation. 
An ellipses would be closer. Circle is not close enough.

OK, so I went back to that code and looked and we are doing the same thing (in 
fact, I based it off of that).  Namely, the only thing that is not working in 
the code is the interpretation of the results and not the math itself.  To that 
point, Bill, if you look at the points you gave, the other point is right on 
the edge of a box that is centered on that point (roughly, Chippewa Falls, WI), 
but still outside of the box that was created by Solr (at 280KM).  I worked 
this out by viewing the maps at:
1. 
http://www.movable-type.co.uk/scripts/latlong-map.html?lat1=44.936905&long1=-91.392935&lat2=43.09&long2=-93.87341
2. 
http://www.movable-type.co.uk/scripts/latlong-map.html?lat1=44.936905&long1=-91.392935&lat2=45.17614&long2=-93.87341

The reason is that we are calculating the box based on the fact that the upper 
right and lower left corners are 280KM away.  The question is then, is this the 
intuitive thing app developers expect?  If app developers think in terms of a 
radius of distance 280km and all points inside, then no, it isn't.  But if they 
think in terms of the upper and lower box corners, with no suggestion of a 
"radius" (which would imply a circle), then it is.  

So, one fix for this could solely be a documentation fix.  The other fix is to 
change the reasoning above to simply find the north, south, east, west points 
of a box that circumscribes a circle whose radius is the distance given (since 
most users tend to think in terms of a radius from where they are located and 
not upper and lower box corners).  This will create a box that encloses said 
radius completely and might give a slight bit more fuzz up near the box corners 
due to curvature, but that should be fine.  I'm working on a patch for this 
fix, as I think it is the better way.
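
A minimal sketch of the second fix, assuming a spherical earth with the 6371.009 km mean radius from the issue description and the bounding-coordinate formulas from the janmatuschek.de page linked elsewhere in this thread. It is not the actual patch: the names are made up, and it does not handle boxes that touch a pole or cross the 180th meridian.

```java
// Hypothetical sketch: a lat/lon box that fully encloses a circle of the
// given radius around a point. Not the Solr patch itself.
final class RadiusBoundingBox {
    static final double EARTH_MEAN_RADIUS_KM = 6371.009;

    /** Returns {minLat, minLon, maxLat, maxLon} in degrees. */
    static double[] enclosingBox(double latDeg, double lonDeg, double distanceKm) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double r = distanceKm / EARTH_MEAN_RADIUS_KM; // angular radius

        // North and south edges are simply r away along the meridian.
        double minLat = lat - r;
        double maxLat = lat + r;
        // East and west edges: widest longitude reached by the circle.
        double deltaLon = Math.asin(Math.sin(r) / Math.cos(lat));

        return new double[] {
            Math.toDegrees(minLat), Math.toDegrees(lon - deltaLon),
            Math.toDegrees(maxLat), Math.toDegrees(lon + deltaLon)
        };
    }

    static boolean contains(double[] box, double latDeg, double lonDeg) {
        return latDeg >= box[0] && latDeg <= box[2]
            && lonDeg >= box[1] && lonDeg <= box[3];
    }
}
```

With the points from the issue (pt=44.9369054,-91.3929348 and d=280), this box extends well past the longitude of 45.17614,-93.87341, so that document would fall inside the filter.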



> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
>
> The calculations of distance appears to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Updated: (LUCENE-2661) fold in test cases from Lucene in Action 2nd edition

2010-09-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2661:
---

Attachment: LUCENE-2661.patch

Initial raw patch, just puts the sources under src/liatests, as a placeholder.  
We need to move these under the src/test/* tree...

> fold in test cases from Lucene in Action 2nd edition
> 
>
> Key: LUCENE-2661
> URL: https://issues.apache.org/jira/browse/LUCENE-2661
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Michael McCandless
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2661.patch
>
>
> Manning Publications Co., publisher of Lucene in Action, 2nd edition
> (http://manning.com/lucene) wishes to donate all of the book's source
> code, to fold into Lucene's tests!
> It'll take some iterating to get the tests folded in... I'll attach
> the initial patch.




Re: Open Relevance Requirements Questions

2010-09-21 Thread Itamar Syn-Hershko

Addressing most of the recent discussion below...

On 16/9/2010 4:24 AM, Dan Cardin wrote:

> 1. Should the Open Relevance viewer be capable of importing text and
> images?
   
Corpora, IMO, should be text only and index-ready (e.g. no special 
parsing required). This is what I assumed in Orev, as well (see below).



> Is the objective of the Open Relevance Viewer to provide a crowd sourcing
> tool that can have its data annotated and then to use the annotated data for
> determining the performance of machine learning techniques/algorithms? Or,
> is it to provide a generic crowd sourcing tool for academics, government, and
> industry to annotate data with? Or am I missing the point?
   
This tool should be, as Grant and Mark mentioned, engine agnostic. It 
should provide those interested with tools to be able to judge 
effectiveness of different engines, and also different methods with the 
same engine.


Hence, the most basic implementation should know how to handle many 
corpora and topics for more than one (natural) language, and the 
crowd-sourcing portion of it is where a user can create judgments - e.g. 
view a document from a corpus side by side with a topic, and mark 
"Relevant", "Non-relevant" (or "Skip this").


This banal implementation after several hundreds of human-hours will 
result in packages containing corpora, topics and judgments for several 
languages. This can then be used as basis for more sophisticated parts 
of the project, where relevance ranking of actual query results, 
TREC-like testing, MAP/MRR and user behavior tracking are just examples. 
In other words, IMHO Grant's view is a bit too far-reaching for this 
stage, where there's still a lot of fundamental work to do.


Robert, from the discussion we had a while ago I gathered you are 
thinking the same?


Once such data exists in a central system, importing corpora and topics, 
and exporting them back with judgments in various formats (TREC, CLEF, 
FIRE) can be done fairly easily. We just need to make sure that system 
stores all data correctly.
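
As a small illustration of the export side, judgments could be written out in the whitespace-separated qrels layout used with TREC tools (topic, iteration, document id, relevance). The class below is a made-up sketch and does not reflect Orev's actual schema or code; "Skip this" answers are assumed to be simply not exported.

```java
import java.util.List;

// Hypothetical sketch of exporting binary judgments as qrels lines.
final class QrelsExport {
    static final class Judgment {
        final String topicId;
        final String docId;
        final boolean relevant;

        Judgment(String topicId, String docId, boolean relevant) {
            this.topicId = topicId;
            this.docId = docId;
            this.relevant = relevant;
        }
    }

    /** One line per judgment: "topic iteration docno relevance". */
    static String toQrelsLine(Judgment j) {
        // The iteration column is conventionally 0.
        return j.topicId + " 0 " + j.docId + " " + (j.relevant ? 1 : 0);
    }

    static String export(List<Judgment> judgments) {
        StringBuilder sb = new StringBuilder();
        for (Judgment j : judgments) {
            sb.append(toQrelsLine(j)).append('\n');
        }
        return sb.toString();
    }
}
```

Other export formats would be further methods over the same stored judgments.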


Sorry for bringing this up again, but I think I pretty much did most of 
that work already, so no need for redundant efforts. In Orev I have 
already spec'd and implemented all the above. What is missing is some 
better GUI and user management. I suggest you have a look at it and at 
its DB scheme: http://github.com/synhershko/Orev/blob/master/Orev.png



> How are annotations used for judgments obtained? Separate file, specified by 
> the user?
   
If a tool like Orev will be used, then this data can be pulled directly 
from its DB by the actual test tools (if separate).



> Can you provide me with a direct link to the TREC format?


http://trec.nist.gov/pubs/trec1/papers/01.txt

But if we are not going to base data storage on the FS, there's no need 
to stick to a particular format, only when exporting judgments...


Itamar


[jira] Created: (LUCENE-2661) fold in test cases from Lucene in Action 2nd edition

2010-09-21 Thread Michael McCandless (JIRA)
fold in test cases from Lucene in Action 2nd edition


 Key: LUCENE-2661
 URL: https://issues.apache.org/jira/browse/LUCENE-2661
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Tests
Reporter: Michael McCandless
 Fix For: 3.1, 4.0


Manning Publications Co., publisher of Lucene in Action, 2nd edition
(http://manning.com/lucene) wishes to donate all of the book's source
code, to fold into Lucene's tests!

It'll take some iterating to get the tests folded in... I'll attach
the initial patch.





[jira] Commented: (LUCENE-2658) TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice

2010-09-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913141#action_12913141
 ] 

Robert Muir commented on LUCENE-2658:
-

This test fails about 1% of the time currently... I applied the patch and ran 
it many times:

junit-sequential:
[junit] Testsuite: org.apache.lucene.index.TestIndexWriterExceptions
[junit] Tests run: 1800, Failures: 0, Errors: 0, Time elapsed: 1,291.131 sec

+1 to commit

> TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice
> 
>
> Key: LUCENE-2658
> URL: https://issues.apache.org/jira/browse/LUCENE-2658
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.1, 4.0
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Attachments: LUCENE-2658.patch, LUCENE-2658_environment.patch
>
>
> TestIndexWriterExceptions threw this today, and it's reproducible




[jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913134#action_12913134
 ] 

Bill Bell edited comment on SOLR-2125 at 9/21/10 2:20 PM:
--

Two good articles:

http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
- Pole and good pictures

http://stackoverflow.com/questions/238260/how-to-calculate-the-bounding-box-for-a-given-lat-lng-location
- Why Sphere is only good for 10KM distances.

  was (Author: billnbell):
Tow good articles:

http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
- Pole and good pictures

http://stackoverflow.com/questions/238260/how-to-calculate-the-bounding-box-for-a-given-lat-lng-location
- Why Sphere is only good for 10KM distances.
  
> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
>
> The calculations of distance appears to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Commented: (LUCENE-2658) TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice

2010-09-21 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913140#action_12913140
 ] 

Tim Smith commented on LUCENE-2658:
---

Sadly, I haven't been able to gather the infostream for LUCENE-2501.
There's a comment on LUCENE-2501 that seems to indicate the exception that 
started it all, though (CorruptIndexException: docs out of order (607 <= 607 )).

> TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice
> 
>
> Key: LUCENE-2658
> URL: https://issues.apache.org/jira/browse/LUCENE-2658
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.1, 4.0
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Attachments: LUCENE-2658.patch, LUCENE-2658_environment.patch
>
>
> TestIndexWriterExceptions threw this today, and it's reproducible




[jira] Commented: (LUCENE-2544) Add 'divisor' to NumericField, allows for easy storage of full precision data, but indexing *starting* at lower precision.

2010-09-21 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913133#action_12913133
 ] 

Woody Anderson commented on LUCENE-2544:


I really do understand the difference between Field the instance object, and a 
field in the index. I use cap F for the java class and lowercase for the index.

You can accomplish this with two NFs, but you can also implement NumericField 
with a series of "new Field()" using the same name as well. But, you don't do 
this, b/c it's much more convenient to have it bundled up in a nice concise 
form.

There is (admittedly, from my perspective) one issue with this kind of feature. 
It's that the divisor must be known and kept track of by the lucene user during 
query parsing and during term-enum inspection if they are doing that sort of 
thing. The current QueryParser uses a map of field to DateTools.Resolution, 
which this mechanism would effectively mimic. Though it would produce 
NumericField formatted tokens in the index rather than date strings, which can 
often be an advantage for ranges etc. The fact that it also provides numeric 
resolution for any numeric field is a bonus, but it would involve some change 
to the QueryParser to correctly handle this, as it currently does not handle 
querying any field indexed as NumericField. Both this edit and DateTools have 
the same drawbacks for term-enum inspection (facets etc), so clearly the 
responsibility for handling that lies with the user of lucene already. I have a 
schema at parse/inspect time, so I had overlooked this issue.
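
To make the bookkeeping concrete, here is a minimal sketch of what tracking the divisor at query time could look like. It is plain Java with no Lucene dependency; the class name `DivisorMath`, its methods, and the divisor of 1000 are assumptions made for illustration and are not taken from the attached patch.

```java
/** Hypothetical helper for illustration only; not part of LUCENE-2544.patch. */
final class DivisorMath {
    private DivisorMath() {}

    /** Value that would be indexed: the full-precision value reduced by the divisor. */
    static long toIndexed(long fullPrecision, long divisor) {
        if (divisor <= 0) {
            throw new IllegalArgumentException("divisor must be positive");
        }
        // floorDiv keeps negative values in the bucket below zero instead of rounding toward zero
        return Math.floorDiv(fullPrecision, divisor);
    }

    /** Translates an inclusive full-precision range into the reduced range used for querying. */
    static long[] toIndexedRange(long lo, long hi, long divisor) {
        return new long[] { toIndexed(lo, divisor), toIndexed(hi, divisor) };
    }

    public static void main(String[] args) {
        long divisor = 1000L;          // assumed: millisecond timestamps indexed at second precision
        long stored = 1285086608123L;  // the full-precision value stays in the stored field
        System.out.println("indexed value: " + toIndexed(stored, divisor));
        long[] range = toIndexedRange(1285086600000L, 1285086609999L, divisor);
        System.out.println("query range: " + range[0] + " .. " + range[1]);
    }
}
```

Note that a range query at the reduced precision matches whole buckets, so it can return documents whose full-precision value lies slightly outside the original bounds. That is the same trade-off DateTools.Resolution already has.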

At any rate, I still don't get what you consider confusing about this 
functionality. DateTools.Resolution shows a clear use case, and NumericField's 
fast range support is often a clear improvement over string date tokens at 
any resolution. Wrapping it up into the single existing class is just 
easier to use than requiring that multiple NF objects be added to the document. 
Unless you advocate that NF be implemented as a static utility class that 
injects multiple Field objects into the Document, I'm not sure why this 
consolidation goes against the grain.

> Add 'divisor' to NumericField, allows for easy storage of full precision 
> data, but indexing *starting* at lower precision.
> --
>
> Key: LUCENE-2544
> URL: https://issues.apache.org/jira/browse/LUCENE-2544
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Affects Versions: 3.0.2
>Reporter: Woody Anderson
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2544.patch
>
>
> In some cases, we want to index a timestamp or some other high precision 
> numeric at a much lower precision, but we still want to store the full 
> precision data.
> Rather than have to do this with two Field objects in the Document, it'd be 
> easier to provide NumericField with a divisor as well as precision step. The 
> divisor would apply before beginning the trie logic.
> most often, this is a divide by 1, but that will happen only during the 
> constructor or setXXXValue() in NumericTokenStream.
> I have the patch for this, or i will after i isolate it.




[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913134#action_12913134
 ] 

Bill Bell commented on SOLR-2125:
-

Two good articles:

http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
- Pole and good pictures

http://stackoverflow.com/questions/238260/how-to-calculate-the-bounding-box-for-a-given-lat-lng-location
- Why Sphere is only good for 10KM distances.

> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
>
> The distance calculations appear to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913126#action_12913126
 ] 

Bill Bell commented on SOLR-2125:
-

Grant: Can you give a good example using the same measurements for sorting and 
sfilt ?

Will this work? The Spatial Wiki http://wiki.apache.org/solr/SpatialSearch 
shows:

3.Using the "frange" QParser, as in fq={!frange l=0 u=400}hsin(0.57, -1.3, 
lat_rad, lon_rad, 3963.205) 

The Function Wiki Shows:

hsin(radius, true|false, x1,y1,x2,y2)

Would the correct solution be the following - can I use a vector?

http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=hsin(6371,true,store,vector(44.9369054,-91.3929348))
 asc 

The Local Solr from Patrick projected the upper-right (ur) and lower-left (ll) 
corners using the distance and did a range check to see whether the points are 
in the box. It is a close approximation. An ellipse would be closer. A circle 
is not close enough.
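
For reference, a minimal stand-alone haversine sketch that can be used to sanity-check the example coordinates from the issue description. This is not Solr's hsin implementation; the class and method names are made up for illustration, and only the two radius constants (6371.009 km, 3958.761 mi) come from the issue text.

```java
/** Hypothetical stand-alone haversine helper; not Solr's implementation. */
final class HaversineSketch {
    static final double EARTH_MEAN_RADIUS_KM = 6371.009;
    static final double EARTH_MEAN_RADIUS_MI = 3958.761;

    private HaversineSketch() {}

    /** Great-circle distance between two lat/lon points (degrees), in the units of radius. */
    static double distance(double lat1, double lon1, double lat2, double lon2, double radius) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double sinHalfPhi = Math.sin(dPhi / 2);
        double sinHalfLambda = Math.sin(dLambda / 2);
        double a = sinHalfPhi * sinHalfPhi
                 + Math.cos(phi1) * Math.cos(phi2) * sinHalfLambda * sinHalfLambda;
        // clamp guards against rounding pushing sqrt(a) slightly above 1
        return 2 * radius * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        double km = distance(45.17614, -93.87341, 44.9369054, -91.3929348, EARTH_MEAN_RADIUS_KM);
        double mi = distance(45.17614, -93.87341, 44.9369054, -91.3929348, EARTH_MEAN_RADIUS_MI);
        System.out.println(km + " km, " + mi + " mi");
    }
}
```

The only difference between the two calls is the radius constant, which is why mixing up the kilometre and mile values changes every result by the same factor of about 1.609.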


> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
>
> The distance calculations appear to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




Re: Document links

2010-09-21 Thread Paul Elschot
On Tuesday 21 September 2010 18:30:08, mark harwood wrote:
> >>Wouldn't that be sufficient?
> 
> Not for some apps. I tried playing the "Kevin Bacon" game using a Lucene
> index of IMDB data using actorID and movieID keys.
> The difference between that and Neo4j on the same data and query is night
> and day. The graph databases are really onto something when resolving a
> relationship doesn't first require an index to look up endpoints.

When the key values are given by the user this would boil down to adding
the primary and foreign key to Lucene, but that does not appear to be the idea.

It should be possible to randomly add and delete such relationships after
indexWriter.addDocument(), is that the idea?

Adding such relationships by docId would need the addition of
a separate (from the segments) index structure (probably some B-tree) that
would have segmentId-segmentDocId as (part of) the keys, and also as (part of) 
the values.
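
A rough in-memory sketch of such a structure, to make the key/value shape concrete. Everything here is hypothetical: `DocRef`, `LinkStoreSketch` and the method names are invented for illustration, a `TreeMap` merely stands in for an on-disk B-tree, and nothing below exists in Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical link store keyed by (segmentId, segmentDocId); not a Lucene API. */
final class LinkStoreSketch {

    /** Composite key: a segment id plus a doc id local to that segment. */
    static final class DocRef implements Comparable<DocRef> {
        final int segmentId;
        final int segmentDocId;

        DocRef(int segmentId, int segmentDocId) {
            this.segmentId = segmentId;
            this.segmentDocId = segmentDocId;
        }

        @Override
        public int compareTo(DocRef other) {
            int c = Integer.compare(segmentId, other.segmentId);
            return c != 0 ? c : Integer.compare(segmentDocId, other.segmentDocId);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof DocRef && compareTo((DocRef) o) == 0;
        }

        @Override
        public int hashCode() {
            return 31 * segmentId + segmentDocId;
        }
    }

    // Sorted maps stand in for the B-tree: DocRefs are both the keys and (part of) the values.
    private final TreeMap<DocRef, List<DocRef>> outbound = new TreeMap<>();
    private final TreeMap<DocRef, List<DocRef>> inbound = new TreeMap<>();

    void addLink(DocRef from, DocRef to) {
        outbound.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        inbound.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    List<DocRef> getOutboundLinks(DocRef doc) {
        return outbound.getOrDefault(doc, Collections.emptyList());
    }

    List<DocRef> getInboundLinks(DocRef doc) {
        return inbound.getOrDefault(doc, Collections.emptyList());
    }
}
```

The sketch deliberately leaves out the hard part raised in the original proposal: rewriting the stored DocRefs when segments merge and doc ids are renumbered.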

Would each link also have an attribute (think payload)?
Would such relationships be named (something like foreign key field names)?


Regards,
Paul Elschot


> 
> 
> - Original Message 
> From: Paul Elschot 
> To: dev@lucene.apache.org
> Sent: Tue, 21 September, 2010 17:25:31
> Subject: Re: Document links
> 
> When the (primary) key values are provided by the user,
> one could use additional small documents to only store/index
> these relations whenever they change.
> 
> Wouldn't that be sufficient?
> 
> Regards,
> Paul Elschot
> 
> 
> 
> On Tuesday 21 September 2010 00:35:02, mark harwood wrote:
> > I've been looking at Graph Databases recently (neo4j, OrientDb,
> > InfiniteGraph) as a faster alternative to relational stores. I notice they
> > either embed Lucene for indexing node properties or (in the case of
> > OrientDB) are talking about doing this.
> >
> > I think their fundamental performance advantage over relational stores is
> > that they don't have to de-reference foreign keys in a b-tree index to get
> > from a source node to a target node. Instead they use internally-generated
> > IDs to act like pointers with more-or-less direct references between
> > nodes/vertexes.  As a result they can follow links very quickly. This got
> > me thinking could Lucene adopt the idea of creating links between documents
> > that were equally fast using Lucene doc ids?
> >
> > Maybe the user API would look something like this...
> >
> > indexWriter.addLink(fromDocId, toDocId);
> > DocIdSet reader.getInboundLinks(docId);
> > DocIdSet reader.getOutboundLinks(docId);
> >
> >
> > Internally a new index file structure would be needed to record link info.
> > Any recorded links that connect documents from different segments would
> > need careful adjustment of referenced link IDs when segments merge and
> > Lucene doc IDs are shuffled.
> >
> > As well as handling typical graphs (social networks, web data) this could
> > potentially be used to support tagging operations where apps could create
> > "tag" documents and then link them to existing documents that are being
> > tagged without having to update the target doc. There are probably a ton of
> > applications for this stuff.
> >
> > I see the Graph DBs busy recreating transactional support, indexes, segment
> > merging etc and it seems to me that Lucene has a pretty good head start
> > with this stuff.
> > Anyone else think this might be an area worth exploring?
> >
> > Cheers
> > Mark



[jira] Commented: (LUCENE-2658) TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice

2010-09-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913111#action_12913111
 ] 

Michael McCandless commented on LUCENE-2658:


Unfortunately, no, not directly anyway... -- this bug is specific to term 
vectors.

Under the hood, term vectors and postings (doc/frq/pos) use the same class 
(ByteBlockPool), and in this case, after an exception, the term vectors code 
was failing to reset the ByteBlockPool.
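
The general shape of that kind of bug, and of the fix, can be sketched with a toy pool. This is not Lucene's ByteBlockPool or the actual patch; `ToyPool`, `processDoc` and the sizes are invented purely to illustrate resetting shared state on the exception path.

```java
/** Toy illustration of resetting a shared pool on the exception path; not Lucene code. */
final class PoolResetSketch {

    /** Tiny stand-in for a shared block pool with a fixed capacity. */
    static final class ToyPool {
        private final byte[] buffer = new byte[16];
        private int upto = 0;

        /** Reserves size bytes and returns the start offset of the slice. */
        int alloc(int size) {
            if (upto + size > buffer.length) {
                throw new ArrayIndexOutOfBoundsException(upto + size);
            }
            int start = upto;
            upto += size;
            return start;
        }

        void reset() {
            upto = 0;
        }

        int used() {
            return upto;
        }
    }

    private PoolResetSketch() {}

    /**
     * Allocates one slice per field of a document. The finally block guarantees the
     * pool is reset even if a field fails part-way through, so the next document
     * does not start from stale offsets.
     */
    static void processDoc(ToyPool pool, int[] fieldSizes, int failAtField) {
        try {
            for (int i = 0; i < fieldSizes.length; i++) {
                pool.alloc(fieldSizes[i]);
                if (i == failAtField) {
                    throw new IllegalStateException("simulated failure in field " + i);
                }
            }
        } finally {
            pool.reset();
        }
    }
}
```

Without the reset in the finally block, a failure in the first of several fields would leave the toy pool partly filled, and a later document could run off the end of the buffer with exactly the kind of ArrayIndexOutOfBoundsException seen in the test.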

Though it is possible the same idea is happening on LUCENE-2501, ie, an 
exception at a bad time leaving the ByteBlockPool in the same state... can you 
post the infoStream output on LUCENE-2501?

> TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice
> 
>
> Key: LUCENE-2658
> URL: https://issues.apache.org/jira/browse/LUCENE-2658
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.1, 4.0
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Attachments: LUCENE-2658.patch, LUCENE-2658_environment.patch
>
>
> TestIndexWriterExceptions threw this today, and it's reproducible




[jira] Commented: (LUCENE-2658) TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice

2010-09-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913098#action_12913098
 ] 

Robert Muir commented on LUCENE-2658:
-

Tim, I didn't see your problem coming from termvectorswriter... I think it might 
be a different bug.

> TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice
> 
>
> Key: LUCENE-2658
> URL: https://issues.apache.org/jira/browse/LUCENE-2658
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.1, 4.0
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Attachments: LUCENE-2658.patch, LUCENE-2658_environment.patch
>
>
> TestIndexWriterExceptions threw this today, and it's reproducible




[jira] Commented: (LUCENE-2658) TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice

2010-09-21 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913097#action_12913097
 ] 

Tim Smith commented on LUCENE-2658:
---

Is this related to/same as LUCENE-2501?

> TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice
> 
>
> Key: LUCENE-2658
> URL: https://issues.apache.org/jira/browse/LUCENE-2658
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.1, 4.0
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Attachments: LUCENE-2658.patch, LUCENE-2658_environment.patch
>
>
> TestIndexWriterExceptions threw this today, and it's reproducible




[jira] Commented: (LUCENE-2127) Improved large result handling

2010-09-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913096#action_12913096
 ] 

Uwe Schindler commented on LUCENE-2127:
---

Grant: In your latest patch, the collector seems to ignore docBase (or the next 
IndexReader), so the whole thing only works with optimized indexes. It should 
simply store the base doc and add it on new ScoreDoc.
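
The pattern being asked for can be sketched without any Lucene dependency. The classes below are toy stand-ins (`RawCollector` and `Hit` are invented names, and the method signatures are simplified), so this only illustrates the docBase bookkeeping, not the real Collector API or the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of per-segment docBase handling; not Lucene's Collector API. */
final class DocBaseSketch {

    /** Stand-in for a ScoreDoc: an index-wide doc id plus its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Collects every hit unsorted, converting segment-local ids to index-wide ids. */
    static final class RawCollector {
        private final List<Hit> hits = new ArrayList<>();
        private int docBase = 0;

        /** Called once per segment before its documents are collected. */
        void setNextReader(int docBase) {
            this.docBase = docBase;
        }

        /** segmentDoc is relative to the current segment, so the stored base is added. */
        void collect(int segmentDoc, float score) {
            hits.add(new Hit(docBase + segmentDoc, score));
        }

        List<Hit> hits() {
            return hits;
        }
    }

    private DocBaseSketch() {}

    public static void main(String[] args) {
        RawCollector collector = new RawCollector();
        collector.setNextReader(0);    // first segment starts at index-wide doc 0
        collector.collect(3, 1.0f);
        collector.setNextReader(100);  // assumed: second segment starts at doc 100
        collector.collect(3, 0.5f);
        for (Hit h : collector.hits()) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

If the base were ignored, both hits above would be recorded under the same doc id, which is why the problem only stays hidden on a single-segment (optimized) index.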

> Improved large result handling
> --
>
> Key: LUCENE-2127
> URL: https://issues.apache.org/jira/browse/LUCENE-2127
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2127.patch, LUCENE-2127.patch
>
>
> Per 
> http://search.lucidimagination.com/search/document/350c54fc90d257ed/lots_of_results#fbb84bd297d15dd5,
>  it would be nice to offer some other Collectors that are better at handling 
> really large number of results.  This could be implemented in a variety of 
> ways via Collectors.  For instance, we could have a raw collector that does 
> no sorting and just returns the ScoreDocs, or we could do as Mike suggests 
> and have Collectors that have heuristics about memory tradeoffs and only 
> heapify when appropriate.




[jira] Commented: (LUCENE-2544) Add 'divisor' to NumericField, allows for easy storage of full precision data, but indexing *starting* at lower precision.

2010-09-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913084#action_12913084
 ] 

Uwe Schindler commented on LUCENE-2544:
---

It is two field instances, but results in only one field in the index. Stored 
fields and indexed fields are handled separately by the indexer, so there is 
no difference between a combined store/index Field and two separate Field 
instances (same field name!) where one is stored and the other is indexed. If 
you want to store something different than you indexed, this is the way to go:
{code}
doc.add(new NumericField(name, Field.Store.NO, true).setIntValue(xxx/divisor));
doc.add(new NumericField(name, Field.Store.YES, false).setIntValue(xxx));
{code}

> Add 'divisor' to NumericField, allows for easy storage of full precision 
> data, but indexing *starting* at lower precision.
> --
>
> Key: LUCENE-2544
> URL: https://issues.apache.org/jira/browse/LUCENE-2544
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Affects Versions: 3.0.2
>Reporter: Woody Anderson
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2544.patch
>
>
> In some cases, we want to index a timestamp or some other high precision 
> numeric at a much lower precision, but we still want to store the full 
> precision data.
> Rather than have to do this with two Field objects in the Document, it'd be 
> easier to provide NumericField with a divisor as well as precision step. The 
> divisor would apply before beginning the trie logic.
> most often, this is a divide by 1, but that will happen only during the 
> constructor or setXXXValue() in NumericTokenStream.
> I have the patch for this, or i will after i isolate it.




Re: Document links

2010-09-21 Thread mark harwood
>>Wouldn't that be sufficient?

Not for some apps. I tried playing the "Kevin Bacon" game using a Lucene index 
of IMDB data using actorID and movieID keys.
The difference between that and Neo4j on the same data and query is night and 
day. The graph databases are really onto something when resolving a 
relationship doesn't first require an index to look up endpoints.





- Original Message 
From: Paul Elschot 
To: dev@lucene.apache.org
Sent: Tue, 21 September, 2010 17:25:31
Subject: Re: Document links

When the (primary) key values are provided by the user,
one could use additional small documents to only store/index
these relations whenever they change.

Wouldn't that be sufficient?

Regards,
Paul Elschot



On Tuesday 21 September 2010 00:35:02, mark harwood wrote:
> I've been looking at Graph Databases recently (neo4j, OrientDb,
> InfiniteGraph) as a faster alternative to relational stores. I notice they
> either embed Lucene for indexing node properties or (in the case of
> OrientDB) are talking about doing this.
>
> I think their fundamental performance advantage over relational stores is
> that they don't have to de-reference foreign keys in a b-tree index to get
> from a source node to a target node. Instead they use internally-generated
> IDs to act like pointers with more-or-less direct references between
> nodes/vertexes.  As a result they can follow links very quickly. This got me
> thinking could Lucene adopt the idea of creating links between documents
> that were equally fast using Lucene doc ids?
>
> Maybe the user API would look something like this...
>
> indexWriter.addLink(fromDocId, toDocId);
> DocIdSet reader.getInboundLinks(docId);
> DocIdSet reader.getOutboundLinks(docId);
>
>
> Internally a new index file structure would be needed to record link info.
> Any recorded links that connect documents from different segments would need
> careful adjustment of referenced link IDs when segments merge and Lucene doc
> IDs are shuffled.
>
> As well as handling typical graphs (social networks, web data) this could
> potentially be used to support tagging operations where apps could create
> "tag" documents and then link them to existing documents that are being
> tagged without having to update the target doc. There are probably a ton of
> applications for this stuff.
>
> I see the Graph DBs busy recreating transactional support, indexes, segment
> merging etc and it seems to me that Lucene has a pretty good head start with
> this stuff.
> Anyone else think this might be an area worth exploring?
>
> Cheers
> Mark



Re: Document links

2010-09-21 Thread Paul Elschot
When the (primary) key values are provided by the user,
one could use additional small documents to only store/index
these relations whenever they change.

Wouldn't that be sufficient?

Regards,
Paul Elschot



On Tuesday 21 September 2010 00:35:02, mark harwood wrote:
> I've been looking at Graph Databases recently (neo4j, OrientDb, 
> InfiniteGraph) 
> as a faster alternative to relational stores. I notice they either embed 
> Lucene 
> for indexing node properties or (in the case of OrientDB) are talking about 
> doing this. 
> 
> I think their fundamental performance advantage over relational stores is 
> that 
> they don't have to de-reference foreign keys in a b-tree index to get from a 
> source node to a target node. Instead they use internally-generated IDs to 
> act 
> like pointers with more-or-less direct references between nodes/vertexes.  As 
> a 
> result they can follow links very quickly. This got me thinking could Lucene 
> adopt the idea of creating links between documents that were equally fast 
> using 
> Lucene doc ids?
> 
> Maybe the user API would look something like this...
> 
> indexWriter.addLink(fromDocId, toDocId);
> DocIdSet reader.getInboundLinks(docId);
> DocIdSet reader.getOutboundLinks(docId);
> 
> 
> Internally a new index file structure would be needed to record link info. 
> Any 
> recorded links that connect documents from different segments would need 
> careful 
> adjustment of referenced link IDs when segments merge and Lucene doc IDs are 
> shuffled.
> 
> As well as handling typical graphs (social networks, web data) this could 
> potentially be used to support tagging operations where apps could create 
> "tag" 
> documents and then link them to existing documents that are being tagged 
> without 
> having to update the target doc. There are probably a ton of applications for 
> this stuff.
> 
> I see the Graph DBs busy recreating transactional support, indexes, segment 
> merging etc and it seems to me that Lucene has a pretty good head start with 
> this stuff.
> Anyone else think this might be an area worth exploring?
> 
> Cheers
> Mark



[jira] Updated: (LUCENE-2658) TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice

2010-09-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2658:
---

Attachment: LUCENE-2658.patch

This was a real issue!

It happens if you hit an exception while processing term vectors, and, your 
docs have multiple fields with term vectors enabled.

> TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice
> 
>
> Key: LUCENE-2658
> URL: https://issues.apache.org/jira/browse/LUCENE-2658
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.1, 4.0
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Attachments: LUCENE-2658.patch, LUCENE-2658_environment.patch
>
>
> TestIndexWriterExceptions threw this today, and it's reproducible




[jira] Assigned: (LUCENE-2658) TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice

2010-09-21 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-2658:
--

Assignee: Michael McCandless

> TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice
> 
>
> Key: LUCENE-2658
> URL: https://issues.apache.org/jira/browse/LUCENE-2658
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 3.1, 4.0
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Attachments: LUCENE-2658_environment.patch
>
>
> TestIndexWriterExceptions threw this today, and it's reproducible




Re: Whither ORP?

2010-09-21 Thread Mark Bennett
Folks,

This is all great stuff.  I too have some thoughts on things to contribute,
but always a matter of time like everybody else.

I have a preference overall for keeping ORP separate, I think that's likely
to attract other engines.

But the one advantage to Lucene/Solr is the increased visibility.

I will commit to mentioning ORP when appropriate in talks that I give, web
sites that I work on, and in blog postings every now and then. We do have
some PR related things coming up.

And welcome Tommaso!

My particular interest is cross engine functionality.  You'll find a wide
difference of opinion about which things are a priority and why, but it
comes down to "scratch your own itch". I care about cross engine adapters,
so that's what I've been thinking about / prototyping.

4 months ago I was working on relevancy judgments, so I posted about that,
and tried to contribute some ideas. For legal reasons I wasn't able to
contribute code at that time.

Does anybody know if ORP is being represented at the upcoming Lucene
conference?  We'd volunteer to staff a booth, but our company wasn't
planning to buy a booth as the prices were way too high for such a new show.

Mark

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Tue, Sep 14, 2010 at 12:47 PM, Itamar Syn-Hershko wrote:

> If you mean you need help spec'ing this, my hands are on deck. Let me know
> what you're looking for aside from what I already described in my mail from
> July.
>
> Itamar.
>
>
> On 14/9/2010 9:34 PM, Dan Cardin wrote:
>
>> I started the requirements...
>>
>> https://cwiki.apache.org/confluence/display/ORP/Open+Relevance+Viewer
>>
>> Is there anyone I can connect with to flesh out the requirements?
>>
>> --Dan
>>
>> On Tue, Sep 14, 2010 at 1:34 PM, Dan Cardin
>>  wrote:
>>
>>
>>
>>> Hello,
>>>
>>> I will begin documenting some basic requirements for a crowd sourcing web
>>> app. I will use some of the work done by Itamar as a basis for the
>>> requirements.
>>>
>>> --Dan
>>>
>>> On Tue, Sep 14, 2010 at 1:18 PM, Itamar Syn-Hershko wrote:
>>>
>>>> On 14/9/2010 3:44 PM, Grant Ingersoll wrote:
>>>>
>>>>> If you can, putting them up as a patch would be useful.  That way, we
>>>>> can show some progress.
>>>>>
>>>> I will, but first it needs to be workable. It is 80% done, but still not
>>>> that usable. I expect to be able to work on it again in a month or so. Or
>>>> someone else could resume from where I stopped (in .NET, or port it to
>>>> Java). I can share what is missing if anyone is interested.
>>>>
>>>> Itamar.
>>>>
>>>
>>>
>>>
>>
>>
>


[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913035#action_12913035
 ] 

Yonik Seeley commented on SOLR-2125:


More chatting with Grant - sqrt(2) is correct if things are flat, but probably 
not correct for a bounding box on the surface of a sphere.
Another thought is to project left, right, up, and down to the sides of the box 
and use those lat ranges and lon ranges directly as the bounding box.

The crazy thing is that this is basic geo code - isn't there some bounding box 
calculation code out there we can use or at least reference?
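
As a starting point, here is a small sketch of a bounding box computed directly on a sphere. It follows my understanding of the approach in the bounding-coordinates article linked elsewhere on this issue (latitude range from the angular radius, longitude range via asin(sin(r) / cos(lat))). The class and method names are invented, it is not the committed fix, and it does not handle wrapping across the 180th meridian.

```java
/** Hypothetical bounding-box helper on a sphere; not the SOLR-2125 fix. */
final class SphereBoxSketch {
    private SphereBoxSketch() {}

    /**
     * Returns {latMin, lonMin, latMax, lonMax} in degrees for a circle of the given
     * distance around (latDeg, lonDeg). distance and radius must use the same unit.
     * Longitudes are NOT normalised, so a box crossing the date line can fall
     * outside [-180, 180].
     */
    static double[] boundingBox(double latDeg, double lonDeg, double distance, double radius) {
        double r = distance / radius;  // angular radius in radians
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double latMin = lat - r;
        double latMax = lat + r;
        double lonMin;
        double lonMax;
        if (latMin > -Math.PI / 2 && latMax < Math.PI / 2) {
            // No pole inside the circle: the longitude half-width grows as cos(lat) shrinks.
            double dLon = Math.asin(Math.sin(r) / Math.cos(lat));
            lonMin = lon - dLon;
            lonMax = lon + dLon;
        } else {
            // A pole lies inside the circle: clamp latitude and accept every longitude.
            latMin = Math.max(latMin, -Math.PI / 2);
            latMax = Math.min(latMax, Math.PI / 2);
            lonMin = -Math.PI;
            lonMax = Math.PI;
        }
        return new double[] {
            Math.toDegrees(latMin), Math.toDegrees(lonMin),
            Math.toDegrees(latMax), Math.toDegrees(lonMax)
        };
    }

    public static void main(String[] args) {
        double[] box = boundingBox(44.9369054, -91.3929348, 280.0, 6371.009);
        System.out.println(box[0] + "," + box[1] + " to " + box[2] + "," + box[3]);
    }
}
```

A box like this is only a cheap pre-filter. Points in its corners are farther away than the requested distance, so an exact distance check would still be needed afterwards.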

> Spatial filter is not accurate
> --
>
> Key: SOLR-2125
> URL: https://issues.apache.org/jira/browse/SOLR-2125
> Project: Solr
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5
>Reporter: Bill Bell
>Assignee: Grant Ingersoll
>
> The distance calculations appear to be off.
> Note: "The radius of the sphere to be used when calculating distances on a 
> sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
> (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
> which is set to 3,958.761458084784856. Most applications will not need to set 
> this."
> The radius of the earth in KM is  6371.009 km (≈3958.761 mi).
> Also filtering distance appears to be off - example data:
> 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
> miles = 220 kilometers
> http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
>  asc 
> Nothing shows. d=285 shows results. This is off by a lot.
> Bill




[jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913011#action_12913011
 ] 

Yonik Seeley edited comment on SOLR-2125 at 9/21/10 10:45 AM:
--

Ok Grant & I chatted and we figured out what's going wrong.  We were 
calculating a box the size that would completely fit inside the circle rather 
than vice-versa.  This was caused by taking the distance and projecting it out 
to calculate the corners of the box.  But the distance given should really be 
to the side of the box... and the distance from the center to the corner of the 
box should be greater (if the box is to completely encompass the circle).

The fix should be easy - the distance to the corner of the box is sqrt(2) * 
dist_to_side_of_box.  So internally we just need to multiply the distance by 
sqrt(2) before finding the corners.

Grant is coding up the fix and tests.

  was (Author: ysee...@gmail.com):
Ok Grant & I chatted and we figured out what's going wrong.  We were 
calculating a box the size that would completely fit inside the circle rather 
than vice-versa.  This was caused by taking the distance and projecting it out 
to calculate the corners of the box.  But the distance given should really be 
to the side of the box... and the distance from the center to the corner of the 
box should be greater (if the box is to completely encompass the circle).

The fix should be easy - the distance to the corner of the box is sqrt(2) * 
dist_to_size_of_box.  So internally we just need to multiply the distance by 
sqrt(2) before finding the corners.

Grant is coding up the fix and tests.
  



[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913011#action_12913011
 ] 

Yonik Seeley commented on SOLR-2125:


Ok Grant & I chatted and we figured out what's going wrong.  We were 
calculating a box the size that would completely fit inside the circle rather 
than vice-versa.  This was caused by taking the distance and projecting it out 
to calculate the corners of the box.  But the distance given should really be 
to the side of the box... and the distance from the center to the corner of the 
box should be greater (if the box is to completely encompass the circle).

The fix should be easy - the distance to the corner of the box is sqrt(2) * 
dist_to_size_of_box.  So internally we just need to multiply the distance by 
sqrt(2) before finding the corners.

Grant is coding up the fix and tests.
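
The flat-geometry relationship described above can be sketched as follows. This is only an illustration of the sqrt(2) factor on a plane; the class and method names are hypothetical and do not come from DistanceUtils.

```java
// Sketch only: plane geometry, hypothetical names.
class CornerDistanceSketch {
    /** Distance from the centre of a square to a corner, given the distance to a side. */
    static double cornerFromSide(double distToSide) {
        return Math.sqrt(2.0) * distToSide;
    }

    /** Half-width of the square obtained when the given distance is used for the corner. */
    static double sideFromCorner(double distToCorner) {
        return distToCorner / Math.sqrt(2.0);
    }

    public static void main(String[] args) {
        double d = 280.0; // the d value from the issue report
        // Old behaviour: d was projected to the corner, so the box was too small.
        System.out.println("half-width when d goes to the corner = " + sideFromCorner(d));
        // Proposed fix: scale d by sqrt(2) before finding the corners.
        System.out.println("corner distance when d goes to the side = " + cornerFromSide(d));
    }
}
```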




[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912993#action_12912993
 ] 

Grant Ingersoll commented on SOLR-2125:
---

{code}
if (options.measStr == null || options.measStr.equals("hsin")) {
  ur = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null, true, options.radius);
  ll = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null, false, options.radius);
}
{code}

It could very well be in the determination of the two corners.  That being 
said, in the test I just posted, the point 18.7, 19.79750 is 3000KM away 
from 0,0 according to movable-type when traveling along a 45 degree bearing and 
the tests 
{code}
checkHits(fieldName, "0,0", 3000, 2, 14, 15);
checkHits(fieldName, "0,0", 3001, 3, 14, 15, 16);
checkHits(fieldName, "0,0", 3000.1, 3, 14, 15, 16);
{code}

Pass for me now that Yonik's patch is applied.  Please check my logic.
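
As an independent cross-check of the distances used in these tests, here is a small haversine sketch. It is a standalone illustration rather than the DistanceUtils implementation; the class and method names are hypothetical and the radius is the 6371.009 km value quoted in the issue description.

```java
// Sketch only: standalone haversine, not the Lucene/Solr implementation.
class HaversineSketch {
    static final double EARTH_MEAN_RADIUS_KM = 6371.009;

    /** Great-circle distance in km between two points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2, double radiusKm) {
        double p1 = Math.toRadians(lat1);
        double p2 = Math.toRadians(lat2);
        double dLat = p2 - p1;
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat + Math.cos(p1) * Math.cos(p2) * sinLon * sinLon;
        // Clamp to guard against rounding slightly above 1.
        return 2 * radiusKm * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // The three documents from the test, measured from 0,0.
        System.out.println("id 14: " + haversineKm(0, 0, 0, 5, EARTH_MEAN_RADIUS_KM) + " km");
        System.out.println("id 15: " + haversineKm(0, 0, 0, 15, EARTH_MEAN_RADIUS_KM) + " km");
        System.out.println("id 16: " + haversineKm(0, 0, 18.7, 19.79750, EARTH_MEAN_RADIUS_KM) + " km");
    }
}
```

The third point comes out very close to 3000 km, so whether it falls inside or outside a 3000 km filter depends on the radius constant used and on rounding.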




[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912990#action_12912990
 ] 

Grant Ingersoll commented on SOLR-2125:
---

Ah, good point, Yonik.  I will fix that.




[jira] Resolved: (SOLR-2108) ReversedWildcardFilter can create false positives

2010-09-21 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2108.
---

Resolution: Fixed

Committed revision 999424.

> ReversedWildcardFilter can create false positives
> -
>
> Key: SOLR-2108
> URL: https://issues.apache.org/jira/browse/SOLR-2108
> Project: Solr
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2108.patch
>
>
> Reported from the userlist: 
> {noformat}
> "For instance, the query *zemog* matches documents that contain Gomez"
> {noformat}
> http://www.lucidimagination.com/search/document/35abfdabfcec99b7/false_matches_with_reversedwildcardfilterfactory




[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912986#action_12912986
 ] 

Yonik Seeley commented on SOLR-2125:


bq. but then you are sorting using Euclidean distance

I imagine it's because of the wiki: http://wiki.apache.org/solr/SpatialSearch
The first example uses "dist" and I imagine many are going to interpret that as 
geo distance (I made the same mistake trying to follow the wiki).




[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912983#action_12912983
 ] 

Grant Ingersoll commented on SOLR-2125:
---

bq. 
http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348))
 asc 

This URL has me a bit stumped.  sfilt on a lat_lon is going to use Great Circle 
to calculate the distance, but then you are sorting using Euclidean distance?  
Not saying that's a problem, but it strikes me as a bit weird.
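
To make the mismatch concrete, here is a small sketch comparing a plain Euclidean (2-norm) distance over raw lat/lon degrees with a great-circle distance in kilometers. The names are hypothetical, the radius is the 6371.009 km value from the issue description, and this is not how Solr's dist() or sfilt is implemented internally.

```java
// Sketch only: contrasts two distance measures, hypothetical names.
class DistanceMeasureSketch {
    static final double EARTH_MEAN_RADIUS_KM = 6371.009;

    /** 2-norm over raw (lat, lon) values; the result is in degrees, not km. */
    static double euclideanDegrees(double lat1, double lon1, double lat2, double lon2) {
        return Math.hypot(lat2 - lat1, lon2 - lon1);
    }

    /** Great-circle (haversine) distance in km. */
    static double greatCircleKm(double lat1, double lon1, double lat2, double lon2) {
        double p1 = Math.toRadians(lat1);
        double p2 = Math.toRadians(lat2);
        double sinLat = Math.sin((p2 - p1) / 2);
        double sinLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinLat * sinLat + Math.cos(p1) * Math.cos(p2) * sinLon * sinLon;
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // The two points from the issue report.
        double lat1 = 45.17614, lon1 = -93.87341, lat2 = 44.9369054, lon2 = -91.3929348;
        System.out.println("euclidean (degrees): " + euclideanDegrees(lat1, lon1, lat2, lon2));
        System.out.println("great circle (km):   " + greatCircleKm(lat1, lon1, lat2, lon2));
    }
}
```

One degree of longitude has the same Euclidean length at the equator and at 60N, but roughly half the ground distance at 60N, so the two measures can order results differently.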




[jira] Assigned: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-2125:
-

Assignee: Grant Ingersoll




[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912981#action_12912981
 ] 

Grant Ingersoll commented on SOLR-2125:
---

Hmm, I'm putting some more exact tests into LatLonType and 
SpatialFilterTest, trying to work through this.
{code}
clearIndex();
assertU(adoc("id", "14", fieldName, "0,5"));
assertU(adoc("id", "15", fieldName, "0,15"));
//one at the upper right of the box, 3000KM from 0,0, see http://www.movable-type.co.uk/scripts/latlong.html
assertU(adoc("id", "16", fieldName, "18.7,19.79750"));
assertU(commit());

checkHits(fieldName, "0,0", 1000, 1, 14);
checkHits(fieldName, "0,0", 2000, 1, 14);
checkHits(fieldName, "0,0", 3000, 2, 14, 15);
checkHits(fieldName, "0,0", 3001, 3, 14, 15, 16);
checkHits(fieldName, "0,0", 3000.1, 3, 14, 15, 16);
{code}





[jira] Resolved: (LUCENE-2650) improve windows defaults in FSDirectory

2010-09-21 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2650.
-

Resolution: Fixed

Committed revision 999409 to trunk.

> improve windows defaults in FSDirectory
> ---
>
> Key: LUCENE-2650
> URL: https://issues.apache.org/jira/browse/LUCENE-2650
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Store
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2650.patch, LUCENE-2650.patch
>
>
> Currently windows defaults to SimpleFSDirectory, but this is a problem due to 
> the synchronization.
> I have been benchmarking queries *sequentially* and was pretty surprised at 
> how much faster
> MMapDirectory is, for example for cases that do many seeks.
> I think we should change the defaults for windows as such:
> if (WINDOWS and UNMAP_SUPPORTED and 64-bit)
>   use MMapDirectory
> else
>   use SimpleFSDirectory 
> I think we should just consider doing this for 4.0 only and see how it goes.
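
The selection rule proposed in the description can be sketched as a plain decision function. This is only an illustration of the rule as written above; the enum and method are hypothetical and do not reproduce the FSDirectory code in the attached patch.

```java
// Sketch only: restates the proposed rule, hypothetical names.
class WindowsDirectoryChoiceSketch {
    enum Choice { MMAP_DIRECTORY, SIMPLE_FS_DIRECTORY }

    /**
     * Mirrors the pseudocode: prefer MMapDirectory only when running on
     * Windows, unmapping is supported, and the JVM is 64-bit.
     */
    static Choice choose(boolean windows, boolean unmapSupported, boolean is64Bit) {
        if (windows && unmapSupported && is64Bit) {
            return Choice.MMAP_DIRECTORY;
        }
        return Choice.SIMPLE_FS_DIRECTORY;
    }
}
```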




[jira] Commented: (SOLR-1568) Implement Spatial Filter

2010-09-21 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912968#action_12912968
 ] 

Yonik Seeley commented on SOLR-1568:


Didn't fix things (it was just a pole check) - the bounding box calc is off... 
see SOLR-2125

> Implement Spatial Filter
> 
>
> Key: SOLR-1568
> URL: https://issues.apache.org/jira/browse/SOLR-1568
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: CartesianTierQParserPlugin.java, 
> SOLR-1568.Mattmann.031010.patch.txt, SOLR-1568.patch, SOLR-1568.patch, 
> SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, 
> SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, 
> SOLR-1568.patch, SOLR-1568.patch
>
>
> Given an index with spatial information (either as a geohash, 
> SpatialTileField (see SOLR-1586) or just two lat/lon pairs), we should be 
> able to pass in a filter query that takes in the field name, lat, lon and 
> distance and produces an appropriate Filter (i.e. one that is aware of the 
> underlying field type) for use by Solr. 
> The interface _could_ look like:
> {code}
> &fq={!sfilt dist=20}location:49.32,-79.0
> {code}
> or it could be:
> {code}
> &fq={!sfilt lat=49.32 lon=-79.0 f=location dist=20}
> {code}
> or:
> {code}
> &fq={!sfilt p=49.32,-79.0 f=location dist=20}
> {code}
> or:
> {code}
> &fq={!sfilt lat=49.32,-79.0 fl=lat,lon dist=20}
> {code}




[jira] Commented: (SOLR-1568) Implement Spatial Filter

2010-09-21 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912955#action_12912955
 ] 

Grant Ingersoll commented on SOLR-1568:
---

bq. Default is the Earth's mean radius in kilometers (see 
org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) which 
is set to 3,958.761458084784856

That's a typo.  I'll fix.

I think Yonik just fixed this on rev 999175.  Can you try trunk?






[jira] Updated: (LUCENE-2482) Index sorter

2010-09-21 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated LUCENE-2482:
--

Affects Version/s: 4.0

> Index sorter
> 
>
> Key: LUCENE-2482
> URL: https://issues.apache.org/jira/browse/LUCENE-2482
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/*
>Affects Versions: 3.1, 4.0
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
> Fix For: 3.1
>
> Attachments: indexSorter.patch
>
>
> A tool to sort an index according to a float document weight. Documents with 
> high weight are given low document numbers, which means that they will be 
> evaluated first. When using a strategy of "early termination" of queries (see 
> TimeLimitedCollector) such sorting significantly improves the quality of 
> partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as 
> document weights - thus the ordering was limited by the limited resolution of 
> norms. This is a pure Lucene version of the tool, and it uses arbitrary 
> floats from a specified stored field).
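
The renumbering idea in the description can be sketched as follows: given one float weight per document, build a map from old document numbers to new ones so that the highest weights get the lowest numbers. This is a minimal illustration with hypothetical names, not the code in indexSorter.patch.

```java
import java.util.Arrays;

// Sketch only: computes the old-to-new doc number mapping, hypothetical names.
class DocOrderSketch {
    /** Returns oldToNew, where oldToNew[oldDoc] is the new doc number (highest weight first). */
    static int[] oldToNew(float[] weights) {
        Integer[] order = new Integer[weights.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Descending by weight; Arrays.sort on objects is stable, so ties keep their old order.
        Arrays.sort(order, (a, b) -> Float.compare(weights[b], weights[a]));
        int[] oldToNew = new int[weights.length];
        for (int newDoc = 0; newDoc < order.length; newDoc++) {
            oldToNew[order[newDoc]] = newDoc;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        float[] weights = {0.1f, 0.9f, 0.5f};
        System.out.println(Arrays.toString(oldToNew(weights)));
    }
}
```

With documents ordered this way, a collector that stops early has already seen the highest-weight documents, which is the motivation given in the description.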




[jira] Reopened: (LUCENE-2482) Index sorter

2010-09-21 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  reopened LUCENE-2482:
---


..to remember we need a port of this tool to 4.0




Re: (LUCENE-2482) Index sorter

2010-09-21 Thread Michael McCandless
On Tue, Sep 21, 2010 at 5:31 AM, Andrzej Bialecki  wrote:
> On 2010-09-21 06:20, Lance Norskog wrote:
>> What is the philosophy about the 3.x branch? This is an all-new feature
>> added to 3.x.
>
> FWIW, I plan to port this tool to trunk.

Ahh excellent.  Though you should leave the issue opened (and add 4.0
fix version) so "we" don't forget...?

Mike




Re: [jira] Resolved: (LUCENE-2482) Index sorter

2010-09-21 Thread Andrzej Bialecki

On 2010-09-21 06:20, Lance Norskog wrote:

> What is the philosophy about the 3.x branch? This is an all-new feature
> added to 3.x.


FWIW, I plan to port this tool to trunk.

--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com





Re: (LUCENE-2482) Index sorter

2010-09-21 Thread Michael McCandless
On Tue, Sep 21, 2010 at 12:20 AM, Lance Norskog  wrote:
> What is the philosophy about the 3.x branch? This is an all-new feature
> added to 3.x.

That's perfectly fine.

3.x is Lucene's "stable" branch -- it'll get new features as long as
they are "deemed" to not be too dangerous nor break backwards
compatibility.

Major changes, eg flexible indexing, will be in 4.0 (trunk) but won't
be back ported to 3.x.

Mike




[jira] Issue Comment Edited: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912883#action_12912883
 ] 

Bill Bell edited comment on SOLR-2125 at 9/21/10 5:01 AM:
--

/lucene/contrib/spatial/src/java/org/apache/lucene/spatial/DistanceUtils.java - 
hsin appears to be okay.

Good print:

System.out.println("x1 = " + x1 + " - " + x1*180/Math.PI
    + ",y1 = " + y1 + " - " + y1*180/Math.PI
    + ",x2 = " + x2 + " - " + x2*180/Math.PI
    + ",y2 = " + y2 + " - " + y2*180/Math.PI
    + ",radius = " + radius + "\n");
System.out.println("result = " + result + " km\n");

My guess: src/java/org/apache/solr/schema/LatLonType.java


{code}
double[] ur;
double[] ll;
if (options.measStr == null || options.measStr.equals("hsin")) {
  ur = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null, true, options.radius);
  ll = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null, false, options.radius);
} else {
  ur = DistanceUtils.vectorBoxCorner(point, null, options.distance, true);
  ll = DistanceUtils.vectorBoxCorner(point, null, options.distance, false);
}

{code}




  was (Author: billnbell):

/lucene/contrib/spatial/src/java/org/apache/lucene/spatial/DistanceUtils.java - 
hsin appears to be okay.

Good print:

System.out.println("x1 = " + x1 + " - " + x1*180/Math.PI
    + ",y1 = " + y1 + " - " + y1*180/Math.PI
    + ",x2 = " + x2 + " - " + x2*180/Math.PI
    + ",y2 = " + y2 + " - " + y2*180/Math.PI
    + ",radius = " + radius + "\n");
System.out.println("result = " + result + " km\n");

Where is the boundary calculated?


  



Hudson build is back to normal : Solr-trunk #1254

2010-09-21 Thread Apache Hudson Server
See 






[jira] Commented: (SOLR-2125) Spatial filter is not accurate

2010-09-21 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912883#action_12912883
 ] 

Bill Bell commented on SOLR-2125:
-

/lucene/contrib/spatial/src/java/org/apache/lucene/spatial/DistanceUtils.java - 
hsin appears to be okay.

Good print:

System.out.println("x1 = " + x1 + " - " + x1*180/Math.PI
    + ",y1 = " + y1 + " - " + y1*180/Math.PI
    + ",x2 = " + x2 + " - " + x2*180/Math.PI
    + ",y2 = " + y2 + " - " + y2*180/Math.PI
    + ",radius = " + radius + "\n");
System.out.println("result = " + result + " km\n");

Where is the boundary calculated?





