[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms

2015-06-08 Thread Nicholas Knize (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Knize updated LUCENE-6481:
---
Attachment: LUCENE-6481.patch

For some reason a diff with the latest branch introduced a lot of duplicate 
changes, so this is the latest patch off trunk.

This patch resolves all nocommits, including:

* random polygon testing
* thread safety testing
* added a tolerance to the expectation check in the random test
* beast-tested with 500 iterations
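The tolerance added to the random test's expectation check might look roughly like the sketch below. It is a hypothetical helper, not the actual TestGeoPointQuery code. It assumes a tolerance of 1e-7 degrees, which is the precision mentioned elsewhere in this thread. Points within that distance of a bbox edge are treated as ambiguous, so the test accepts either a hit or a miss for them.

```java
final class ToleranceCheck {
    // Hypothetical tolerance in degrees, matching the 1e-7 precision mentioned in this thread.
    static final double TOLERANCE = 1e-7;

    /**
     * Expected result of a bbox query for one point: TRUE or FALSE when the answer is clear,
     * or null when the point lies within TOLERANCE of an edge and either answer is acceptable.
     */
    static Boolean expectedInBBox(double lat, double lon,
                                  double minLat, double maxLat, double minLon, double maxLon) {
        boolean clearlyInside = lat > minLat + TOLERANCE && lat < maxLat - TOLERANCE
                && lon > minLon + TOLERANCE && lon < maxLon - TOLERANCE;
        if (clearlyInside) {
            return Boolean.TRUE;
        }
        boolean clearlyOutside = lat < minLat - TOLERANCE || lat > maxLat + TOLERANCE
                || lon < minLon - TOLERANCE || lon > maxLon + TOLERANCE;
        return clearlyOutside ? Boolean.FALSE : null;
    }

    /** Ambiguous expectations always pass; otherwise the query result must match exactly. */
    static boolean verify(Boolean expected, boolean actual) {
        return expected == null || expected == actual;
    }

    public static void main(String[] args) {
        // Query box: lat [0, 20], lon [10, 30].
        System.out.println(expectedInBBox(10.0, 20.0, 0.0, 20.0, 10.0, 30.0));        // true
        System.out.println(expectedInBBox(20.00000005, 20.0, 0.0, 20.0, 10.0, 30.0)); // null (on the edge)
        System.out.println(expectedInBBox(25.0, 20.0, 0.0, 20.0, 10.0, 30.0));        // false
    }
}
```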

 Improve GeoPointField type to only visit high precision boundary terms 
 ---

 Key: LUCENE-6481
 URL: https://issues.apache.org/jira/browse/LUCENE-6481
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Nicholas Knize
 Attachments: LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, 
 LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, 
 LUCENE-6481.patch, LUCENE-6481_WIP.patch


 The current GeoPointField ([LUCENE-6450|https://issues.apache.org/jira/browse/LUCENE-6450]) 
 computes a set of ranges along the space-filling curve that represent a 
 provided bounding box. These ranges determine which terms in the terms 
 dictionary to visit and which to skip. This is suboptimal for large bounding 
 boxes, since we may end up visiting every term (and the number of terms can 
 be very large).
 This incremental improvement changes GeoPointField to visit high-precision 
 terms only in boundary ranges, and to use the postings list directly for 
 ranges that lie completely within the target bounding box.
 A separate improvement would be to switch to auto-prefix terms and build an 
 Automaton representing the bounding box. That can be tracked in a separate 
 issue.
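The range computation described above can be sketched as a recursive quadtree walk over a Z-order (Morton) curve. This is a simplified model, not the GeoPointField code. It assumes an integer grid of 2^bits x 2^bits cells, inclusive box bounds in grid units, and hypothetical names (`ZOrderRanges`, `Range`). The walk classifies each cell as follows:

* Cells wholly inside the box become "inner" ranges, whose postings could be used directly.
* Cells still crossing the box at `maxDepth` become "boundary" ranges, whose high-precision terms would be visited and checked.
* Disjoint cells are skipped.

```java
import java.util.ArrayList;
import java.util.List;

final class ZOrderRanges {
    /** A closed range [min, max] of Morton codes; inner means every code in it lies inside the box. */
    static final class Range {
        final long min, max;
        final boolean inner;

        Range(long min, long max, boolean inner) {
            this.min = min;
            this.max = max;
            this.inner = inner;
        }

        @Override
        public String toString() {
            return (inner ? "inner" : "boundary") + "[" + min + ", " + max + "]";
        }
    }

    final int bits, maxDepth, minX, maxX, minY, maxY;
    final List<Range> ranges = new ArrayList<>();

    /** Grid is 2^bits x 2^bits (bits <= 30); the box bounds are inclusive grid coordinates. */
    ZOrderRanges(int bits, int maxDepth, int minX, int maxX, int minY, int maxY) {
        this.bits = bits;
        this.maxDepth = maxDepth;
        this.minX = minX;
        this.maxX = maxX;
        this.minY = minY;
        this.maxY = maxY;
        split(0, 0, 0);
    }

    /** Interleaves the low bits of x (even bit positions) and y (odd bit positions). */
    static long interleave(int x, int y, int bits) {
        long code = 0;
        for (int i = 0; i < bits; i++) {
            code |= (long) ((x >>> i) & 1) << (2 * i);
            code |= (long) ((y >>> i) & 1) << (2 * i + 1);
        }
        return code;
    }

    private void split(int cellX, int cellY, int level) {
        int size = 1 << (bits - level);               // cell width in grid units
        int cellMaxX = cellX + size - 1, cellMaxY = cellY + size - 1;
        if (cellMaxX < minX || cellX > maxX || cellMaxY < minY || cellY > maxY) {
            return;                                   // disjoint: nothing to visit
        }
        boolean within = cellX >= minX && cellMaxX <= maxX && cellY >= minY && cellMaxY <= maxY;
        long start = interleave(cellX, cellY, bits);  // an aligned quad cell is contiguous on the curve
        if (within || level == maxDepth) {
            ranges.add(new Range(start, start + (long) size * size - 1, within));
            return;
        }
        int half = size >> 1;                         // children in Z order: (0,0), (1,0), (0,1), (1,1)
        split(cellX, cellY, level + 1);
        split(cellX + half, cellY, level + 1);
        split(cellX, cellY + half, level + 1);
        split(cellX + half, cellY + half, level + 1);
    }

    public static void main(String[] args) {
        // 16x16 grid, box x in [3, 12], y in [5, 9]; stop splitting at 2x2 cells (level 3).
        ZOrderRanges q = new ZOrderRanges(4, 3, 3, 12, 5, 9);
        for (Range r : q.ranges) {
            System.out.println(r);
        }
    }
}
```

An aligned quad cell maps to one contiguous run of Morton codes, so each cell yields a single range. Visiting children in Z order keeps the ranges sorted, which is convenient when walking a sorted terms dictionary.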



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms

2015-06-08 Thread Nicholas Knize (JIRA)


Nicholas Knize updated LUCENE-6481:
---
Issue Type: New Feature  (was: Improvement)




[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms

2015-05-28 Thread Nicholas Knize (JIRA)


Nicholas Knize updated LUCENE-6481:
---
Attachment: LUCENE-6481.patch

Updates:

* cache ranges across segments
* only add ranges that are either within or cross the boundary of the bbox or 
polygon

In exotic cases this latter fix drastically reduces the number of ranges added, 
since it skips unnecessary exterior cells that only touch the boundary. The 
downside is that, since the random test doesn't currently apply the TOLERANCE 
criterion, it occasionally fails due to computation error at 1e-7 precision. 
This can be tweaked in the next patch.
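The "within or crosses, but not merely touching" rule can be sketched as a rectangle-relation function. This is a hypothetical helper using plain double coordinates, not the patch's code. Overlap is tested with strict inequalities, so a cell that shares only an edge or a corner with the query box is reported as DISJOINT and never produces a range.

```java
final class CellRelation {
    enum Relation { DISJOINT, WITHIN, CROSSES }

    /**
     * Relates the cell [cMinX, cMaxX] x [cMinY, cMaxY] to the query box [minX, maxX] x [minY, maxY].
     * A cell that only shares an edge or a corner with the box (zero-area overlap) is DISJOINT.
     */
    static Relation relate(double cMinX, double cMaxX, double cMinY, double cMaxY,
                           double minX, double maxX, double minY, double maxY) {
        // Strict comparisons: touching is not counted as overlap.
        boolean overlaps = cMinX < maxX && cMaxX > minX && cMinY < maxY && cMaxY > minY;
        if (!overlaps) {
            return Relation.DISJOINT;
        }
        boolean within = cMinX >= minX && cMaxX <= maxX && cMinY >= minY && cMaxY <= maxY;
        return within ? Relation.WITHIN : Relation.CROSSES;
    }

    public static void main(String[] args) {
        // Query box [0, 10] x [0, 10].
        System.out.println(relate(10, 20, 0, 10, 0, 10, 0, 10)); // DISJOINT (shares the x = 10 edge only)
        System.out.println(relate(2, 4, 2, 4, 0, 10, 0, 10));    // WITHIN
        System.out.println(relate(8, 12, 2, 4, 0, 10, 0, 10));   // CROSSES
    }
}
```

Under this rule, whether a point lying exactly on a shared edge is still found depends on which cell its encoded term falls into. Edge-exact points are therefore where such a sketch needs the most care.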




[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms

2015-05-26 Thread Nicholas Knize (JIRA)


Nicholas Knize updated LUCENE-6481:
---
Attachment: LUCENE-6481.patch

You can use the postings when the cell is wholly contained within the polygon, 
which the last patch didn't do. The new patch attached includes this logic.

Boundary cells are still computed relative to the bbox. This could be improved 
by computing boundary cells relative to the shape itself, as long as testing 
whether the bbox and the shape intersect is fast; that test is usually the 
Achilles heel of spatial relations.
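Below is a minimal sketch of a "wholly contained" test for a rectangular cell against a simple (possibly concave) polygon. It uses standard ray casting plus segment intersection. This is an illustrative technique, not necessarily what the patch does. The names are hypothetical and the coordinates are planar doubles. Any contact between the polygon boundary and the cell is conservatively treated as "not within", which only means more terms get visited.

```java
final class CellInPolygon {
    /** Ray casting (even-odd rule): toggles on each polygon edge crossed by a ray from (x, y) toward +x. */
    static boolean pointInPolygon(double x, double y, double[] px, double[] py) {
        boolean inside = false;
        for (int i = 0, j = px.length - 1; i < px.length; j = i++) {
            if ((py[i] > y) != (py[j] > y)
                    && x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i]) {
                inside = !inside;
            }
        }
        return inside;
    }

    /** Cross product: > 0 if c is left of a->b, < 0 if right, 0 if collinear. */
    static double orient(double ax, double ay, double bx, double by, double cx, double cy) {
        return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    }

    /** Assuming c is collinear with a-b, returns true if c lies on the segment. */
    static boolean onSegment(double ax, double ay, double bx, double by, double cx, double cy) {
        return Math.min(ax, bx) <= cx && cx <= Math.max(ax, bx)
                && Math.min(ay, by) <= cy && cy <= Math.max(ay, by);
    }

    /** True if segments p1-p2 and q1-q2 cross or touch. */
    static boolean segmentsIntersect(double p1x, double p1y, double p2x, double p2y,
                                     double q1x, double q1y, double q2x, double q2y) {
        double d1 = orient(q1x, q1y, q2x, q2y, p1x, p1y);
        double d2 = orient(q1x, q1y, q2x, q2y, p2x, p2y);
        double d3 = orient(p1x, p1y, p2x, p2y, q1x, q1y);
        double d4 = orient(p1x, p1y, p2x, p2y, q2x, q2y);
        if (((d1 > 0 && d2 < 0) || (d1 < 0 && d2 > 0)) && ((d3 > 0 && d4 < 0) || (d3 < 0 && d4 > 0))) {
            return true;  // proper crossing
        }
        return (d1 == 0 && onSegment(q1x, q1y, q2x, q2y, p1x, p1y))
                || (d2 == 0 && onSegment(q1x, q1y, q2x, q2y, p2x, p2y))
                || (d3 == 0 && onSegment(p1x, p1y, p2x, p2y, q1x, q1y))
                || (d4 == 0 && onSegment(p1x, p1y, p2x, p2y, q2x, q2y));
    }

    /**
     * A cell is wholly within a simple polygon if all four corners are inside and no polygon
     * edge touches or crosses a cell edge. Corners alone are not enough for concave polygons.
     */
    static boolean cellWithinPolygon(double minX, double maxX, double minY, double maxY,
                                     double[] px, double[] py) {
        double[] cx = {minX, maxX, maxX, minX};
        double[] cy = {minY, minY, maxY, maxY};
        for (int k = 0; k < 4; k++) {
            if (!pointInPolygon(cx[k], cy[k], px, py)) {
                return false;
            }
        }
        for (int i = 0, j = px.length - 1; i < px.length; j = i++) {
            for (int k = 0; k < 4; k++) {
                int m = (k + 1) % 4;
                if (segmentsIntersect(px[j], py[j], px[i], py[i], cx[k], cy[k], cx[m], cy[m])) {
                    return false;  // boundary contact: treat as a boundary cell
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A U-shaped (concave) polygon whose notch occupies 4 < x < 6, y > 4.
        double[] px = {0, 10, 10, 6, 6, 4, 4, 0};
        double[] py = {0, 0, 10, 10, 4, 4, 10, 10};
        System.out.println(cellWithinPolygon(1, 3, 1, 3, px, py));     // true
        System.out.println(cellWithinPolygon(1, 9, 1, 8, px, py));     // false: corners inside, notch crosses the cell
        System.out.println(cellWithinPolygon(4.5, 5.5, 5, 6, px, py)); // false: cell lies in the notch
    }
}
```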




[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms

2015-05-26 Thread Nicholas Knize (JIRA)


Nicholas Knize updated LUCENE-6481:
---
Attachment: LUCENE-6481.patch

Thanks Mike! Not sure how I missed my own test. :) It was a trivial fix, though; 
a new patch is attached and all tests pass on my end. The next iteration is 
ready for review. This should be ready for sandbox commit blessings.




[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms

2015-05-22 Thread Nicholas Knize (JIRA)


Nicholas Knize updated LUCENE-6481:
---
Attachment: LUCENE-6481.patch

Updated patch to fix false negatives. This improves the query time of the 
GeoPointField from [LUCENE-6450|https://issues.apache.org/jira/browse/LUCENE-6450] 
to 0.02 sec/query by using the postings list instead of visiting every term.
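The toy model below illustrates why using the postings list beats visiting every term. It is a sketch under stated assumptions, not GeoPointField's actual encoding. The assumptions are:

* a tiny 8x8 grid;
* Z-order (Morton) codes as full-precision terms;
* hypothetical level-tagged prefix terms, where each prefix term's postings contain every document in that quad cell.

A cell wholly inside the box costs one prefix-term lookup. Only boundary cells visit individual full-precision terms (via `TreeMap.subMap`) and check the decoded point.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class PrefixTermQuery {
    static final int BITS = 3;  // hypothetical 8x8 grid; real fields use far more bits

    static long interleave(int x, int y) {
        long code = 0;
        for (int i = 0; i < BITS; i++) {
            code |= (long) ((x >>> i) & 1) << (2 * i);
            code |= (long) ((y >>> i) & 1) << (2 * i + 1);
        }
        return code;
    }

    /** Recovers x (offset 0) or y (offset 1) from a full-precision code. */
    static int deinterleave(long code, int offset) {
        int v = 0;
        for (int i = 0; i < BITS; i++) {
            v |= (int) ((code >>> (2 * i + offset)) & 1) << i;
        }
        return v;
    }

    /** Hypothetical prefix-term key: the level in the high bits, the code's prefix in the low bits. */
    static long prefixKey(int level, long code) {
        return ((long) level << 32) | (code >>> (2 * (BITS - level)));
    }

    final TreeMap<Long, List<Integer>> fullTerms = new TreeMap<>();  // sorted, like a terms dictionary
    final Map<Long, List<Integer>> prefixTerms = new HashMap<>();    // prefix term -> postings
    int termsVisited;
    private int minX, maxX, minY, maxY, maxDepth;
    private final Set<Integer> hits = new TreeSet<>();

    void add(int docId, int x, int y) {
        long code = interleave(x, y);
        fullTerms.computeIfAbsent(code, k -> new ArrayList<>()).add(docId);
        for (int level = 0; level <= BITS; level++) {
            prefixTerms.computeIfAbsent(prefixKey(level, code), k -> new ArrayList<>()).add(docId);
        }
    }

    Set<Integer> search(int minX, int maxX, int minY, int maxY, int maxDepth) {
        this.minX = minX; this.maxX = maxX; this.minY = minY; this.maxY = maxY; this.maxDepth = maxDepth;
        hits.clear();
        termsVisited = 0;
        collect(0, 0, 0);
        return new TreeSet<>(hits);
    }

    private void collect(int cellX, int cellY, int level) {
        int size = 1 << (BITS - level);
        int cellMaxX = cellX + size - 1, cellMaxY = cellY + size - 1;
        if (cellMaxX < minX || cellX > maxX || cellMaxY < minY || cellY > maxY) {
            return;  // disjoint
        }
        long start = interleave(cellX, cellY);
        if (cellX >= minX && cellMaxX <= maxX && cellY >= minY && cellMaxY <= maxY) {
            termsVisited++;  // wholly inside: one prefix term, take its postings as-is
            hits.addAll(prefixTerms.getOrDefault(prefixKey(level, start), List.of()));
            return;
        }
        if (level == maxDepth) {  // boundary cell: visit each full-precision term and check it
            for (Map.Entry<Long, List<Integer>> e
                    : fullTerms.subMap(start, true, start + (long) size * size - 1, true).entrySet()) {
                termsVisited++;
                int x = deinterleave(e.getKey(), 0), y = deinterleave(e.getKey(), 1);
                if (x >= minX && x <= maxX && y >= minY && y <= maxY) {
                    hits.addAll(e.getValue());
                }
            }
            return;
        }
        int half = size >> 1;
        collect(cellX, cellY, level + 1);
        collect(cellX + half, cellY, level + 1);
        collect(cellX, cellY + half, level + 1);
        collect(cellX + half, cellY + half, level + 1);
    }

    public static void main(String[] args) {
        PrefixTermQuery index = new PrefixTermQuery();
        for (int x = 0; x < 8; x++) {
            for (int y = 0; y < 8; y++) {
                index.add(x * 8 + y, x, y);  // one point per grid cell; docId = x * 8 + y
            }
        }
        Set<Integer> hits = index.search(1, 6, 2, 5, 2);
        System.out.println(hits.size() + " hits, " + index.termsVisited
                + " terms visited out of " + index.fullTerms.size());  // prints "24 hits, 20 terms visited out of 64"
    }
}
```

With the box and depth used in `main`, the 16 interior points come from 4 prefix-term lookups. The 4 boundary cells visit 16 full-precision terms between them.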




[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms

2015-05-14 Thread Nicholas Knize (JIRA)


Nicholas Knize updated LUCENE-6481:
---
Attachment: LUCENE-6481.patch

Minor patch update that adds geodesic to geodetic projection / reprojection 
methods to GeoUtils.




[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms

2015-05-13 Thread Nicholas Knize (JIRA)


Nicholas Knize updated LUCENE-6481:
---
Attachment: LUCENE-6481_WIP.patch

First-cut WIP patch. The LuceneUtil benchmark shows false negatives, though, so 
this is definitely not ready. So far I've been unable to reproduce the false 
negatives; I'm posting it here so we can iterate on improvements.

*GeoPointField*

Index Time:  640.24 sec
Index Size: 4.4G
Mean Query Time:  0.02 sec




[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms

2015-05-13 Thread Nicholas Knize (JIRA)


Nicholas Knize updated LUCENE-6481:
---
Attachment: LUCENE-6481.patch

The test had the lat and lon ordering reversed for both GeoPointFieldType and 
GeoPointInBBoxQuery. I've attached a new patch with the correction.

testRandomTiny passes, but there is one failure in testRandom:
{noformat}
ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true
{noformat}

{noformat}
   [junit4]   2 NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.slow=true -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 1.54s | TestGeoPointQuery.testRandom
   [junit4] Throwable #1: java.lang.AssertionError: id=632 docID=613 lat=46.19240875459866 lon=143.92476891121902 expected true but got: false deleted?=false
   [junit4]    at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:83A81A5CC1FB49F1]:0)
   [junit4]    at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:302)
   [junit4]    at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:204)
   [junit4]    at org.apache.lucene.search.TestGeoPointQuery.testRandom(TestGeoPointQuery.java:130)
   [junit4]    at java.lang.Thread.run(Thread.java:745)
{noformat}

This should be enough to debug the issue. I expect to have a new patch sometime 
tomorrow or before week's end.




[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms

2015-05-13 Thread Michael McCandless (JIRA)


Michael McCandless updated LUCENE-6481:
---
Attachment: LUCENE-6481.patch

New patch, starting from [~nknize]'s and then folding in the evil-ish random 
test I added for LUCENE-6477. Maybe this can help debug why there are false 
negatives?

E.g., with this patch, when I run:

{noformat}
ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true
{noformat}

It fails with this:
{noformat}
   [junit4]   2 NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 2.91s | TestGeoPointQuery.testRandomTiny
   [junit4] Throwable #1: java.lang.AssertionError: id=0 docID=0 lat=-27.18027939545 lon=-167.14191331870592 expected true but got: false deleted?=false
   [junit4]    at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:B8A3E1152EBAC72E]:0)
   [junit4]    at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:301)
   [junit4]    at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:203)
   [junit4]    at org.apache.lucene.search.TestGeoPointQuery.testRandomTiny(TestGeoPointQuery.java:125)
   [junit4]    at java.lang.Thread.run(Thread.java:745)
{noformat}

The test case should be easy-ish to debug: it only indexes at most a few tens 
of points...

