[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6481: --- Attachment: LUCENE-6481.patch For some reason a diff with the latest branch introduced a lot of duplicate changes so this is the latest patch off trunk. This patch resolves all no commits, including: * random polygon testing * thread safety testing * added tolerance to expectation check in random test * beast tested w/ 500 iterations Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6481: --- Issue Type: New Feature (was: Improvement) Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6481: --- Attachment: LUCENE-6481.patch Updates: * cache ranges across segments * only add ranges that are either within or cross the boundary of the bbox or polygon In exotic cases this latter fix drastically reduces the number of ranges added since it avoids unnecessary exterior cells that only touch the boundary. The downside is since the random test doesn't currently use the TOLERANCE criteria it occasionally fails due computation error at 1e-7 precision. This can be tweaked in the next patch. Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6481: --- Attachment: LUCENE-6481.patch You can use the postings when the cell is wholly contained within the polygon, which wasn't in that last patch. New patch attached to include this logic. Boundary cells are still computed as they relate to the bbox. This could be improved by computing boundary cells as they relate to the shape as long as computing the existence of an intersection of the bbox and shape is fast - this is usually the Achilles heel of spatial relations. Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6481: --- Attachment: LUCENE-6481.patch Thanks Mike! Not sure how I missed my own test. :) Trivial fix though, new patch added and all tests are passing on my end. Next iteration ready for review. This should be ready for sandbox commit blessings. Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6481: --- Attachment: LUCENE-6481.patch Updated patch to fix false negatives. This now improves performance of [LUCENE-6450|https://issues.apache.org/jira/browse/LUCENE-6450] to 0.02 sec / query by using the postings list instead of visiting every term. Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6481: --- Attachment: LUCENE-6481.patch Minor patch update that adds geodesic to geodetic projection / reprojection methods to GeoUtils. Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6481: --- Attachment: LUCENE-6481_WIP.patch First cut WIP patch. LuceneUtil benchmark shows false negatives, though, so this is definitely not ready. So far I've been unable to reproduce the false negatives...I put it here for iterating improvements. *GeoPointField* Index Time: 640.24 sec Index Size: 4.4G Mean Query Time: 0.02 sec Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6481: --- Attachment: LUCENE-6481.patch The test had the lat and lon ordering incorrect for both GeoPointFieldType and the GeoPointInBBoxQuery. I've attached a new patch with the correction. testRandomTiny passes but there is one failure in testRandom with the following: {noformat} ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true {noformat} {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.slow=true -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 1.54s | TestGeoPointQuery.testRandom [junit4] Throwable #1: java.lang.AssertionError: id=632 docID=613 lat=46.19240875459866 lon=143.92476891121902 expected true but got: false deleted?=false [junit4]at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:83A81A5CC1FB49F1]:0) [junit4]at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:302) [junit4]at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:204) [junit4]at org.apache.lucene.search.TestGeoPointQuery.testRandom(TestGeoPointQuery.java:130) [junit4]at java.lang.Thread.run(Thread.java:745) {noformat} This should be enough to debug the issue. I expect to have a new patch sometime tomorrow or before weeks end. Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481.patch, LUCENE-6481.patch, LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6481) Improve GeoPointField type to only visit high precision boundary terms
[ https://issues.apache.org/jira/browse/LUCENE-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-6481: --- Attachment: LUCENE-6481.patch New patch, starting from [~nknize]'s and then folding in the evilish random test I added for LUCENE-6477 ... maybe this can help debug why there are false negatives? E.g. with this patch when I run: {noformat} ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true {noformat} It fails with this: {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 2.91s | TestGeoPointQuery.testRandomTiny [junit4] Throwable #1: java.lang.AssertionError: id=0 docID=0 lat=-27.18027939545 lon=-167.14191331870592 expected true but got: false deleted?=false [junit4]at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:B8A3E1152EBAC72E]:0) [junit4]at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:301) [junit4]at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:203) [junit4]at org.apache.lucene.search.TestGeoPointQuery.testRandomTiny(TestGeoPointQuery.java:125) [junit4]at java.lang.Thread.run(Thread.java:745) {noformat} The test case should be easy-ish to debug: it only indexes at most a few 10s of points... Improve GeoPointField type to only visit high precision boundary terms --- Key: LUCENE-6481 URL: https://issues.apache.org/jira/browse/LUCENE-6481 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Nicholas Knize Attachments: LUCENE-6481.patch, LUCENE-6481_WIP.patch Current GeoPointField [LUCENE-6450 | https://issues.apache.org/jira/browse/LUCENE-6450] computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large). This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box. A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org