[jira] [Commented] (LUCENE-8997) Add type of triangle info to ShapeField encoding
[ https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974254#comment-16974254 ]

Ignacio Vera commented on LUCENE-8997:
--

I see your point; I'll revert that change.

> Add type of triangle info to ShapeField encoding
>
> Key: LUCENE-8997
> URL: https://issues.apache.org/jira/browse/LUCENE-8997
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We are currently encoding three types of triangles in ShapeField:
> * POINT: all three vertices are equal
> * LINE: two vertices are equal
> * TRIANGLE: all vertices are different
> Because we still have two unused bits, it might be worth encoding this information in those two bits as follows:
> * 0 0: Unknown, meaning the index was created before this information was added. In this case we can compute the information while decoding, for backwards compatibility.
> * 1 0: The encoded triangle is a POINT
> * 0 1: The encoded triangle is a LINE
> * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so we don't need to decode all dimensions for POINT and LINE. Some methods currently compute the type of triangle they are dealing with; that code will go away as well.
>
> --
> This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
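A minimal sketch of the two-bit scheme described in the issue, for illustration only. The class and constant names are hypothetical, and reading "1 0" as high bit = 1, low bit = 0 is an assumption; the issue does not say which bit comes first. The sketch shows the backwards-compatible path: when the stored bits are 0 0, the type is recomputed from the three vertices.

```java
// Hypothetical sketch of the LUCENE-8997 proposal; not Lucene's actual API.
class TriangleTypeBits {
    enum Type { POINT, LINE, TRIANGLE }

    // Assumed 2-bit codes: "1 0" read as high bit = 1, low bit = 0.
    static final int BITS_UNKNOWN = 0b00;  // index written before the type was stored
    static final int BITS_POINT = 0b10;
    static final int BITS_LINE = 0b01;
    static final int BITS_TRIANGLE = 0b11;

    // Derive the type from vertices A, B, C, as an old index requires.
    static Type classify(int ax, int ay, int bx, int by, int cx, int cy) {
        boolean ab = ax == bx && ay == by;
        boolean bc = bx == cx && by == cy;
        boolean ca = cx == ax && cy == ay;
        if (ab && bc) {
            return Type.POINT;    // all three vertices are equal
        }
        if (ab || bc || ca) {
            return Type.LINE;     // exactly two vertices are equal
        }
        return Type.TRIANGLE;     // all vertices are different
    }

    static int encode(Type type) {
        switch (type) {
            case POINT: return BITS_POINT;
            case LINE: return BITS_LINE;
            default: return BITS_TRIANGLE;
        }
    }

    // Trust the stored bits when present; 0b00 means "unknown", so recompute.
    static Type decode(int bits, int ax, int ay, int bx, int by, int cx, int cy) {
        switch (bits & 0b11) {
            case BITS_POINT: return Type.POINT;
            case BITS_LINE: return Type.LINE;
            case BITS_TRIANGLE: return Type.TRIANGLE;
            default: return classify(ax, ay, bx, by, cx, cy);
        }
    }

    public static void main(String[] args) {
        Type t = classify(1, 2, 1, 2, 1, 2);
        System.out.println(t + " -> " + Integer.toBinaryString(encode(t))); // POINT -> 10
        // An old index stores 0b00, so the type is recomputed from the vertices.
        System.out.println(decode(BITS_UNKNOWN, 0, 0, 5, 5, 0, 0));         // LINE
    }
}
```

When the bits are set, a reader can branch on them directly instead of comparing vertices. This is the saving the issue has in mind for methods that currently work out the triangle type themselves.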
[jira] [Commented] (LUCENE-8997) Add type of triangle info to ShapeField encoding
[ https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974241#comment-16974241 ]

Adrien Grand commented on LUCENE-8997:
--

I guess it could still work if we indexed this dimension, but I don't think this is the right trade-off.
[jira] [Commented] (LUCENE-8997) Add type of triangle info to ShapeField encoding
[ https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974239#comment-16974239 ]

Adrien Grand commented on LUCENE-8997:
--

I'm unsure about keeping dimensions empty: it works well if your index has only lines or only points, since all points will have a value of 0 for certain dimensions. But if the index mixes triangles and points, then this could actually hurt?
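To illustrate why empty dimensions pay off only in homogeneous leaves, here is a simplified sketch. It assumes leaf compression is driven by a per-dimension common prefix, which simplifies what the BKD writer does. The sketch uses raw big-endian bytes and ignores Lucene's sortable-bytes encoding (which also flips the sign bit). The class name and sample values are hypothetical.

```java
import java.nio.ByteBuffer;

// Simplified illustration; not Lucene's BKD implementation.
class LeafPrefixSketch {
    // Raw big-endian bytes of an int (no sign-bit handling).
    static byte[] bytes(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    // Number of leading bytes (0..4) shared by every value of one dimension in a leaf.
    static int commonPrefixBytes(int[] values) {
        byte[] first = bytes(values[0]);
        int prefix = Integer.BYTES;
        for (int v : values) {
            byte[] b = bytes(v);
            int i = 0;
            while (i < prefix && b[i] == first[i]) {
                i++;
            }
            prefix = i;  // the shared prefix can only shrink
        }
        return prefix;
    }

    public static void main(String[] args) {
        // Hypothetical values of one data dimension (e.g. dimension 5) in two leaves.
        int[] pointsOnly = {0, 0, 0, 0};             // points leave the dimension empty
        int[] mixed = {0, 0, 123456789, 987654321};  // two points, two triangles
        System.out.println(commonPrefixBytes(pointsOnly)); // 4: the whole value is shared
        System.out.println(commonPrefixBytes(mixed));      // 0: the zeros save nothing here
    }
}
```

Under this model, a leaf containing only points stores dimensions 5 and 6 almost for free. Once a single triangle lands in the leaf, the zeros from its points no longer shorten anything.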
[jira] [Commented] (LUCENE-8997) Add type of triangle info to ShapeField encoding
[ https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974215#comment-16974215 ]

Ignacio Vera commented on LUCENE-8997:
--

I would like to raise this issue again because I made a small improvement. I realised that for points I do not need to add the point information to the data dimensions, so I can just leave dimensions 5 and 6 empty. As a result, BKD tree leaves that only contain points compress very well. I ran the Lucene geo benchmarks for LatLonShape and got a 30% reduction in index size!

{code}
||Approach||Index time Dev (sec)||Index time Base (sec)||Index time Diff||Force merge time Dev (sec)||Force merge time Base (sec)||Force merge time Diff||Index size Dev (GB)||Index size Base (GB)||Index size Diff||Reader heap Dev (MB)||Reader heap Base (MB)||Reader heap Diff||
|shapes|260.8|264.2|-1%|0.0|0.0|0%|0.89|1.27|-30%|1.14|1.78|-36%|
{code}
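As a quick check of the Diff columns in the table, the sketch below assumes Diff is the change of Dev relative to Base, rounded to a whole percent. This assumption reproduces the reported values. The class and method names are hypothetical.

```java
// Hypothetical helper: relative change of Dev vs. Base, rounded to a whole percent.
class BenchmarkDiff {
    static long percentDiff(double dev, double base) {
        return Math.round((dev - base) / base * 100.0);
    }

    public static void main(String[] args) {
        System.out.println(percentDiff(260.8, 264.2)); // index time: -1
        System.out.println(percentDiff(0.89, 1.27));   // index size: -30
        System.out.println(percentDiff(1.14, 1.78));   // reader heap: -36
    }
}
```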