[Impala-ASF-CR] WIP: port critical geospatial functions to c++

Daniel Becker (Code Review) Mon, 30 Oct 2023 08:50:59 -0700

Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20602 )


Change subject: WIP: port critical geospatial functions to c++
......................................................................


Patch Set 2:

(172 comments)

I didn't pay much attention to the tests, but this review is already huge.

http://gerrit.cloudera.org:8080/#/c/20602/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20602/2//COMMIT_MSG@1
PS2, Line 1: Parent:     fb2d2b27 (IMPALA-12430: Skip compression when sending 
row batches within same process)
Many functions return a bool that indicates whether reading/writing was 
successful. The meaning of the return value could be documented somewhere, 
either at each function or in some readme file.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/common/global-flags.cc@425
PS2, Line 425: +
Nit: add space after +.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h
File be/src/exprs/geo/common.h:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@39
PS2, Line 39: // TODO: This file is included in impala-ir.cc, and these 
typedefs will be visible in the
This TODO can probably be deleted now.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@71
PS2, Line 71: OgcTypeToWktPrefix
For other constants we used UPPER_SNAKE_CASE (see L31-35), maybe we should use 
it for this and on L61 as well.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@71
PS2, Line 71: Ogc
On L61 we used OGC in capitals. Either one is good but it should be consistent.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@82
PS2, Line 82: T readFromGeom(const StringVal& geom, int offset) {
Could add some explanation for this and writeToGeom().


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@84
PS2, Line 84: T*
Could be 'const T*' to emphasise that we're not modifying it.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@106
PS2, Line 106: inline OGCType getOGCType(const StringVal& geom) {
Could add a DCHECK that 'OGC_TYPE_OFFSET' is within geom.len.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@108
PS2, Line 108: char
Optional: we could set the underlying type of the 'OGCType' enum to 'char' or 
'unsigned char' and then we could use 'OGCType' as the template argument here. 
We do this with 'EsriType' in shape-format.h.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@128
PS2, Line 128: inline void setOGCType(StringVal& geom, OGCType ogc_type) {
Could add 'static_assert(OGC_TYPE_SIZE == sizeof(char));'
Could add a DCHECK that 'OGC_TYPE_OFFSET' is within geom.len.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@129
PS2, Line 129: char
See L108.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@134
PS2, Line 134: double
Optional, not sure if worth it: we could use decltype(point.x()) instead of 
'double', and static_assert that point.y() has the same type.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/formatted-double.h
File be/src/exprs/geo/formatted-double.h:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/formatted-double.h@30
PS2, Line 30: a
Nit: consistEnt


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/formatted-double.h@47
PS2, Line 47: 2.6666666666666665
This value is the same as the one on L43. Could you add a short clarification 
why precision 17 is not good?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/formatted-double.h@78
PS2, Line 78:   FormattedDouble& operator/=(const FormattedDouble& rhs) {
Consider adding the remaining modification assignment operators for 
completeness:

operator-=()
operator*=()
operator%=()

operator&=()
operator|=()
operator^=()

operator>>=()
operator<<=()

operator++()    - pre-increment
operator++(int) - post-increment
operator--()    - pre-decrement
operator--(int) - post-decrement

They are not needed now but may be needed in the future.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h
File be/src/exprs/geo/geometry-wrapper.h:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@27
PS2, Line 27: #include "common/names.h"
"common" should come before "exprs".


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@32
PS2, Line 32:
Nit: unnecessary empty line.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@33
PS2, Line 33: class RelationWrapper;
If it's declared as a friend class I don't think forward declaring it here is 
necessary.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@35
PS2, Line 35: types
It could be extended a bit, e.g. 'The types are taken from here: ...'.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@37
PS2, Line 37: GeometryWrapper
You could add some description about what this class does, how it should be 
used.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@38
PS2, Line 38:
Nit: unnecessary empty line.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@39
PS2, Line 39: public
Nit: add 1 space.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@40
PS2, Line 40:
Nit: unnecessary empty line.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@41
PS2, Line 41: d
Capitalise the 'D'. Alternatively, write "serialization and deserialization 
functions."


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@42
PS2, Line 42: above
Now there are no members above, we should explain what members could be 
overwritten.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@42
PS2, Line 42: thwe
Nit: the.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@42
PS2, Line 42: on
Nit: one.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@45
PS2, Line 45: FromEsriBinary
Add in a comment what the return value means. Also where the read result will 
be and in what format, although this could also be explained in the class 
comment.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@46
PS2, Line 46: StringVal
I'm wondering whether it would be better to have "ToEsriBinary()" and "ToWkt()" 
return the same type. Does the usage pattern make it better for the former to 
return StringValue and the latter to return string? If so, it could be 
mentioned in a comment.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@49
PS2, Line 49: WKT
WKT should be expanded at least once to "Well Known Text", preferably in the 
class comment.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@51
PS2, Line 51: StringVal
Could be a const ref.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@57
PS2, Line 57: num_coords
It would be good do make it clear whether this refers to the number of 
coordinate pairs or individual coordinates.

Also, can't we take a vector<DoubleVal> instead of an array?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@63
PS2, Line 63:   point2d point_;
We could consider using a union or std::variant 
(https://en.cppreference.com/w/cpp/utility/variant). If the type was included, 
that could be used as the discriminant of the union. We'd have to be careful 
with initialisation though.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc
File be/src/exprs/geo/geometry-wrapper.cc:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@28
PS2, Line 28: ogcType
The 'geom' already contains type info, can we add a DCHECK checking that that 
type info and 'ogcType' match?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@31
PS2, Line 31: this->
We didn't use the explicit 'this' pointer in the other cases, so we shouldn't 
do it here either.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@50
PS2, Line 50: not supported
Could include the function name and value of 'ogcType' that is not supported, 
like on L116.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@53
PS2, Line 53:   return true;
This return statement could be moved to "case ST_POINT" so that all branches 
return.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@56
PS2, Line 56: srid
This could alternatively be a default parameter instead of having 2 overloads. 
On L65 we already use 0 as a default value.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@63
PS2, Line 63: ogcType
Here we can't check whether the geometry member that is filled in is actually 
the one that corresponds to 'ogcType'. 'ogcType' could be included in 
'GeometryWrapper'.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@76
PS2, Line 76:     default:
Shouldn't we add braces { ... }?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@77
PS2, Line 77: not supported.
Could include the function name and value of 'ogcType' that is not supported, 
like on L116.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@86
PS2, Line 86:     //ss.precision(15);
Remove unused code.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@102
PS2, Line 102:         //ReverseRings(this->polygon);
Remove unused code.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@103
PS2, Line 103:         std::reverse(polygon_.outer().begin(), 
polygon_.outer().end());
What if we call this function twice, the second call will transform the order 
back to the original one.

Also, if we later use the polygon with other Esri functions, will it work?

If it's OK to call this function more than once, we should copy the polygon 
here, otherwise we should write it in a comment that it can only be used once 
and we could move the polygon to a temporary variable.

Also, can we have an inner ring? Does that need to be reversed?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@110
PS2, Line 110:           std::reverse(polygon.outer().begin(), 
polygon.outer().end());
See L103.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@133
PS2, Line 133: bool GeometryWrapper::FromWkt(FunctionContext* ctx, StringVal 
wkt, OGCType ogcType) {
Should we handle null values? If not, add DCHECK(!wkt.is_null).


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@160
PS2, Line 160: coords
Can we take a vector instead?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@160
PS2, Line 160: &
Usually we take out parameters by pointer.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@187
PS2, Line 187: variable argument
It's not really vararg, just taking a pointer to an array. I think it could be 
confused with https://en.cppreference.com/w/cpp/utility/variadic.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@190
PS2, Line 190:       boost::geometry::correct(polygon_);
Do we know if it can throw?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@196
PS2, Line 196: not supported.
Could include the function name and value of 'ogcType' that is not supported, 
like on L116.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc
File be/src/exprs/geo/geospatial-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@26
PS2, Line 26: #include "exprs/geo/common.h"
            : #include "exprs/geo/shape-format.h"
            : #include "exprs/geo/poly-line-shape-format.h"
            : #include "exprs/geo/multi-point-shape-format.h"
            : #include "exprs/geo/geometry-wrapper.h"
            : #include "exprs/geo/relation-wrapper.h"
            : #include "exprs/geo/utils.h"
            : #include "exprs/geo/wkt.h"
These should be in alphabetical order.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@99
PS2, Line 99:   if (srid.is_null) return geom;
Now if 'geom' is invalid and 'srid' is null, we return the invalid 'geom', but 
if 'srid' is non-null, we return null in this case. Alternatively, we could 
always check whether 'geom' is valid and return null if not.

We could check what other query engines do or what the spec says (if it exists).


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@130
PS2, Line 130: // Variable argument list constructor.
It's not really vararg, just taking a pointer to an array. I think it could be 
confused with https://en.cppreference.com/w/cpp/utility/variadic.
It could be clarified that it's vararg as an Impala UDF.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@149
PS2, Line 149: // Variable argument list constructor.
See L130.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@162
PS2, Line 162:
Empty line.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@169
PS2, Line 169: // Variable argument list constructor.
See L130.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@218
PS2, Line 218: /*
Remove dead code.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@345
PS2, Line 345:
Empty line.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@357
PS2, Line 357: BigIntVal GeospatialFunctions::st_BinGeom(FunctionContext* ctx, 
const DoubleVal& bin_size,
The two st_BinGeom() implementations are the same, only the type of the 
'bin_size' parameter is different. The functions could call a common function 
template.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@359
PS2, Line 359:   if (bin_size.is_null) return BigIntVal::null();
See L99.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@377
PS2, Line 377: BigIntVal GeospatialFunctions::st_BinWkt(FunctionContext* ctx, 
const DoubleVal& bin_size,
See L357.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@393
PS2, Line 393: StringVal 
GeospatialFunctions::st_BinenvelopeBinId(FunctionContext* ctx, const DoubleVal& 
bin_size,
> line too long (99 > 90)
See L357.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@414
PS2, Line 414: StringVal 
GeospatialFunctions::st_BinenvelopeGeom(FunctionContext* ctx, const DoubleVal& 
bin_size,
> line too long (98 > 90)
See L357.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@439
PS2, Line 439: StringVal 
GeospatialFunctions::st_BinenvelopeWkt(FunctionContext* ctx, const DoubleVal& 
bin_size,
> line too long (97 > 90)
See L357.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions.h
File be/src/exprs/geo/geospatial-functions.h:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions.h@39
PS2, Line 39: GeospatialFunctions
Could add a link to a page where standard geospatial functions are described 
(if there is a good one).


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions.h@107
PS2, Line 107:
Unneeded spaces.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions.h@130
PS2, Line 130:
Empty line.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.h
File be/src/exprs/geo/multi-point-shape-format.h:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.h@33
PS2, Line 33:
Empty line.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.h@34
PS2, Line 34: public:
Add 1 space.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc
File be/src/exprs/geo/multi-point-shape-format.cc:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@23
PS2, Line 23: &
We usually take output parameters by pointer.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@25
PS2, Line 25:     ctx->SetError("Multipoint lenght too small.");
It might be misleading, a user could think that it is the value stored at 
NUM_POINTS_OFFSET that is too small. Maybe we could write something like "Size 
of multipoint object too small", or something like on L30.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@29
PS2, Line 29: 16
This could be extracted to a constexpr (for example as 2 * sizeof(double)) and 
also used at L35.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@37
PS2, Line 37: 8
Could be sizeof(double).


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@50
PS2, Line 50: 16
Could use the same constant as I recommended at L29.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@59
PS2, Line 59: 16
See L50.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@60
PS2, Line 60: 16 + 8
See L50 and L37.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.h
File be/src/exprs/geo/poly-line-shape-format.h:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.h@40
PS2, Line 40: &
We usually take out parameters by pointer.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.h@43
PS2, Line 43: public:
Add 1 space.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.h@44
PS2, Line 44: Read
Add a comment about which out parameters should be used in which case.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.h@48
PS2, Line 48:   //static int WriteNextPart(StringValconst vector<point2d>& 
points, int* point_offset);
Dead code.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc
File be/src/exprs/geo/poly-line-shape-format.cc:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@31
PS2, Line 31: .
And also polygon?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@33
PS2, Line 33:   // will be deserialized to that shape.
What about 'rings'? Aren't all three mutually exclusive?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@43
PS2, Line 43:     DCHECK_NE(linestring, nullptr);
Why not also DCHECK(rings == nullptr)?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@47
PS2, Line 47:     ctx->SetError("PolyLine lenght too small.");
It might be misleading, a user could think that it is the value stored at 
NUM_PARTS_OFFSET that is too small. Maybe we could write something like "Size 
of polyline object too small", or something like on L59.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@57
PS2, Line 57: 4
Could extract to constexpr.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@58
PS2, Line 58: 16
Could extract to constexpr.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@68
PS2, Line 68:   }
Shouldn't we also clear 'rings' if 'is_polygon' is true?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@71
PS2, Line 71: 4
See L57.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@72
PS2, Line 72: first_point
'first_point_idx' or something similar would be better, this name could be 
mistaken for the actual point value. Similarly for 'last_point' on L75.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@78
PS2, Line 78:         || last_point >= num_points) {
Shouldn't we add "|| first_point > last_point"?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@90
PS2, Line 90: 16
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@92
PS2, Line 92: 8
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@106
PS2, Line 106: &
We usually take out parameters by pointer.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@108
PS2, Line 108: 16
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@109
PS2, Line 109: 16 + 8
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@116
PS2, Line 116: 16
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@116
PS2, Line 116: 4
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@156
PS2, Line 156:   }
Shouldn't we check that 'num_points' is not 0?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@163
PS2, Line 163: 4
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@164
PS2, Line 164: 16
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@179
PS2, Line 179: &
We usually take out parameters by pointer.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@181
PS2, Line 181: 4
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@188
PS2, Line 188: 16
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@203
PS2, Line 203: 4
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@219
PS2, Line 219: 4
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@233
PS2, Line 233: 4
See L58.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h
File be/src/exprs/geo/relation-wrapper.h:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@31
PS2, Line 31:
Empty line.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@32
PS2, Line 32: RelationWrapper
Could add a class doc comment, describing what the functions do.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@32
PS2, Line 32: class RelationWrapper {
Empty line.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@34
PS2, Line 34: public:
Add 1 space.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@51
PS2, Line 51: private:
Add 1 space.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@67
PS2, Line 67: //  - st_disjoint can't use bbox check (or rtree) for early 
filtering
Why not? If the bboxes are disjoint, the geoms are disjoint, aren't they?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@87
PS2, Line 87:   static constexpr bool RESULT_IF_BBOX_DOES_NOT_INTERSECT = true;
This only shadows RelationPredicate::RESULT_IF_BBOX_DOES_NOT_INTERSECT, does 
not override it. Using a constexpr function would be more elegant.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc
File be/src/exprs/geo/relation-wrapper.cc:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@25
PS2, Line 25:
Empty line.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@51
PS2, Line 51: GetTypeFromWkt
We should check if GetTypeFromWkt returns UNKNOWN.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@68
PS2, Line 68:   RelationWrapper rel;
Couldn't this function simply call EvalBinWkt() with lhs and rhs swapped?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@69
PS2, Line 69: GetTypeFromWkt
We should check if GetTypeFromWkt returns UNKNOWN.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@87
PS2, Line 87: GetTypeFromWkt
We should check if GetTypeFromWkt returns UNKNOWN.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@88
PS2, Line 88: GetTypeFromWkt
We should check if GetTypeFromWkt returns UNKNOWN.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@98
PS2, Line 98: // 2 function per predicate to turn 2 OGCType parameters to 
templated parameters with
Could you clarify this comment?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@109
PS2, Line 109:
Empty line.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@126
PS2, Line 126: bool IntersectsPredicate::Eval(const lhs_geometry_t& lhs, const 
rhs_geometry_t& rhs) {
This should be grouped together with the specialisations of 
IntersectsPredicate::Eval.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@156
PS2, Line 156: // select c1.name, count(*) num_neighbours2
Does this query actually return the number of second-level neighbours? Wouldn't 
for example both of these rows be counted?

Hungary Slovakia Poland
Hungary Ukraine Poland


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@173
PS2, Line 173: typedef
'using' would be more readable, line on the next line.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@176
PS2, Line 176: multi_polygon2d
Could be a (const) reference.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@185
PS2, Line 185: mbr
What does 'm' stand for in the name?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@187
PS2, Line 187: it
This could be the initialisation in the loop definition (before the first 
semicolon).


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@199
PS2, Line 199:   // Build rtree from smaller multi polygon to reduce log(n) 
part.
Shouldn't we do it the other way around? We should reduce the linear part more 
than the log(n) part.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@210
PS2, Line 210: lhs
The param names could indicate which one is a simple and which one is a 
multi-polygon. Same for L217.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@224
PS2, Line 224: FunctionContext* ctx
Unused parameter.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@275
PS2, Line 275: #define DEFINE_RELATION_PREDICATE(relation_name)                 
             \
If the class 'RelationWrapper' was templated instead of the individual 
functions, it would be enough to explicitly instantiate the class for each 
predicate, the functions would automatically be instantiated. This macro 
wouldn't be needed then.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h
File be/src/exprs/geo/shape-format.h:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@29
PS2, Line 29: constexpr int X1_OFFSET = 9;
Instead of X1_OFFSET it could be MIN_X_OFFSET,  similarly for MAX and Y.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@34
PS2, Line 34: MIN_GEOM_SIZE
It's not actually the minimal geometry size, rather the header size isn't it?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@35
PS2, Line 35: MIN_POINT_SIZE
This is not only the minimal point size but all points have this size, don't 
they?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@74
PS2, Line 74: OGCTypeToEsriType
For other constants we used UPPER_SNAKE_CASE (see L26-36), maybe we should use 
it for this as well.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@85
PS2, Line 85:   static_assert(ESRI_TYPE_SIZE == sizeof(EsriType));
Instead of (or in addition to) this static assert, we could have 
static_assert(ESRI_TYPE_SIZE == sizeof(uint32_t)) before the EsriType enum 
definition on L46.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@86
PS2, Line 86:   return readFromGeom<EsriType>(geom, ESRI_TYPE_OFFSET);
Could add a DCHECK that 'geom.len' is large enough.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@89
PS2, Line 89: inline double getMinX(const StringVal& geom) {
For these getters and setters we could add DCHECKs that 'geom.len' is large 
enough.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@106
PS2, Line 106:   static_assert(ESRI_TYPE_SIZE == sizeof(EsriType));
See L85.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@130
PS2, Line 130:   if (geom.is_null) return false;
Should we set (*ogc_type) to UNKNOWN in this case?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@177
PS2, Line 177: &
We usually take out parameters by pointer.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@188
PS2, Line 188: srid = 0
As far as I've seen we don't use the default value, so we could make it an 
obligatory argument to be explicit. Alternatively we could remove specifying 0 
when it is only a default value at the call sites.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@201
PS2, Line 201: lhs
I don't think 'lhs' is needed here.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@213
PS2, Line 213: StringVal
Could be a reference.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@216
PS2, Line 216: 1
Instead of '1' we could have 'lhs'. Similarly for 'rhs'.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h
File be/src/exprs/geo/utils.h:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@97
PS2, Line 97:   DCHECK_GE(ring.size(), 2);
What about the edge case when there are two identical points? Does it count as 
a closed ring? In that case, arguably, a single point could also be considered 
a closed ring.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@111
PS2, Line 111:   double bottomleft_x = ring[0].x();
I'm not sure 'bottomleft' is a good name because the orientation of the axes is 
not defined, or is it defined?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@148
PS2, Line 148: &
We usually take non-const values by pointer.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@153
PS2, Line 153:   polygon.clear();
It is a bit inconsistent that we don't clear the ring if the rings are empty 
but clear it in case of other errors.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@156
PS2, Line 156:   for (int i = 0; i < rings.size(); i++) {
We're not using 'i' other than indexing 'rings', so it could be a range-based 
for loop.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@161
PS2, Line 161: rings
Nit: I think it should be singular, i.e. "more than one outer ring".


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@168
PS2, Line 168:       std::swap(polygon.inners().back(), rings[i]);
Couldn't this be achieved by
polygon.inners().emplace_back(std::move(rings[i]));
?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@175
PS2, Line 175: &
We usually take non-const values by pointer.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@180
PS2, Line 180:   mpolygon.clear();
See L153.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@183
PS2, Line 183: rings[i]
Could extract it to a variable.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@185
PS2, Line 185: conunter clockwise
Nit: counter-clockwise with a hyphen?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@195
PS2, Line 195:       std::swap(mpolygon.back().inners().back(), rings[i]);
Couldn't this be achieved by
mpolygon.back().inners().emplace_back(std::move(rings[i]));
?


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h
File be/src/exprs/geo/wkt.h:

http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@29
PS2, Line 29: bool
Do we ever return false? Shouldn't we convert a possible exception from boost 
into 'false'? Or if we want to catch the exception later, the function could 
return void.

Same for the specialisations.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@29
PS2, Line 29: &
We usually take non-const values by pointer. Same for the specialisations.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@30
PS2, Line 30:
Too much indentation.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@36
PS2, Line 36:
Too much indentation.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@44
PS2, Line 44:
Too much indentation.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@66
PS2, Line 66: std::string
Could be const.


http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@81
PS2, Line 81: spaces
Could we also have tabs and other whitespace characters that we should skip? 
See also L84.


http://gerrit.cloudera.org:8080/#/c/20602/2/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java
File 
fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java:

http://gerrit.cloudera.org:8080/#/c/20602/2/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java@97
PS2, Line 97:     // TODO: this worked well for cases when the Java UDF had 
only one overload
In these cases we now override all overloads, right? We should update this TODO 
to reflect that.


http://gerrit.cloudera.org:8080/#/c/20602/2/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java@264
PS2, Line 264:   // BINARY, BINARY / BINARY, STRING / STRING, BINARY / STRING, 
STRING
Could make it clear that these are the parameter combinations.
Also, could put the different combinations on separate lines.


http://gerrit.cloudera.org:8080/#/c/20602/2/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java@285
PS2, Line 285:       addNative(db, "st_Point", false, Type.BINARY, Type.STRING);
Nit: for the others, the STRING-arg version comes first, it could be the same 
for "st_Point" as well.


http://gerrit.cloudera.org:8080/#/c/20602/2/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java@296
PS2, Line 296: new ArrayList
No need to construct an ArrayList, the list returned from Arrays.asList() would 
be enough.


http://gerrit.cloudera.org:8080/#/c/20602/2/tests/custom_cluster/test_geospatial_library.py
File tests/custom_cluster/test_geospatial_library.py:

http://gerrit.cloudera.org:8080/#/c/20602/2/tests/custom_cluster/test_geospatial_library.py@48
PS2, Line 48:     assert ST_POINT_SIGNATURE_BUILTIN in result.data
Could also assert that the Java version is not present.


http://gerrit.cloudera.org:8080/#/c/20602/2/tests/custom_cluster/test_geospatial_library.py@54
PS2, Line 54:     assert ST_POINT_SIGNATURE_JAVA in result.data
Could also assert that the C++ version is not present.


http://gerrit.cloudera.org:8080/#/c/20602/2/tests/query_test/test_geospatial_functions.py
File tests/query_test/test_geospatial_functions.py:

http://gerrit.cloudera.org:8080/#/c/20602/2/tests/query_test/test_geospatial_functions.py@32
PS2, Line 32: do not use tables
It is no longer true, we use table 'geospatial_higher_dim' in 
geospatial-esri-native-higher-dimension.test.



--
To view, visit http://gerrit.cloudera.org:8080/20602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I692ebc13617c37d61f77fffb09e872095a0fd11c
Gerrit-Change-Number: 20602
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Mon, 30 Oct 2023 15:46:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: port critical geospatial functions to c++

Reply via email to