Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20602 )
Change subject: WIP: port critical geospatial functions to c++ ...................................................................... Patch Set 2: (172 comments) I didn't pay much attention to the tests, but this review is already huge. http://gerrit.cloudera.org:8080/#/c/20602/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20602/2//COMMIT_MSG@1 PS2, Line 1: Parent: fb2d2b27 (IMPALA-12430: Skip compression when sending row batches within same process) Many functions return a bool that indicates whether reading/writing was successful. The meaning of the return value could be documented somewhere, either at each function or in some readme file. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/common/global-flags.cc File be/src/common/global-flags.cc: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/common/global-flags.cc@425 PS2, Line 425: + Nit: add space after +. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h File be/src/exprs/geo/common.h: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@39 PS2, Line 39: // TODO: This file is included in impala-ir.cc, and these typedefs will be visible in the This TODO can probably be deleted now. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@71 PS2, Line 71: OgcTypeToWktPrefix For other constants we used UPPER_SNAKE_CASE (see L31-35), maybe we should use it for this and on L61 as well. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@71 PS2, Line 71: Ogc On L61 we used OGC in capitals. Either one is good but it should be consistent. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@82 PS2, Line 82: T readFromGeom(const StringVal& geom, int offset) { Could add some explanation for this and writeToGeom(). http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@84 PS2, Line 84: T* Could be 'const T*' to emphasise that we're not modifying it. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@106 PS2, Line 106: inline OGCType getOGCType(const StringVal& geom) { Could add a DCHECK that 'OGC_TYPE_OFFSET' is within geom.len. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@108 PS2, Line 108: char Optional: we could set the underlying type of the 'OGCType' enum to 'char' or 'unsigned char' and then we could use 'OGCType' as the template argument here. We do this with 'EsriType' in shape-format.h. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@128 PS2, Line 128: inline void setOGCType(StringVal& geom, OGCType ogc_type) { Could add 'static_assert(OGC_TYPE_SIZE == sizeof(char));' Could add a DCHECK that 'OGC_TYPE_OFFSET' is within geom.len. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@129 PS2, Line 129: char See L108. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/common.h@134 PS2, Line 134: double Optional, not sure if worth it: we could use decltype(point.x()) instead of 'double', and static_assert that point.y() has the same type. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/formatted-double.h File be/src/exprs/geo/formatted-double.h: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/formatted-double.h@30 PS2, Line 30: a Nit: consistEnt http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/formatted-double.h@47 PS2, Line 47: 2.6666666666666665 This value is the same as the one on L43. Could you add a short clarification why precision 17 is not good? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/formatted-double.h@78 PS2, Line 78: FormattedDouble& operator/=(const FormattedDouble& rhs) { Consider adding the remaining modification assignment operators for completeness: operator-=() operator*=() operator%=() operator&=() operator|=() operator^=() operator>>=() operator<<=() operator++() - pre-increment operator++(int) - post-increment operator--() - pre-decrement operator--(int) - post-decrement They are not needed now but may be needed in the future. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h File be/src/exprs/geo/geometry-wrapper.h: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@27 PS2, Line 27: #include "common/names.h" "common" should come before "exprs". http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@32 PS2, Line 32: Nit: unnecessary empty line. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@33 PS2, Line 33: class RelationWrapper; If it's declared as a friend class I don't think forward declaring it here is necessary. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@35 PS2, Line 35: types It could be extended a bit, e.g. 'The types are taken from here: ...'. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@37 PS2, Line 37: GeometryWrapper You could add some description about what this class does, how it should be used. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@38 PS2, Line 38: Nit: unnecessary empty line. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@39 PS2, Line 39: public Nit: add 1 space. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@40 PS2, Line 40: Nit: unnecessary empty line. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@41 PS2, Line 41: d Capitalise the 'D'. Alternatively, write "serialization and deserialization functions." http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@42 PS2, Line 42: above Now there are no members above, we should explain what members could be overwritten. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@42 PS2, Line 42: thwe Nit: the. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@42 PS2, Line 42: on Nit: one. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@45 PS2, Line 45: FromEsriBinary Add in a comment what the return value means. Also where the read result will be and in what format, although this could also be explained in the class comment. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@46 PS2, Line 46: StringVal I'm wondering whether it would be better to have "ToEsriBinary()" and "ToWkt()" return the same type. Does the usage pattern make it better for the former to return StringValue and the latter to return string? If so, it could be mentioned in a comment. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@49 PS2, Line 49: WKT WKT should be expanded at least once to "Well Known Text", preferably in the class comment. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@51 PS2, Line 51: StringVal Could be a const ref. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@57 PS2, Line 57: num_coords It would be good do make it clear whether this refers to the number of coordinate pairs or individual coordinates. Also, can't we take a vector<DoubleVal> instead of an array? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.h@63 PS2, Line 63: point2d point_; We could consider using a union or std::variant (https://en.cppreference.com/w/cpp/utility/variant). If the type was included, that could be used as the discriminant of the union. We'd have to be careful with initialisation though. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc File be/src/exprs/geo/geometry-wrapper.cc: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@28 PS2, Line 28: ogcType The 'geom' already contains type info, can we add a DCHECK checking that that type info and 'ogcType' match? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@31 PS2, Line 31: this-> We didn't use the explicit 'this' pointer in the other cases, so we shouldn't do it here either. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@50 PS2, Line 50: not supported Could include the function name and value of 'ogcType' that is not supported, like on L116. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@53 PS2, Line 53: return true; This return statement could be moved to "case ST_POINT" so that all branches return. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@56 PS2, Line 56: srid This could alternatively be a default parameter instead of having 2 overloads. On L65 we already use 0 as a default value. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@63 PS2, Line 63: ogcType Here we can't check whether the geometry member that is filled in is actually the one that corresponds to 'ogcType'. 'ogcType' could be included in 'GeometryWrapper'. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@76 PS2, Line 76: default: Shouldn't we add braces { ... }? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@77 PS2, Line 77: not supported. Could include the function name and value of 'ogcType' that is not supported, like on L116. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@86 PS2, Line 86: //ss.precision(15); Remove unused code. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@102 PS2, Line 102: //ReverseRings(this->polygon); Remove unused code. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@103 PS2, Line 103: std::reverse(polygon_.outer().begin(), polygon_.outer().end()); What if we call this function twice, the second call will transform the order back to the original one. Also, if we later use the polygon with other Esri functions, will it work? If it's OK to call this function more than once, we should copy the polygon here, otherwise we should write it in a comment that it can only be used once and we could move the polygon to a temporary variable. Also, can we have an inner ring? Does that need to be reversed? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@110 PS2, Line 110: std::reverse(polygon.outer().begin(), polygon.outer().end()); See L103. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@133 PS2, Line 133: bool GeometryWrapper::FromWkt(FunctionContext* ctx, StringVal wkt, OGCType ogcType) { Should we handle null values? If not, add DCHECK(!wkt.is_null). http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@160 PS2, Line 160: coords Can we take a vector instead? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@160 PS2, Line 160: & Usually we take out parameters by pointer. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@187 PS2, Line 187: variable argument It's not really vararg, just taking a pointer to an array. I think it could be confused with https://en.cppreference.com/w/cpp/utility/variadic. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@190 PS2, Line 190: boost::geometry::correct(polygon_); Do we know if it can throw? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geometry-wrapper.cc@196 PS2, Line 196: not supported. Could include the function name and value of 'ogcType' that is not supported, like on L116. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc File be/src/exprs/geo/geospatial-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@26 PS2, Line 26: #include "exprs/geo/common.h" : #include "exprs/geo/shape-format.h" : #include "exprs/geo/poly-line-shape-format.h" : #include "exprs/geo/multi-point-shape-format.h" : #include "exprs/geo/geometry-wrapper.h" : #include "exprs/geo/relation-wrapper.h" : #include "exprs/geo/utils.h" : #include "exprs/geo/wkt.h" These should be in alphabetical order. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@99 PS2, Line 99: if (srid.is_null) return geom; Now if 'geom' is invalid and 'srid' is null, we return the invalid 'geom', but if 'srid' is non-null, we return null in this case. Alternatively, we could always check whether 'geom' is valid and return null if not. We could check what other query engines do or what the spec says (if it exists). http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@130 PS2, Line 130: // Variable argument list constructor. It's not really vararg, just taking a pointer to an array. I think it could be confused with https://en.cppreference.com/w/cpp/utility/variadic. It could be clarified that it's vararg as an Impala UDF. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@149 PS2, Line 149: // Variable argument list constructor. See L130. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@162 PS2, Line 162: Empty line. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@169 PS2, Line 169: // Variable argument list constructor. See L130. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@218 PS2, Line 218: /* Remove dead code. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@345 PS2, Line 345: Empty line. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@357 PS2, Line 357: BigIntVal GeospatialFunctions::st_BinGeom(FunctionContext* ctx, const DoubleVal& bin_size, The two st_BinGeom() implementations are the same, only the type of the 'bin_size' parameter is different. The functions could call a common function template. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@359 PS2, Line 359: if (bin_size.is_null) return BigIntVal::null(); See L99. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@377 PS2, Line 377: BigIntVal GeospatialFunctions::st_BinWkt(FunctionContext* ctx, const DoubleVal& bin_size, See L357. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@393 PS2, Line 393: StringVal GeospatialFunctions::st_BinenvelopeBinId(FunctionContext* ctx, const DoubleVal& bin_size, > line too long (99 > 90) See L357. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@414 PS2, Line 414: StringVal GeospatialFunctions::st_BinenvelopeGeom(FunctionContext* ctx, const DoubleVal& bin_size, > line too long (98 > 90) See L357. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions-ir.cc@439 PS2, Line 439: StringVal GeospatialFunctions::st_BinenvelopeWkt(FunctionContext* ctx, const DoubleVal& bin_size, > line too long (97 > 90) See L357. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions.h File be/src/exprs/geo/geospatial-functions.h: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions.h@39 PS2, Line 39: GeospatialFunctions Could add a link to a page where standard geospatial functions are described (if there is a good one). http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions.h@107 PS2, Line 107: Unneeded spaces. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/geospatial-functions.h@130 PS2, Line 130: Empty line. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.h File be/src/exprs/geo/multi-point-shape-format.h: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.h@33 PS2, Line 33: Empty line. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.h@34 PS2, Line 34: public: Add 1 space. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc File be/src/exprs/geo/multi-point-shape-format.cc: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@23 PS2, Line 23: & We usually take output parameters by pointer. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@25 PS2, Line 25: ctx->SetError("Multipoint lenght too small."); It might be misleading, a user could think that it is the value stored at NUM_POINTS_OFFSET that is too small. Maybe we could write something like "Size of multipoint object too small", or something like on L30. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@29 PS2, Line 29: 16 This could be extracted to a constexpr (for example as 2 * sizeof(double)) and also used at L35. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@37 PS2, Line 37: 8 Could be sizeof(double). http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@50 PS2, Line 50: 16 Could use the same constant as I recommended at L29. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@59 PS2, Line 59: 16 See L50. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/multi-point-shape-format.cc@60 PS2, Line 60: 16 + 8 See L50 and L37. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.h File be/src/exprs/geo/poly-line-shape-format.h: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.h@40 PS2, Line 40: & We usually take out parameters by pointer. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.h@43 PS2, Line 43: public: Add 1 space. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.h@44 PS2, Line 44: Read Add a comment about which out parameters should be used in which case. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.h@48 PS2, Line 48: //static int WriteNextPart(StringValconst vector<point2d>& points, int* point_offset); Dead code. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc File be/src/exprs/geo/poly-line-shape-format.cc: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@31 PS2, Line 31: . And also polygon? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@33 PS2, Line 33: // will be deserialized to that shape. What about 'rings'? Aren't all three mutually exclusive? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@43 PS2, Line 43: DCHECK_NE(linestring, nullptr); Why not also DCHECK(rings == nullptr)? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@47 PS2, Line 47: ctx->SetError("PolyLine lenght too small."); It might be misleading, a user could think that it is the value stored at NUM_PARTS_OFFSET that is too small. Maybe we could write something like "Size of polyline object too small", or something like on L59. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@57 PS2, Line 57: 4 Could extract to constexpr. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@58 PS2, Line 58: 16 Could extract to constexpr. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@68 PS2, Line 68: } Shouldn't we also clear 'rings' if 'is_polygon' is true? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@71 PS2, Line 71: 4 See L57. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@72 PS2, Line 72: first_point 'first_point_idx' or something similar would be better, this name could be mistaken for the actual point value. Similarly for 'last_point' on L75. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@78 PS2, Line 78: || last_point >= num_points) { Shouldn't we add "|| first_point > last_point"? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@90 PS2, Line 90: 16 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@92 PS2, Line 92: 8 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@106 PS2, Line 106: & We usually take out parameters by pointer. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@108 PS2, Line 108: 16 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@109 PS2, Line 109: 16 + 8 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@116 PS2, Line 116: 16 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@116 PS2, Line 116: 4 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@156 PS2, Line 156: } Shouldn't we check that 'num_points' is not 0? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@163 PS2, Line 163: 4 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@164 PS2, Line 164: 16 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@179 PS2, Line 179: & We usually take out parameters by pointer. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@181 PS2, Line 181: 4 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@188 PS2, Line 188: 16 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@203 PS2, Line 203: 4 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@219 PS2, Line 219: 4 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/poly-line-shape-format.cc@233 PS2, Line 233: 4 See L58. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h File be/src/exprs/geo/relation-wrapper.h: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@31 PS2, Line 31: Empty line. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@32 PS2, Line 32: RelationWrapper Could add a class doc comment, describing what the functions do. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@32 PS2, Line 32: class RelationWrapper { Empty line. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@34 PS2, Line 34: public: Add 1 space. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@51 PS2, Line 51: private: Add 1 space. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@67 PS2, Line 67: // - st_disjoint can't use bbox check (or rtree) for early filtering Why not? If the bboxes are disjoint, the geoms are disjoint, aren't they? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.h@87 PS2, Line 87: static constexpr bool RESULT_IF_BBOX_DOES_NOT_INTERSECT = true; This only shadows RelationPredicate::RESULT_IF_BBOX_DOES_NOT_INTERSECT, does not override it. Using a constexpr function would be more elegant. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc File be/src/exprs/geo/relation-wrapper.cc: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@25 PS2, Line 25: Empty line. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@51 PS2, Line 51: GetTypeFromWkt We should check if GetTypeFromWkt returns UNKNOWN. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@68 PS2, Line 68: RelationWrapper rel; Couldn't this function simply call EvalBinWkt() with lhs and rhs swapped? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@69 PS2, Line 69: GetTypeFromWkt We should check if GetTypeFromWkt returns UNKNOWN. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@87 PS2, Line 87: GetTypeFromWkt We should check if GetTypeFromWkt returns UNKNOWN. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@88 PS2, Line 88: GetTypeFromWkt We should check if GetTypeFromWkt returns UNKNOWN. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@98 PS2, Line 98: // 2 function per predicate to turn 2 OGCType parameters to templated parameters with Could you clarify this comment? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@109 PS2, Line 109: Empty line. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@126 PS2, Line 126: bool IntersectsPredicate::Eval(const lhs_geometry_t& lhs, const rhs_geometry_t& rhs) { This should be grouped together with the specialisations of IntersectsPredicate::Eval. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@156 PS2, Line 156: // select c1.name, count(*) num_neighbours2 Does this query actually return the number of second-level neighbours? Wouldn't for example both of these rows be counted? Hungary Slovakia Poland Hungary Ukraine Poland http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@173 PS2, Line 173: typedef 'using' would be more readable, line on the next line. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@176 PS2, Line 176: multi_polygon2d Could be a (const) reference. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@185 PS2, Line 185: mbr What does 'm' stand for in the name? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@187 PS2, Line 187: it This could be the initialisation in the loop definition (before the first semicolon). http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@199 PS2, Line 199: // Build rtree from smaller multi polygon to reduce log(n) part. Shouldn't we do it the other way around? We should reduce the linear part more than the log(n) part. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@210 PS2, Line 210: lhs The param names could indicate which one is a simple and which one is a multi-polygon. Same for L217. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@224 PS2, Line 224: FunctionContext* ctx Unused parameter. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/relation-wrapper.cc@275 PS2, Line 275: #define DEFINE_RELATION_PREDICATE(relation_name) \ If the class 'RelationWrapper' was templated instead of the individual functions, it would be enough to explicitly instantiate the class for each predicate, the functions would automatically be instantiated. This macro wouldn't be needed then. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h File be/src/exprs/geo/shape-format.h: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@29 PS2, Line 29: constexpr int X1_OFFSET = 9; Instead of X1_OFFSET it could be MIN_X_OFFSET, similarly for MAX and Y. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@34 PS2, Line 34: MIN_GEOM_SIZE It's not actually the minimal geometry size, rather the header size isn't it? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@35 PS2, Line 35: MIN_POINT_SIZE This is not only the minimal point size but all points have this size, don't they? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@74 PS2, Line 74: OGCTypeToEsriType For other constants we used UPPER_SNAKE_CASE (see L26-36), maybe we should use it for this as well. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@85 PS2, Line 85: static_assert(ESRI_TYPE_SIZE == sizeof(EsriType)); Instead of (or in addition to) this static assert, we could have static_assert(ESRI_TYPE_SIZE == sizeof(uint32_t)) before the EsriType enum definition on L46. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@86 PS2, Line 86: return readFromGeom<EsriType>(geom, ESRI_TYPE_OFFSET); Could add a DCHECK that 'geom.len' is large enough. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@89 PS2, Line 89: inline double getMinX(const StringVal& geom) { For these getters and setters we could add DCHECKs that 'geom.len' is large enough. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@106 PS2, Line 106: static_assert(ESRI_TYPE_SIZE == sizeof(EsriType)); See L85. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@130 PS2, Line 130: if (geom.is_null) return false; Should we set (*ogc_type) to UNKNOWN in this case? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@177 PS2, Line 177: & We usually take out parameters by pointer. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@188 PS2, Line 188: srid = 0 As far as I've seen we don't use the default value, so we could make it an obligatory argument to be explicit. Alternatively we could remove specifying 0 when it is only a default value at the call sites. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@201 PS2, Line 201: lhs I don't think 'lhs' is needed here. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@213 PS2, Line 213: StringVal Could be a reference. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/shape-format.h@216 PS2, Line 216: 1 Instead of '1' we could have 'lhs'. Similarly for 'rhs'. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h File be/src/exprs/geo/utils.h: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@97 PS2, Line 97: DCHECK_GE(ring.size(), 2); What about the edge case when there are two identical points? Does it count as a closed ring? In that case, arguably, a single point could also be considered a closed ring. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@111 PS2, Line 111: double bottomleft_x = ring[0].x(); I'm not sure 'bottomleft' is a good name because the orientation of the axes is not defined, or is it defined? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@148 PS2, Line 148: & We usually take non-const values by pointer. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@153 PS2, Line 153: polygon.clear(); It is a bit inconsistent that we don't clear the ring if the rings are empty but clear it in case of other errors. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@156 PS2, Line 156: for (int i = 0; i < rings.size(); i++) { We're not using 'i' other than indexing 'rings', so it could be a range-based for loop. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@161 PS2, Line 161: rings Nit: I think it should be singular, i.e. "more than one outer ring". http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@168 PS2, Line 168: std::swap(polygon.inners().back(), rings[i]); Couldn't this be achieved by polygon.inners().emplace_back(std::move(rings[i])); ? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@175 PS2, Line 175: & We usually take non-const values by pointer. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@180 PS2, Line 180: mpolygon.clear(); See L153. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@183 PS2, Line 183: rings[i] Could extract it to a variable. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@185 PS2, Line 185: conunter clockwise Nit: counter-clockwise with a hyphen? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/utils.h@195 PS2, Line 195: std::swap(mpolygon.back().inners().back(), rings[i]); Couldn't this be achieved by mpolygon.back().inners().emplace_back(std::move(rings[i])); ? http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h File be/src/exprs/geo/wkt.h: http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@29 PS2, Line 29: bool Do we ever return false? Shouldn't we convert a possible exception from boost into 'false'? Or if we want to catch the exception later, the function could return void. Same for the specialisations. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@29 PS2, Line 29: & We usually take non-const values by pointer. Same for the specialisations. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@30 PS2, Line 30: Too much indentation. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@36 PS2, Line 36: Too much indentation. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@44 PS2, Line 44: Too much indentation. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@66 PS2, Line 66: std::string Could be const. http://gerrit.cloudera.org:8080/#/c/20602/2/be/src/exprs/geo/wkt.h@81 PS2, Line 81: spaces Could we also have tabs and other whitespace characters that we should skip? See also L84. http://gerrit.cloudera.org:8080/#/c/20602/2/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java File fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java: http://gerrit.cloudera.org:8080/#/c/20602/2/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java@97 PS2, Line 97: // TODO: this worked well for cases when the Java UDF had only one overload In these cases we now override all overloads, right? We should update this TODO to reflect that. http://gerrit.cloudera.org:8080/#/c/20602/2/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java@264 PS2, Line 264: // BINARY, BINARY / BINARY, STRING / STRING, BINARY / STRING, STRING Could make it clear that these are the parameter combinations. Also, could put the different combinations on separate lines. http://gerrit.cloudera.org:8080/#/c/20602/2/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java@285 PS2, Line 285: addNative(db, "st_Point", false, Type.BINARY, Type.STRING); Nit: for the others, the STRING-arg version comes first, it could be the same for "st_Point" as well. http://gerrit.cloudera.org:8080/#/c/20602/2/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java@296 PS2, Line 296: new ArrayList No need to construct an ArrayList, the list returned from Arrays.asList() would be enough. http://gerrit.cloudera.org:8080/#/c/20602/2/tests/custom_cluster/test_geospatial_library.py File tests/custom_cluster/test_geospatial_library.py: http://gerrit.cloudera.org:8080/#/c/20602/2/tests/custom_cluster/test_geospatial_library.py@48 PS2, Line 48: assert ST_POINT_SIGNATURE_BUILTIN in result.data Could also assert that the Java version is not present. http://gerrit.cloudera.org:8080/#/c/20602/2/tests/custom_cluster/test_geospatial_library.py@54 PS2, Line 54: assert ST_POINT_SIGNATURE_JAVA in result.data Could also assert that the C++ version is not present. http://gerrit.cloudera.org:8080/#/c/20602/2/tests/query_test/test_geospatial_functions.py File tests/query_test/test_geospatial_functions.py: http://gerrit.cloudera.org:8080/#/c/20602/2/tests/query_test/test_geospatial_functions.py@32 PS2, Line 32: do not use tables It is no longer true, we use table 'geospatial_higher_dim' in geospatial-esri-native-higher-dimension.test. -- To view, visit http://gerrit.cloudera.org:8080/20602 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I692ebc13617c37d61f77fffb09e872095a0fd11c Gerrit-Change-Number: 20602 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Mon, 30 Oct 2023 15:46:35 +0000 Gerrit-HasComments: Yes
