Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java@27
PS16, Line 27:   private final long ptr_;
I think the best approach would be to drop the Impala writables, store an array of longs (pointers) to the native values in the executor, and reload it on every invocation. As this change is mainly a fix, I am OK with not doing that here, to reduce the scope of the change.


http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java@34
PS16, Line 34:     super.set(new String(bytes));
This is slower than passing the bytes directly, as Text stores the string as a UTF-8 byte array rather than as a Java String, so the String has to be encoded back to bytes.

https://github.com/apache/hadoop/blob/927401886ae5be5f3c8dd6d82f13363bba594396/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/Text.java#L230


http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/StringValueReader.java
File fe/src/main/java/org/apache/impala/hive/executor/StringValueReader.java:

http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/StringValueReader.java@29
PS16, Line 29: StringValueReader
optional: could this be merged into https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/hive/executor/JavaUdfDataType.java ?

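
To illustrate the Line 34 comment on ImpalaTextWritable.java above, here is a minimal, self-contained sketch of the difference between the two paths. Utf8Holder is a hypothetical stand-in for a Text-like class that keeps its contents as UTF-8 bytes; it is not the Hadoop class, and the method names viaString and direct are made up for the example. It only assumes that the incoming bytes are meant to be UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextSetSketch {
    // Hypothetical stand-in for a Text-like holder backed by UTF-8 bytes.
    static final class Utf8Holder {
        private byte[] bytes = new byte[0];

        // String path: the characters have to be encoded to UTF-8 again.
        void set(String s) {
            bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        // Byte path: a plain copy, no decoding or encoding involved.
        void set(byte[] utf8) {
            bytes = Arrays.copyOf(utf8, utf8.length);
        }

        byte[] copyBytes() {
            return Arrays.copyOf(bytes, bytes.length);
        }
    }

    // Roughly what set(new String(bytes)) does: decode, then re-encode.
    static byte[] viaString(byte[] input) {
        Utf8Holder holder = new Utf8Holder();
        holder.set(new String(input, StandardCharsets.UTF_8));
        return holder.copyBytes();
    }

    // Passing the bytes directly: one copy.
    static byte[] direct(byte[] input) {
        Utf8Holder holder = new Utf8Holder();
        holder.set(input);
        return holder.copyBytes();
    }

    public static void main(String[] args) {
        // Valid UTF-8 survives both paths unchanged; only the cost differs.
        byte[] valid = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(viaString(valid), direct(valid)));

        // Malformed input: decoding substitutes U+FFFD, so the round trip
        // through String changes the bytes, while the direct path does not.
        byte[] malformed = {0x61, (byte) 0xFF, 0x62};
        System.out.println(direct(malformed).length);
        System.out.println(viaString(malformed).length);
    }
}
```

Besides the extra decode and encode work, the round trip through String is lossy for byte sequences that are not valid UTF-8, which may matter for a UDF that treats the value as an opaque buffer.
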
http://gerrit.cloudera.org:8080/#/c/19507/16/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/16/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java@34
PS16, Line 34: CachedWritablesUdf
optional: maybe another name would be clearer now, as we no longer cache the strings but simply copy them, e.g. BufferAlteringUdf



--
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 16
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Comment-Date: Tue, 07 Mar 2023 07:45:33 +0000
Gerrit-HasComments: Yes
