Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be 
changed in UDFs
......................................................................


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java@27
PS16, Line 27:   private final long ptr_;
I think the best approach would be to drop the Impala writables, store an array
of longs (pointers) to the native values in the executor, and reload it on every
invocation.

As this change is mainly a fix, I am OK with not doing it here, to keep the
scope of the change small.
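
A rough sketch of one reading of that suggestion, using only the JDK. Everything
here is hypothetical: native memory is simulated with a ByteBuffer, "pointers"
are offsets into it, each value is assumed to be laid out as a 4-byte length
followed by the bytes, and the class and method names are made up for
illustration rather than taken from the Impala executor.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: no cached writable objects, only pointers.
final class PointerReloadSketch {
  private final ByteBuffer nativeMem;  // stand-in for native memory (assumption)
  private final long[] inputPtrs;      // one "pointer" (offset) per UDF argument

  PointerReloadSketch(ByteBuffer nativeMem, long[] inputPtrs) {
    this.nativeMem = nativeMem;
    this.inputPtrs = inputPtrs.clone();
  }

  // Builds fresh argument copies from the pointers on every invocation, so
  // changes a UDF made to the previous call's arguments cannot carry over.
  byte[][] loadArguments() {
    byte[][] args = new byte[inputPtrs.length][];
    for (int i = 0; i < inputPtrs.length; i++) {
      int off = (int) inputPtrs[i];
      int len = nativeMem.getInt(off);          // assumed layout: [int length][bytes]
      byte[] value = new byte[len];
      for (int j = 0; j < len; j++) {
        value[j] = nativeMem.get(off + 4 + j);  // absolute reads, no shared position
      }
      args[i] = value;
    }
    return args;
  }
}
```

With this shape, a UDF that overwrites or swaps the array it received only
touches its own copy; the next call to loadArguments() reads the native value
again.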


http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java@34
PS16, Line 34:     super.set(new String(bytes));
This is slower than passing the bytes directly, because Text stores the string
as a UTF-8 byte array rather than a Java String, so the bytes are decoded into a
String and then encoded again.

https://github.com/apache/hadoop/blob/927401886ae5be5f3c8dd6d82f13363bba594396/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/Text.java#L230
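
A minimal sketch of the two paths, using only the JDK. Utf8Buffer is a
hypothetical stand-in for a Text-like holder that keeps UTF-8 bytes internally;
it is not Hadoop's class, and the helper names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a holder that stores UTF-8 bytes internally.
final class Utf8Buffer {
  private byte[] bytes = new byte[0];

  // Direct path: a single array copy.
  void set(byte[] utf8) {
    bytes = Arrays.copyOf(utf8, utf8.length);
  }

  // String path: the String has to be encoded back into UTF-8 bytes.
  void set(String s) {
    bytes = s.getBytes(StandardCharsets.UTF_8);
  }

  byte[] copyBytes() {
    return Arrays.copyOf(bytes, bytes.length);
  }
}

final class SetBytesSketch {
  // Decodes bytes -> String, then the holder encodes String -> bytes again.
  static byte[] viaString(byte[] utf8) {
    Utf8Buffer buf = new Utf8Buffer();
    buf.set(new String(utf8, StandardCharsets.UTF_8));
    return buf.copyBytes();
  }

  // Copies the bytes without any charset conversion.
  static byte[] direct(byte[] utf8) {
    Utf8Buffer buf = new Utf8Buffer();
    buf.set(utf8);
    return buf.copyBytes();
  }
}
```

For valid UTF-8 input both paths end up with the same bytes, the String path
just does extra work. For malformed input the String path is also not
byte-preserving, since the String constructor substitutes a replacement
character for the bad sequence.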


http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/StringValueReader.java
File fe/src/main/java/org/apache/impala/hive/executor/StringValueReader.java:

http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/StringValueReader.java@29
PS16, Line 29: StringValueReader
optional: could this be merged into
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/hive/executor/JavaUdfDataType.java ?


http://gerrit.cloudera.org:8080/#/c/19507/16/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/16/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java@34
PS16, Line 34: CachedWritablesUdf
optional: maybe another name would be clearer now, as we no longer cache the
strings but simply copy them, e.g. BufferAlteringUdf.



--
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 16
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Comment-Date: Tue, 07 Mar 2023 07:45:33 +0000
Gerrit-HasComments: Yes
