[ 
https://issues.apache.org/jira/browse/IMPALA-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814513#comment-17814513
 ] 

ASF subversion and git services commented on IMPALA-12783:
----------------------------------------------------------

Commit 27955a385e8d442e183cbf22cfc068124f830986 in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=27955a385 ]

IMPALA-12783: Nested struct with varlen data crashes

If a struct ("main") is within an array and contains two child structs
("s1" and "s2") which both contain strings (or other varlen data),
Impala crashes when this struct is re-materialised (for example in a
sort with limit) if codegen is enabled.

To reproduce:

In Hive:
 create table nested (arr ARRAY<STRUCT<s1: STRUCT<str1: STRING>, s2:
   STRUCT<str2: STRING>>>) stored as parquet;
 insert into nested values (array( named_struct("s1",
   named_struct("str1", "A string that is long"), "s2",
   named_struct("str2", "Another string that is long") )));

In Impala:
 select 1, arr from nested order by 1 limit 1;

This is because in the codegen'd code, when checking if the strings
("str1" and "str2" in the example) are NULL, we incorrectly calculate
the offset of their null indicator bytes from the memory address of
their containing struct, not from the beginning of the "master tuple",
which in this case is the item tuple of the array.

Note that the null indicators of struct members are always at the end of
the tuple containing the struct (recursively), i.e. the master tuple.

This change corrects the behaviour, passing the master tuple to
functions that need it.
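
The offset error described above can be illustrated with a minimal C++
sketch. It is not Impala's actual code: the layout, the slot sizes and
the names (NullIndicator, IsNull, S2Address, MakeBuffer) are all
hypothetical and chosen only to show why a null indicator offset must be
applied to the master tuple and not to the address of the containing
struct.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical layout, for illustration only (not Impala's real slot sizes):
//   bytes  0..15  slot of s1.str1 (struct s1 starts at offset 0)
//   bytes 16..31  slot of s2.str2 (struct s2 starts at offset 16)
//   byte  32      null indicator byte, at the end of the master tuple
constexpr std::size_t kS2Offset = 16;
constexpr std::size_t kNullByteOffset = 32;
constexpr std::size_t kTupleSize = 33;

// Location of a null bit: a byte offset measured from the start of the
// master tuple, plus a mask selecting one bit in that byte.
struct NullIndicator {
  std::size_t byte_offset;
  std::uint8_t bit_mask;
};

constexpr NullIndicator kStr1Null{kNullByteOffset, 0x01};
constexpr NullIndicator kStr2Null{kNullByteOffset, 0x02};

// Reads the null bit relative to `base`. The result is only meaningful
// when `base` points to the start of the master tuple.
bool IsNull(const std::uint8_t* base, NullIndicator ind) {
  return (base[ind.byte_offset] & ind.bit_mask) != 0;
}

// Address of the child struct s2 inside the master tuple.
const std::uint8_t* S2Address(const std::uint8_t* master_tuple) {
  return master_tuple + kS2Offset;
}

// Builds a master tuple in which both strings are non-NULL (all null bits
// are zero), followed by 0xFF bytes standing in for unrelated memory.
std::array<std::uint8_t, 96> MakeBuffer() {
  std::array<std::uint8_t, 96> buf{};
  for (std::size_t i = kTupleSize; i < buf.size(); ++i) buf[i] = 0xFF;
  return buf;
}
```

In this sketch, IsNull(buf.data(), kStr2Null) reads byte 32 of the
master tuple, whereas IsNull(S2Address(buf.data()), kStr2Null) reads
byte 48, which lies outside the tuple. Because s1 sits at offset 0 in
this hypothetical layout, only the second child struct exposes the
problem here.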

Testing:
 - extended the column 'arr_contains_nested_struct' in table
   'collection_struct_mix' to include two nested structs with string
   members. Updated existing queries, which now cover the problem.

Change-Id: Ide2b63f8b18633f38fbe939a17db923606ccb101
Reviewed-on: http://gerrit.cloudera.org:8080/20997
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Nested struct with varlen data crashes
> --------------------------------------
>
>                 Key: IMPALA-12783
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12783
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Daniel Becker
>            Assignee: Daniel Becker
>            Priority: Major
>
> If a struct ("main") is within an array and contains two child structs ("s1"
> and "s2") which both contain strings (or other varlen data), Impala crashes
> when the struct is re-materialised (for example in a sort with limit) if
> codegen is enabled.
> To reproduce:
> In Hive:
> {code:java}
> create table nested (arr ARRAY<STRUCT<s1: STRUCT<str1: STRING>, s2: 
> STRUCT<str2: STRING>>>) stored as parquet;
> insert into nested values (array( named_struct("s1", named_struct("str1", "A 
> string that is long"), "s2", named_struct("str2", "Another string that is 
> long") )));{code}
> In Impala:
> {code:java}
> select 1, arr from nested order by 1 limit 1;{code}
> This seems to be because in the codegen'd code, when checking if the strings 
> ("str1" and "str2" in the example) are NULL, we incorrectly calculate the 
> offset of the null indicator byte from the memory address of their containing 
> struct, not from the beginning of the "master tuple", which in this case is 
> the item tuple of the array.
> Note that the null indicators of the struct members are at the end of the 
> tuple containing the struct (recursively), i.e. the master tuple.


