Re: Null array elements with joins
Hi folks, I submitted PR https://github.com/apache/phoenix/pull/335 Appreciate your review. Thanks, Gerald On Mon, Aug 13, 2018 at 1:10 PM, James Taylor wrote: > I commented on the JIRA you filed here: PHOENIX-4791. Best to keep > discussion there. > Thanks, > James > > On Mon, Aug 13, 2018 at 11:08 AM, Gerald Sangudi > wrote: > >> Hello all, >> >> Any suggestions or pointers on the issue below? >> >> Projecting array elements works when not using joins, and does not work >> when we use hash joins. Is there an issue with the ProjectionCompiler for >> joins? I have not been able to isolate the specific cause, and would >> appreciate any pointers or suggestions. >> >> Thanks, >> Gerald >> >> On Tue, Jun 19, 2018 at 10:02 AM, Tulasi Paradarami < >> tulasi.krishn...@gmail.com> wrote: >> >>> Hi, >>> >>> I'm running few tests against Phoenix array and running into this bug >>> where array elements return null values when a join is involved. Is this a >>> known issue/limitation of arrays? >>> >>> create table array_test_1 (id integer not null primary key, arr >>> tinyint[5]); >>> upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]); >>> upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]); >>> upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]); >>> upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]); >>> upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]); >>> >>> create table test_table_1 (id integer not null primary key, val varchar); >>> upsert into test_table_1 values (1001, 'abc'); >>> upsert into test_table_1 values (1002, 'def'); >>> upsert into test_table_1 values (1003, 'ghi'); >>> >>> 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2], >>> t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = >>> t2.id; >>> ++-++--- >>> -++ >>> | T1.ID | T2.VAL | ARRAY_ELEM(T1.ARR, 1) | ARRAY_ELEM(T1.ARR, 2) | >>> ARRAY_ELEM(T1.ARR, 3) | >>> ++-++--- >>> -++ >>> | 1001 | abc | null | null | >>> null | >>> | 1002 | def | null | null | >>> null | >>> | 1003 | ghi | null | null | >>> null | >>> ++-++--- >>> -++ >>> 3 rows selected (0.056 seconds) >>> >>> However, directly selecting array elements from the array returns data >>> correctly. >>> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], >>> t1.arr[3] from array_test_1 as t1; >>> +---+-+-+--- >>> --+ >>> | ID | ARRAY_ELEM(ARR, 1) | ARRAY_ELEM(ARR, 2) | ARRAY_ELEM(ARR, >>> 3) | >>> +---+-+-+--- >>> --+ >>> | 1001 | 0 | 0 | 0 >>> | >>> | 1002 | 0 | 0 | 0 >>> | >>> | 1003 | 0 | 0 | 0 >>> | >>> | 1004 | 0 | 0 | 1 >>> | >>> | 1005 | 1 | 1 | 1 >>> | >>> +---+-+-+--- >>> --+ >>> 5 rows selected (0.044 seconds) >>> >>> >>> >> >
Re: Null array elements with joins
I commented on the JIRA you filed here: PHOENIX-4791. Best to keep discussion there. Thanks, James On Mon, Aug 13, 2018 at 11:08 AM, Gerald Sangudi wrote: > Hello all, > > Any suggestions or pointers on the issue below? > > Projecting array elements works when not using joins, and does not work > when we use hash joins. Is there an issue with the ProjectionCompiler for > joins? I have not been able to isolate the specific cause, and would > appreciate any pointers or suggestions. > > Thanks, > Gerald > > On Tue, Jun 19, 2018 at 10:02 AM, Tulasi Paradarami < > tulasi.krishn...@gmail.com> wrote: > >> Hi, >> >> I'm running few tests against Phoenix array and running into this bug >> where array elements return null values when a join is involved. Is this a >> known issue/limitation of arrays? >> >> create table array_test_1 (id integer not null primary key, arr >> tinyint[5]); >> upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]); >> upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]); >> upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]); >> upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]); >> upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]); >> >> create table test_table_1 (id integer not null primary key, val varchar); >> upsert into test_table_1 values (1001, 'abc'); >> upsert into test_table_1 values (1002, 'def'); >> upsert into test_table_1 values (1003, 'ghi'); >> >> 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2], >> t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = >> t2.id; >> ++-++--- >> -++ >> | T1.ID | T2.VAL | ARRAY_ELEM(T1.ARR, 1) | ARRAY_ELEM(T1.ARR, 2) | >> ARRAY_ELEM(T1.ARR, 3) | >> ++-++--- >> -++ >> | 1001 | abc | null | null | >> null | >> | 1002 | def | null | null | >> null | >> | 1003 | ghi | null | null | >> null | >> ++-++--- >> -++ >> 3 rows selected (0.056 seconds) >> >> However, directly selecting array elements from the array returns data >> correctly. >> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3] >> from array_test_1 as t1; >> +---+-+-+--- >> --+ >> | ID | ARRAY_ELEM(ARR, 1) | ARRAY_ELEM(ARR, 2) | ARRAY_ELEM(ARR, 3) >> | >> +---+-+-+--- >> --+ >> | 1001 | 0 | 0 | 0 >> | >> | 1002 | 0 | 0 | 0 >> | >> | 1003 | 0 | 0 | 0 >> | >> | 1004 | 0 | 0 | 1 >> | >> | 1005 | 1 | 1 | 1 >> | >> +---+-+-+--- >> --+ >> 5 rows selected (0.044 seconds) >> >> >> >
Re: Null array elements with joins
Hello all, Any suggestions or pointers on the issue below? Projecting array elements works when not using joins, and does not work when we use hash joins. Is there an issue with the ProjectionCompiler for joins? I have not been able to isolate the specific cause, and would appreciate any pointers or suggestions. Thanks, Gerald On Tue, Jun 19, 2018 at 10:02 AM, Tulasi Paradarami < tulasi.krishn...@gmail.com> wrote: > Hi, > > I'm running few tests against Phoenix array and running into this bug > where array elements return null values when a join is involved. Is this a > known issue/limitation of arrays? > > create table array_test_1 (id integer not null primary key, arr > tinyint[5]); > upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]); > upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]); > upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]); > upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]); > upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]); > > create table test_table_1 (id integer not null primary key, val varchar); > upsert into test_table_1 values (1001, 'abc'); > upsert into test_table_1 values (1002, 'def'); > upsert into test_table_1 values (1003, 'ghi'); > > 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2], > t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = t2.id > ; > ++-++--- > -++ > | T1.ID | T2.VAL | ARRAY_ELEM(T1.ARR, 1) | ARRAY_ELEM(T1.ARR, 2) | > ARRAY_ELEM(T1.ARR, 3) | > ++-++--- > -++ > | 1001 | abc | null | null | > null | > | 1002 | def | null | null | > null | > | 1003 | ghi | null | null | > null | > ++-++--- > -++ > 3 rows selected (0.056 seconds) > > However, directly selecting array elements from the array returns data > correctly. > 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3] > from array_test_1 as t1; > +---+-+-+--- > --+ > | ID | ARRAY_ELEM(ARR, 1) | ARRAY_ELEM(ARR, 2) | ARRAY_ELEM(ARR, 3) | > +---+-+-+--- > --+ > | 1001 | 0 | 0 | 0 | > | 1002 | 0 | 0 | 0 | > | 1003 | 0 | 0 | 0 | > | 1004 | 0 | 0 | 1 | > | 1005 | 1 | 1 | 1 | > +---+-+-+--- > --+ > 5 rows selected (0.044 seconds) > > >
Re: Null array elements with joins
I spent some more time debugging this error. In the case of arrays, it looks like there is a mismatch in column family and qualifier between KeyValueColumnExpression and ResultTuple. As a result, tuple.get(...) returns NULLs for array elements. Here are the methods I reviewed: HashJoinScanner#processResults KeyValueSchema#toBytes KeyValueColumnExpression#evaluate tuple.get(cf, cq, ptr) For instance, when selecting following two columns in a query that involves hash-join, - CHAR_COLUMN of CHAR(1) type - ARR_COLUMN[1], ARR_COLUMN[2], ARR_COLUMN[3], ARR_COLUMN[4] of CHAR(1)[5] type ResultTuple produces these cells: cell 1: [123/0:CHAR_COLUMN/1530111764946/Put/vlen=5/seqid=205, key: 123, family: 0, qualifier: CHAR_COLUMN, value: defgh] cell 2: [123/0:_0/1530111764946/Put/vlen=1/seqid=205, key: 123, family: 0, qualifier: _0, value: x] cell 3: [123/_v:\x00\x00\x00\x02/LATEST_TIMESTAMP/Put/vlen=6/seqid=0, key: 123, family: _v, qualifier: ^@^@^@^B, value: pqst^@^O] tuple.get(cf, cq, ptr) works fine for the first cell [tuple.get("0", "CHAR_COLUMN", ptr)], however, in the case of 3rd cell (that is associated with array elements), tuple.get("0", "ARR_COLUMN", ptr) fails because family and qualifier information in KeyValueColumnExpression don't match with the values present in the ResultTuple. KeyValueColumnExpression: cf: 0 cq: ARR_COLUMN ResultTuple cf: _v cq: This appears to be an issue with how "cf, cq" attributes are initialized in KeyValyColumnExpression for ARRAY elements during tuple projection. That is, when projecting array elements, it should perhaps be initialized with "_v, ", instead of "default cf, actual column qualifier"? It'll be helpful to hear from the Phoenix experts on this. On Wed, Jun 20, 2018 at 7:43 AM Tulasi Paradarami < tulasi.krishn...@gmail.com> wrote: > Tested on 4.7, 4.11 & 4.14. > https://issues.apache.org/jira/browse/PHOENIX-4791 > > > On Tue, Jun 19, 2018 at 8:10 PM Jaanai Zhang > wrote: > >> what's your Phoenix's version? >> >> >> >>Yun Zhang >>Best regards! >> >> >> 2018-06-20 1:02 GMT+08:00 Tulasi Paradarami : >> >>> Hi, >>> >>> I'm running few tests against Phoenix array and running into this bug >>> where array elements return null values when a join is involved. Is this a >>> known issue/limitation of arrays? >>> >>> create table array_test_1 (id integer not null primary key, arr >>> tinyint[5]); >>> upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]); >>> upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]); >>> upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]); >>> upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]); >>> upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]); >>> >>> create table test_table_1 (id integer not null primary key, val varchar); >>> upsert into test_table_1 values (1001, 'abc'); >>> upsert into test_table_1 values (1002, 'def'); >>> upsert into test_table_1 values (1003, 'ghi'); >>> >>> 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2], >>> t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = >>> t2.id; >>> >>> ++-++++ >>> | T1.ID | T2.VAL | ARRAY_ELEM(T1.ARR, 1) | ARRAY_ELEM(T1.ARR, 2) | >>> ARRAY_ELEM(T1.ARR, 3) | >>> >>> ++-++++ >>> | 1001 | abc | null | null | >>> null | >>> | 1002 | def | null | null | >>> null | >>> | 1003 | ghi | null | null | >>> null | >>> >>> ++-++++ >>> 3 rows selected (0.056 seconds) >>> >>> However, directly selecting array elements from the array returns data >>> correctly. >>> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], >>> t1.arr[3] from array_test_1 as t1; >>> >>> +---+-+-+-+ >>> | ID | ARRAY_ELEM(ARR, 1) | ARRAY_ELEM(ARR, 2) | ARRAY_ELEM(ARR, >>> 3) | >>> >>> +---+-+-+-+ >>> | 1001 | 0 | 0 | 0 >>> | >>> | 1002 | 0 | 0 | 0 >>> | >>> | 1003 | 0 | 0 | 0 >>> | >>> | 1004 | 0 | 0 | 1 >>> | >>> | 1005 | 1 | 1 | 1 >>> | >>> >>> +---+-+-+-+ >>> 5 rows selected (0.044 seconds) >>> >>> >>> >>
Re: Null array elements with joins
Tested on 4.7, 4.11 & 4.14. https://issues.apache.org/jira/browse/PHOENIX-4791 On Tue, Jun 19, 2018 at 8:10 PM Jaanai Zhang wrote: > what's your Phoenix's version? > > > >Yun Zhang >Best regards! > > > 2018-06-20 1:02 GMT+08:00 Tulasi Paradarami : > >> Hi, >> >> I'm running few tests against Phoenix array and running into this bug >> where array elements return null values when a join is involved. Is this a >> known issue/limitation of arrays? >> >> create table array_test_1 (id integer not null primary key, arr >> tinyint[5]); >> upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]); >> upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]); >> upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]); >> upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]); >> upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]); >> >> create table test_table_1 (id integer not null primary key, val varchar); >> upsert into test_table_1 values (1001, 'abc'); >> upsert into test_table_1 values (1002, 'def'); >> upsert into test_table_1 values (1003, 'ghi'); >> >> 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2], >> t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = >> t2.id; >> >> ++-++++ >> | T1.ID | T2.VAL | ARRAY_ELEM(T1.ARR, 1) | ARRAY_ELEM(T1.ARR, 2) | >> ARRAY_ELEM(T1.ARR, 3) | >> >> ++-++++ >> | 1001 | abc | null | null | >> null | >> | 1002 | def | null | null | >> null | >> | 1003 | ghi | null | null | >> null | >> >> ++-++++ >> 3 rows selected (0.056 seconds) >> >> However, directly selecting array elements from the array returns data >> correctly. >> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3] >> from array_test_1 as t1; >> >> +---+-+-+-+ >> | ID | ARRAY_ELEM(ARR, 1) | ARRAY_ELEM(ARR, 2) | ARRAY_ELEM(ARR, 3) >> | >> >> +---+-+-+-+ >> | 1001 | 0 | 0 | 0 >> | >> | 1002 | 0 | 0 | 0 >> | >> | 1003 | 0 | 0 | 0 >> | >> | 1004 | 0 | 0 | 1 >> | >> | 1005 | 1 | 1 | 1 >> | >> >> +---+-+-+-+ >> 5 rows selected (0.044 seconds) >> >> >> >
Re: Null array elements with joins
what's your Phoenix's version? Yun Zhang Best regards! 2018-06-20 1:02 GMT+08:00 Tulasi Paradarami : > Hi, > > I'm running few tests against Phoenix array and running into this bug > where array elements return null values when a join is involved. Is this a > known issue/limitation of arrays? > > create table array_test_1 (id integer not null primary key, arr > tinyint[5]); > upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]); > upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]); > upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]); > upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]); > upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]); > > create table test_table_1 (id integer not null primary key, val varchar); > upsert into test_table_1 values (1001, 'abc'); > upsert into test_table_1 values (1002, 'def'); > upsert into test_table_1 values (1003, 'ghi'); > > 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2], > t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = t2.id > ; > ++-++--- > -++ > | T1.ID | T2.VAL | ARRAY_ELEM(T1.ARR, 1) | ARRAY_ELEM(T1.ARR, 2) | > ARRAY_ELEM(T1.ARR, 3) | > ++-++--- > -++ > | 1001 | abc | null | null | > null | > | 1002 | def | null | null | > null | > | 1003 | ghi | null | null | > null | > ++-++--- > -++ > 3 rows selected (0.056 seconds) > > However, directly selecting array elements from the array returns data > correctly. > 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3] > from array_test_1 as t1; > +---+-+-+--- > --+ > | ID | ARRAY_ELEM(ARR, 1) | ARRAY_ELEM(ARR, 2) | ARRAY_ELEM(ARR, 3) | > +---+-+-+--- > --+ > | 1001 | 0 | 0 | 0 | > | 1002 | 0 | 0 | 0 | > | 1003 | 0 | 0 | 0 | > | 1004 | 0 | 0 | 1 | > | 1005 | 1 | 1 | 1 | > +---+-+-+--- > --+ > 5 rows selected (0.044 seconds) > > >
Null array elements with joins
Hi, I'm running few tests against Phoenix array and running into this bug where array elements return null values when a join is involved. Is this a known issue/limitation of arrays? create table array_test_1 (id integer not null primary key, arr tinyint[5]); upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]); upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]); upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]); upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]); upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]); create table test_table_1 (id integer not null primary key, val varchar); upsert into test_table_1 values (1001, 'abc'); upsert into test_table_1 values (1002, 'def'); upsert into test_table_1 values (1003, 'ghi'); 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2], t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = t2.id; ++-++++ | T1.ID | T2.VAL | ARRAY_ELEM(T1.ARR, 1) | ARRAY_ELEM(T1.ARR, 2) | ARRAY_ELEM(T1.ARR, 3) | ++-++++ | 1001 | abc | null | null | null | | 1002 | def | null | null | null | | 1003 | ghi | null | null | null | ++-++++ 3 rows selected (0.056 seconds) However, directly selecting array elements from the array returns data correctly. 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3] from array_test_1 as t1; +---+-+-+-+ | ID | ARRAY_ELEM(ARR, 1) | ARRAY_ELEM(ARR, 2) | ARRAY_ELEM(ARR, 3) | +---+-+-+-+ | 1001 | 0 | 0 | 0 | | 1002 | 0 | 0 | 0 | | 1003 | 0 | 0 | 0 | | 1004 | 0 | 0 | 1 | | 1005 | 1 | 1 | 1 | +---+-+-+-+ 5 rows selected (0.044 seconds)