Re: Null array elements with joins

2018-08-27 Thread Gerald Sangudi
Hi folks,

I submitted PR https://github.com/apache/phoenix/pull/335

Appreciate your review.

Thanks,
Gerald

On Mon, Aug 13, 2018 at 1:10 PM, James Taylor 
wrote:

> I commented on the JIRA you filed here: PHOENIX-4791. Best to keep
> discussion there.
> Thanks,
> James
>
> On Mon, Aug 13, 2018 at 11:08 AM, Gerald Sangudi 
> wrote:
>
>> Hello all,
>>
>> Any suggestions or pointers on the issue below?
>>
>> Projecting array elements works when not using joins, and does not work
>> when we use hash joins. Is there an issue with the ProjectionCompiler for
>> joins? I have not been able to isolate the specific cause, and would
>> appreciate any pointers or suggestions.
>>
>> Thanks,
>> Gerald
>>
>> On Tue, Jun 19, 2018 at 10:02 AM, Tulasi Paradarami <
>> tulasi.krishn...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm running few tests against Phoenix array and running into this bug
>>> where array elements return null values when a join is involved. Is this a
>>> known issue/limitation of arrays?
>>>
>>> create table array_test_1 (id integer not null primary key, arr
>>> tinyint[5]);
>>> upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]);
>>> upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]);
>>> upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]);
>>> upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]);
>>> upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]);
>>>
>>> create table test_table_1 (id integer not null primary key, val varchar);
>>> upsert into test_table_1 values (1001, 'abc');
>>> upsert into test_table_1 values (1002, 'def');
>>> upsert into test_table_1 values (1003, 'ghi');
>>>
>>> 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2],
>>> t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id =
>>> t2.id;
>>> ++-++---
>>> -++
>>> | T1.ID  | T2.VAL  | ARRAY_ELEM(T1.ARR, 1)  | ARRAY_ELEM(T1.ARR, 2)  |
>>> ARRAY_ELEM(T1.ARR, 3)  |
>>> ++-++---
>>> -++
>>> | 1001   | abc | null   | null   |
>>> null   |
>>> | 1002   | def | null   | null   |
>>> null   |
>>> | 1003   | ghi | null   | null   |
>>> null   |
>>> ++-++---
>>> -++
>>> 3 rows selected (0.056 seconds)
>>>
>>> However, directly selecting array elements from the array returns data
>>> correctly.
>>> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2],
>>> t1.arr[3] from array_test_1 as t1;
>>> +---+-+-+---
>>> --+
>>> |  ID   | ARRAY_ELEM(ARR, 1)  | ARRAY_ELEM(ARR, 2)  | ARRAY_ELEM(ARR,
>>> 3)  |
>>> +---+-+-+---
>>> --+
>>> | 1001  | 0   | 0   | 0
>>>  |
>>> | 1002  | 0   | 0   | 0
>>>  |
>>> | 1003  | 0   | 0   | 0
>>>  |
>>> | 1004  | 0   | 0   | 1
>>>  |
>>> | 1005  | 1   | 1   | 1
>>>  |
>>> +---+-+-+---
>>> --+
>>> 5 rows selected (0.044 seconds)
>>>
>>>
>>>
>>
>


Re: Null array elements with joins

2018-08-13 Thread James Taylor
I commented on the JIRA you filed here: PHOENIX-4791. Best to keep
discussion there.
Thanks,
James

On Mon, Aug 13, 2018 at 11:08 AM, Gerald Sangudi 
wrote:

> Hello all,
>
> Any suggestions or pointers on the issue below?
>
> Projecting array elements works when not using joins, and does not work
> when we use hash joins. Is there an issue with the ProjectionCompiler for
> joins? I have not been able to isolate the specific cause, and would
> appreciate any pointers or suggestions.
>
> Thanks,
> Gerald
>
> On Tue, Jun 19, 2018 at 10:02 AM, Tulasi Paradarami <
> tulasi.krishn...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm running few tests against Phoenix array and running into this bug
>> where array elements return null values when a join is involved. Is this a
>> known issue/limitation of arrays?
>>
>> create table array_test_1 (id integer not null primary key, arr
>> tinyint[5]);
>> upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]);
>> upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]);
>> upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]);
>> upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]);
>> upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]);
>>
>> create table test_table_1 (id integer not null primary key, val varchar);
>> upsert into test_table_1 values (1001, 'abc');
>> upsert into test_table_1 values (1002, 'def');
>> upsert into test_table_1 values (1003, 'ghi');
>>
>> 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2],
>> t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id =
>> t2.id;
>> ++-++---
>> -++
>> | T1.ID  | T2.VAL  | ARRAY_ELEM(T1.ARR, 1)  | ARRAY_ELEM(T1.ARR, 2)  |
>> ARRAY_ELEM(T1.ARR, 3)  |
>> ++-++---
>> -++
>> | 1001   | abc | null   | null   |
>> null   |
>> | 1002   | def | null   | null   |
>> null   |
>> | 1003   | ghi | null   | null   |
>> null   |
>> ++-++---
>> -++
>> 3 rows selected (0.056 seconds)
>>
>> However, directly selecting array elements from the array returns data
>> correctly.
>> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3]
>> from array_test_1 as t1;
>> +---+-+-+---
>> --+
>> |  ID   | ARRAY_ELEM(ARR, 1)  | ARRAY_ELEM(ARR, 2)  | ARRAY_ELEM(ARR, 3)
>> |
>> +---+-+-+---
>> --+
>> | 1001  | 0   | 0   | 0
>>  |
>> | 1002  | 0   | 0   | 0
>>  |
>> | 1003  | 0   | 0   | 0
>>  |
>> | 1004  | 0   | 0   | 1
>>  |
>> | 1005  | 1   | 1   | 1
>>  |
>> +---+-+-+---
>> --+
>> 5 rows selected (0.044 seconds)
>>
>>
>>
>


Re: Null array elements with joins

2018-08-13 Thread Gerald Sangudi
Hello all,

Any suggestions or pointers on the issue below?

Projecting array elements works when not using joins, and does not work
when we use hash joins. Is there an issue with the ProjectionCompiler for
joins? I have not been able to isolate the specific cause, and would
appreciate any pointers or suggestions.

Thanks,
Gerald

On Tue, Jun 19, 2018 at 10:02 AM, Tulasi Paradarami <
tulasi.krishn...@gmail.com> wrote:

> Hi,
>
> I'm running few tests against Phoenix array and running into this bug
> where array elements return null values when a join is involved. Is this a
> known issue/limitation of arrays?
>
> create table array_test_1 (id integer not null primary key, arr
> tinyint[5]);
> upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]);
> upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]);
> upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]);
> upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]);
> upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]);
>
> create table test_table_1 (id integer not null primary key, val varchar);
> upsert into test_table_1 values (1001, 'abc');
> upsert into test_table_1 values (1002, 'def');
> upsert into test_table_1 values (1003, 'ghi');
>
> 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2],
> t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = t2.id
> ;
> ++-++---
> -++
> | T1.ID  | T2.VAL  | ARRAY_ELEM(T1.ARR, 1)  | ARRAY_ELEM(T1.ARR, 2)  |
> ARRAY_ELEM(T1.ARR, 3)  |
> ++-++---
> -++
> | 1001   | abc | null   | null   |
> null   |
> | 1002   | def | null   | null   |
> null   |
> | 1003   | ghi | null   | null   |
> null   |
> ++-++---
> -++
> 3 rows selected (0.056 seconds)
>
> However, directly selecting array elements from the array returns data
> correctly.
> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3]
> from array_test_1 as t1;
> +---+-+-+---
> --+
> |  ID   | ARRAY_ELEM(ARR, 1)  | ARRAY_ELEM(ARR, 2)  | ARRAY_ELEM(ARR, 3)  |
> +---+-+-+---
> --+
> | 1001  | 0   | 0   | 0   |
> | 1002  | 0   | 0   | 0   |
> | 1003  | 0   | 0   | 0   |
> | 1004  | 0   | 0   | 1   |
> | 1005  | 1   | 1   | 1   |
> +---+-+-+---
> --+
> 5 rows selected (0.044 seconds)
>
>
>


Re: Null array elements with joins

2018-06-27 Thread Tulasi Paradarami
I spent some more time debugging this error. In the case of arrays, it
looks like there is a mismatch in column family and qualifier between
KeyValueColumnExpression and ResultTuple. As a result, tuple.get(...)
returns NULLs for array elements.

Here are the methods I reviewed:

HashJoinScanner#processResults
KeyValueSchema#toBytes
KeyValueColumnExpression#evaluate
tuple.get(cf, cq, ptr)

For instance, when selecting following two columns in a query that involves
hash-join,
  - CHAR_COLUMN of CHAR(1) type
  - ARR_COLUMN[1], ARR_COLUMN[2], ARR_COLUMN[3], ARR_COLUMN[4] of
CHAR(1)[5] type

ResultTuple produces these cells:
cell 1: [123/0:CHAR_COLUMN/1530111764946/Put/vlen=5/seqid=205, key: 123,
family: 0, qualifier: CHAR_COLUMN, value: defgh]
cell 2: [123/0:_0/1530111764946/Put/vlen=1/seqid=205, key: 123, family: 0,
qualifier: _0, value: x]
cell 3: [123/_v:\x00\x00\x00\x02/LATEST_TIMESTAMP/Put/vlen=6/seqid=0, key:
123, family: _v, qualifier: ^@^@^@^B, value: pqst^@^O]

tuple.get(cf, cq, ptr) works fine for the first cell [tuple.get("0",
"CHAR_COLUMN", ptr)], however, in the case of 3rd cell (that is associated
with array elements), tuple.get("0", "ARR_COLUMN", ptr) fails because
family and qualifier information in KeyValueColumnExpression don't match
with the values present in the ResultTuple.

KeyValueColumnExpression:
cf: 0
cq: ARR_COLUMN

ResultTuple
cf: _v
cq: 

This appears to be an issue with how "cf, cq" attributes are initialized in
KeyValyColumnExpression for ARRAY elements during tuple projection. That
is, when projecting array elements, it should perhaps be initialized with
"_v, ", instead of "default cf, actual column qualifier"?

It'll be helpful to hear from the Phoenix experts on this.


On Wed, Jun 20, 2018 at 7:43 AM Tulasi Paradarami <
tulasi.krishn...@gmail.com> wrote:

> Tested on 4.7, 4.11 & 4.14.
> https://issues.apache.org/jira/browse/PHOENIX-4791
>
>
> On Tue, Jun 19, 2018 at 8:10 PM Jaanai Zhang 
> wrote:
>
>> what's your Phoenix's version?
>>
>>
>> 
>>Yun Zhang
>>Best regards!
>>
>>
>> 2018-06-20 1:02 GMT+08:00 Tulasi Paradarami :
>>
>>> Hi,
>>>
>>> I'm running few tests against Phoenix array and running into this bug
>>> where array elements return null values when a join is involved. Is this a
>>> known issue/limitation of arrays?
>>>
>>> create table array_test_1 (id integer not null primary key, arr
>>> tinyint[5]);
>>> upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]);
>>> upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]);
>>> upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]);
>>> upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]);
>>> upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]);
>>>
>>> create table test_table_1 (id integer not null primary key, val varchar);
>>> upsert into test_table_1 values (1001, 'abc');
>>> upsert into test_table_1 values (1002, 'def');
>>> upsert into test_table_1 values (1003, 'ghi');
>>>
>>> 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2],
>>> t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id =
>>> t2.id;
>>>
>>> ++-++++
>>> | T1.ID  | T2.VAL  | ARRAY_ELEM(T1.ARR, 1)  | ARRAY_ELEM(T1.ARR, 2)  |
>>> ARRAY_ELEM(T1.ARR, 3)  |
>>>
>>> ++-++++
>>> | 1001   | abc | null   | null   |
>>> null   |
>>> | 1002   | def | null   | null   |
>>> null   |
>>> | 1003   | ghi | null   | null   |
>>> null   |
>>>
>>> ++-++++
>>> 3 rows selected (0.056 seconds)
>>>
>>> However, directly selecting array elements from the array returns data
>>> correctly.
>>> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2],
>>> t1.arr[3] from array_test_1 as t1;
>>>
>>> +---+-+-+-+
>>> |  ID   | ARRAY_ELEM(ARR, 1)  | ARRAY_ELEM(ARR, 2)  | ARRAY_ELEM(ARR,
>>> 3)  |
>>>
>>> +---+-+-+-+
>>> | 1001  | 0   | 0   | 0
>>>  |
>>> | 1002  | 0   | 0   | 0
>>>  |
>>> | 1003  | 0   | 0   | 0
>>>  |
>>> | 1004  | 0   | 0   | 1
>>>  |
>>> | 1005  | 1   | 1   | 1
>>>  |
>>>
>>> +---+-+-+-+
>>> 5 rows selected (0.044 seconds)
>>>
>>>
>>>
>>


Re: Null array elements with joins

2018-06-20 Thread Tulasi Paradarami
Tested on 4.7, 4.11 & 4.14.
https://issues.apache.org/jira/browse/PHOENIX-4791


On Tue, Jun 19, 2018 at 8:10 PM Jaanai Zhang  wrote:

> what's your Phoenix's version?
>
>
> 
>Yun Zhang
>Best regards!
>
>
> 2018-06-20 1:02 GMT+08:00 Tulasi Paradarami :
>
>> Hi,
>>
>> I'm running few tests against Phoenix array and running into this bug
>> where array elements return null values when a join is involved. Is this a
>> known issue/limitation of arrays?
>>
>> create table array_test_1 (id integer not null primary key, arr
>> tinyint[5]);
>> upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]);
>> upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]);
>> upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]);
>> upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]);
>> upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]);
>>
>> create table test_table_1 (id integer not null primary key, val varchar);
>> upsert into test_table_1 values (1001, 'abc');
>> upsert into test_table_1 values (1002, 'def');
>> upsert into test_table_1 values (1003, 'ghi');
>>
>> 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2],
>> t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id =
>> t2.id;
>>
>> ++-++++
>> | T1.ID  | T2.VAL  | ARRAY_ELEM(T1.ARR, 1)  | ARRAY_ELEM(T1.ARR, 2)  |
>> ARRAY_ELEM(T1.ARR, 3)  |
>>
>> ++-++++
>> | 1001   | abc | null   | null   |
>> null   |
>> | 1002   | def | null   | null   |
>> null   |
>> | 1003   | ghi | null   | null   |
>> null   |
>>
>> ++-++++
>> 3 rows selected (0.056 seconds)
>>
>> However, directly selecting array elements from the array returns data
>> correctly.
>> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3]
>> from array_test_1 as t1;
>>
>> +---+-+-+-+
>> |  ID   | ARRAY_ELEM(ARR, 1)  | ARRAY_ELEM(ARR, 2)  | ARRAY_ELEM(ARR, 3)
>> |
>>
>> +---+-+-+-+
>> | 1001  | 0   | 0   | 0
>>  |
>> | 1002  | 0   | 0   | 0
>>  |
>> | 1003  | 0   | 0   | 0
>>  |
>> | 1004  | 0   | 0   | 1
>>  |
>> | 1005  | 1   | 1   | 1
>>  |
>>
>> +---+-+-+-+
>> 5 rows selected (0.044 seconds)
>>
>>
>>
>


Re: Null array elements with joins

2018-06-19 Thread Jaanai Zhang
what's your Phoenix's version?



   Yun Zhang
   Best regards!


2018-06-20 1:02 GMT+08:00 Tulasi Paradarami :

> Hi,
>
> I'm running few tests against Phoenix array and running into this bug
> where array elements return null values when a join is involved. Is this a
> known issue/limitation of arrays?
>
> create table array_test_1 (id integer not null primary key, arr
> tinyint[5]);
> upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]);
> upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]);
> upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]);
> upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]);
> upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]);
>
> create table test_table_1 (id integer not null primary key, val varchar);
> upsert into test_table_1 values (1001, 'abc');
> upsert into test_table_1 values (1002, 'def');
> upsert into test_table_1 values (1003, 'ghi');
>
> 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2],
> t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = t2.id
> ;
> ++-++---
> -++
> | T1.ID  | T2.VAL  | ARRAY_ELEM(T1.ARR, 1)  | ARRAY_ELEM(T1.ARR, 2)  |
> ARRAY_ELEM(T1.ARR, 3)  |
> ++-++---
> -++
> | 1001   | abc | null   | null   |
> null   |
> | 1002   | def | null   | null   |
> null   |
> | 1003   | ghi | null   | null   |
> null   |
> ++-++---
> -++
> 3 rows selected (0.056 seconds)
>
> However, directly selecting array elements from the array returns data
> correctly.
> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3]
> from array_test_1 as t1;
> +---+-+-+---
> --+
> |  ID   | ARRAY_ELEM(ARR, 1)  | ARRAY_ELEM(ARR, 2)  | ARRAY_ELEM(ARR, 3)  |
> +---+-+-+---
> --+
> | 1001  | 0   | 0   | 0   |
> | 1002  | 0   | 0   | 0   |
> | 1003  | 0   | 0   | 0   |
> | 1004  | 0   | 0   | 1   |
> | 1005  | 1   | 1   | 1   |
> +---+-+-+---
> --+
> 5 rows selected (0.044 seconds)
>
>
>


Null array elements with joins

2018-06-19 Thread Tulasi Paradarami
Hi,

I'm running few tests against Phoenix array and running into this bug where
array elements return null values when a join is involved. Is this a known
issue/limitation of arrays?

create table array_test_1 (id integer not null primary key, arr tinyint[5]);
upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]);
upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]);
upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]);
upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]);
upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]);

create table test_table_1 (id integer not null primary key, val varchar);
upsert into test_table_1 values (1001, 'abc');
upsert into test_table_1 values (1002, 'def');
upsert into test_table_1 values (1003, 'ghi');

0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2],
t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = t2.id;
++-++++
| T1.ID  | T2.VAL  | ARRAY_ELEM(T1.ARR, 1)  | ARRAY_ELEM(T1.ARR, 2)  |
ARRAY_ELEM(T1.ARR, 3)  |
++-++++
| 1001   | abc | null   | null   |
null   |
| 1002   | def | null   | null   |
null   |
| 1003   | ghi | null   | null   |
null   |
++-++++
3 rows selected (0.056 seconds)

However, directly selecting array elements from the array returns data
correctly.
0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3]
from array_test_1 as t1;
+---+-+-+-+
|  ID   | ARRAY_ELEM(ARR, 1)  | ARRAY_ELEM(ARR, 2)  | ARRAY_ELEM(ARR, 3)  |
+---+-+-+-+
| 1001  | 0   | 0   | 0   |
| 1002  | 0   | 0   | 0   |
| 1003  | 0   | 0   | 0   |
| 1004  | 0   | 0   | 1   |
| 1005  | 1   | 1   | 1   |
+---+-+-+-+
5 rows selected (0.044 seconds)