Re: Drill - Flatten function - help please

Andries Engelbrecht Tue, 03 Feb 2015 15:28:51 -0800

I would think an outer join will be more expensive than a predicate on null.


Having flatten produce the extra rows with null is much simpler and easier 
problem to deal to filter them out, than having to do the workaround to produce 
the missing rows if desired with an outer join.

Something form an end user perspective to consider.

—Andries


 


On Feb 3, 2015, at 3:21 PM, Jason Altekruse <[email protected]> wrote:

> I tried looking in the BigQuery docs for a reference point on this. I
> cannot find anywhere that specifies the behavior with empty lists, nor did
> a few google searches and stackoverflow pages shed any light on it. Its
> seems like we might need to just load data into bigquery and test it out
> ourselves to find the answer.
> 
> I do suspect that they act the same as we do currently for the reasons I
> mention on the JIRA. Please take some time to review my comments there and
> describe the types of queries being used, they may justify a change to
> flatten. I think there is value to the current implementations behavior,
> but we should review it in the context of all use cases.
> 
> -Jason
> 
> On Tue, Feb 3, 2015 at 2:53 PM, Sudhakar Thota <[email protected]> wrote:
> 
>> Jason,
>> 
>> Thank you.
>> 
>> I was testing this on customer data and found this limitation.
>> As I cannot put out the customer data, I have used some test data to
>> verify it myself.
>> 
>> Please let me know if there is any work around to get nulls or empty
>> spaces wherever the array is empty.
>> 
>> Steven,
>> 
>> I remember using the similar one on mongo, I don’t remember that
>> eliminating the records. But I can check. I haven’t used Big Query for
>> this, I used it for flat tables, but I can check.
>> 
>> Thanks
>> Sudhakar Thota
>> 
>> 
>> 
>> On Feb 3, 2015, at 2:45 PM, Jason Altekruse <[email protected]>
>> wrote:
>> 
>>> Sudhakar opened an issue for this, so I responded there. Steven is right,
>>> this is the current expected functionality, but I discuss there the
>> reasons
>>> for it and opened the discussion for use cases that need this
>> functionality.
>>> 
>>> On Tue, Feb 3, 2015 at 2:28 PM, Steven Phillips <[email protected]>
>>> wrote:
>>> 
>>>> I think flatten is functioning as designed, but perhaps we need to treat
>>>> this case differently. It might make sense to return a single row with a
>>>> null for the flattened column in this case. The only model for "flatten"
>>>> that I know of is google Big Query. We should see how they handle this
>>>> case.
>>>> 
>>>> On Tue, Feb 3, 2015 at 12:27 PM, MapR <[email protected]> wrote:
>>>> 
>>>>> In a table, column with null values we
>>>>> present as nulls or empty values with other columns. I never had an
>>>>> experience with c1, c2, c3 (columns ) not presented when c3 has null
>>>> values
>>>>> even though c1, c2 have notnull values.
>>>>> 
>>>>> Sudhakar Thota
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On Feb 3, 2015, at 11:57 AM, Andries Engelbrecht <
>>>>> [email protected]> wrote:
>>>>>> 
>>>>>> In listing 3 it looks like flatten is doing the correct thing by
>>>>> representing each element in the array with a new row and the element
>> in
>>>>> the column output.
>>>>>> And also for arrays that are empty there is no output row.
>>>>>> 
>>>>>> —Andries
>>>>>> 
>>>>>>> On Feb 3, 2015, at 11:54 AM, Sudhakar Thota <[email protected]>
>>>>> wrote:
>>>>>>> 
>>>>>>> It will be more appropriate if I add my file to have data to
>>>> facilitate
>>>>> quick test.
>>>>>>> 
>>>>>>> [root@maprdemo clicks]# cat sthota_test_1.json
>>>>>>> {"a":"r1cl1", "b":"r1c2","c":{ "x" : 1, "y" : "a string", "z" : [ 1]
>>>> }}
>>>>>>> {"a":"r2cl1", "b":"r2c2","c":{ "x" : 2, "y" : "a string", "z" : [
>> 1,2]
>>>>> }}
>>>>>>> {"a":"r3cl1", "b":"r3c2","c":{ "x" : 3, "y" : "a string", "z" : [
>>>>> 1,2,3] }}
>>>>>>> {"a":"r4cl1", "b":"r4c2","c":{ "x" : 4, "y" : "a string", "z" : [
>>>>> 1,2,3,4] }}
>>>>>>> {"a":"r5cl1", "b":"r5c2","c":{ "x" : 5, "y" : "a string", "z" : [
>>>>> 1,2,3,4,5] }}
>>>>>>> {"a":"r6cl1", "b":"r6c2","c":{ "x" : 6, "y" : "a string"  }}
>>>>>>> 
>>>>>>> 
>>>>>>> Begin forwarded message:
>>>>>>> 
>>>>>>>> From: Sudhakar Thota <[email protected]>
>>>>>>>> Subject: Drill - Flatten function - help please
>>>>>>>> Date: February 3, 2015 at 11:52:39 AM PST
>>>>>>>> To: [email protected]
>>>>>>>> 
>>>>>>>> Hi Drillers,
>>>>>>>> 
>>>>>>>> I have a request here about flatten function. I want to know if this
>>>>> is an expected behavior of drill flatten function with json documents.
>>>>>>>> 
>>>>>>>> Listing-1 is contents of my json collection where I have
>>>> intentionally
>>>>> included a empty array.
>>>>>>>> Listing-2 shows that if I don’t call flatten function it shows all
>>>>> records.
>>>>>>>> Listing-3 shows that if I call flatten function, row6 (r6cl1…) is
>>>>> ignored. I am using that in my projections not in predicate, but it
>> looks
>>>>> like treated as predicate.
>>>>>>>> 
>>>>>>>> Please correct me if I am wrong and let me know if there is any work
>>>>> around.
>>>>>>>> 
>>>>>>>> Listing:1
>>>>>>>> 
>>>>>>>> 0: jdbc:drill:> SELECT  t.a,t.b,t.c.x, t.c.y, t.c.z from
>>>>> dfs.`/data/nested/clicks/sthota_test_1.json` as t;
>>>>>>>> +------------+------------+------------+------------+------------+
>>>>>>>> |     a      |     b      |   EXPR$2   |   EXPR$3   |   EXPR$4   |
>>>>>>>> +------------+------------+------------+------------+------------+
>>>>>>>> | r1cl1      | r1c2       | 1          | a string   | ["1"]      |
>>>>>>>> | r2cl1      | r2c2       | 2          | a string   | ["1","2"]  |
>>>>>>>> | r3cl1      | r3c2       | 3          | a string   | ["1","2","3"]
>> |
>>>>>>>> | r4cl1      | r4c2       | 4          | a string   |
>>>>> ["1","2","3","4"] |
>>>>>>>> | r5cl1      | r5c2       | 5          | a string   |
>>>>> ["1","2","3","4","5"] |
>>>>>>>> | r6cl1      | r6c2       | 6          | a string   | []         |
>>>>>>>> +------------+------------+------------+------------+------------+
>>>>>>>> 6 rows selected (0.125 seconds)
>>>>>>>> 0: jdbc:drill:>
>>>>>>>> 
>>>>>>>> Listing:2
>>>>>>>> 
>>>>>>>> 0: jdbc:drill:> SELECT  t.a,t.b,t.c.x, t.c.y from
>>>>> dfs.`/data/nested/clicks/sthota_test_1.json` as t;
>>>>>>>> +------------+------------+------------+------------+
>>>>>>>> |     a      |     b      |   EXPR$2   |   EXPR$3   |
>>>>>>>> +------------+------------+------------+------------+
>>>>>>>> | r1cl1      | r1c2       | 1          | a string   |
>>>>>>>> | r2cl1      | r2c2       | 2          | a string   |
>>>>>>>> | r3cl1      | r3c2       | 3          | a string   |
>>>>>>>> | r4cl1      | r4c2       | 4          | a string   |
>>>>>>>> | r5cl1      | r5c2       | 5          | a string   |
>>>>>>>> | r6cl1      | r6c2       | 6          | a string   |
>>>>>>>> +------------+------------+------------+------------+
>>>>>>>> 6 rows selected (0.163 seconds)
>>>>>>>> 
>>>>>>>> Listing:3
>>>>>>>> 
>>>>>>>> 0: jdbc:drill:> SELECT  t.a,t.b,t.c.x, t.c.y, flatten(t.c.z) from
>>>>> dfs.`/data/nested/clicks/sthota_test_1.json` as t;
>>>>>>>> +------------+------------+------------+------------+------------+
>>>>>>>> |     a      |     b      |   EXPR$2   |   EXPR$3   |   EXPR$4   |
>>>>>>>> +------------+------------+------------+------------+------------+
>>>>>>>> | r1cl1      | r1c2       | 1          | a string   | 1          |
>>>>>>>> | r2cl1      | r2c2       | 2          | a string   | 1          |
>>>>>>>> | r2cl1      | r2c2       | 2          | a string   | 2          |
>>>>>>>> | r3cl1      | r3c2       | 3          | a string   | 1          |
>>>>>>>> | r3cl1      | r3c2       | 3          | a string   | 2          |
>>>>>>>> | r3cl1      | r3c2       | 3          | a string   | 3          |
>>>>>>>> | r4cl1      | r4c2       | 4          | a string   | 1          |
>>>>>>>> | r4cl1      | r4c2       | 4          | a string   | 2          |
>>>>>>>> | r4cl1      | r4c2       | 4          | a string   | 3          |
>>>>>>>> | r4cl1      | r4c2       | 4          | a string   | 4          |
>>>>>>>> | r5cl1      | r5c2       | 5          | a string   | 1          |
>>>>>>>> | r5cl1      | r5c2       | 5          | a string   | 2          |
>>>>>>>> | r5cl1      | r5c2       | 5          | a string   | 3          |
>>>>>>>> | r5cl1      | r5c2       | 5          | a string   | 4          |
>>>>>>>> | r5cl1      | r5c2       | 5          | a string   | 5          |
>>>>>>>> +------------+------------+------------+------------+------------+
>>>>>>>> 15 rows selected (0.171 seconds)
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> Sudhakar Thota
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Steven Phillips
>>>> Software Engineer
>>>> 
>>>> mapr.com
>>>> 
>> 
>>

Re: Drill - Flatten function - help please

Reply via email to