This is the basic problem with JSON and Schema Learning..

If all the values are null you don’t know what datatype the object is..

Forcing a CAST() is really the only option.. This also implies that you almost 
always should do a CAST() to INT or FLOAT if you except possible NULL values..

I think the best solution is to try to introduce: https://json-schema.org/ as 
an alternative to schema learning which is also expensive..

When type is JSON allow a JSON schema to be passed in as well...

-----Original Message-----
From: Aman Sinha <[email protected]> 
Sent: Tuesday, April 9, 2019 6:03 PM
To: user <[email protected]>
Subject: Re: Issue faced in Apache drill

External Email: Use caution with links and attachments


The last suggestion from Paul about CASTing to desired type should work:
   SELECT a, SUM(CAST(b as INT) ) FROM dfs.`C:\\Users\\user\\Desktop  
sample.json` group by a;

I suggest filing a JIRA for the original query because for some reason if all 
values are NULLs,  (and this is with group-by), Drill's aggregate function code 
generator is defaulting to a varchar specific function.

Aman

On Tue, Apr 9, 2019 at 4:58 PM Paul Rogers <[email protected]>
wrote:

> Hi Gayathri,
>
> If you only have the content shown, then note that your "b" columns 
> are always null. Drill has no way to know what type b is supposed to be.
> Normally, Drill guesses Nullable Int.
>
> I wonder, have you turned on the "store.json.all_text_mode" session 
> option to tell JSON to read all columns as VarChar? If so, that would 
> explain why the error message says that b is VarChar.
>
> The team is in the process of adding a schema system to resolve this 
> kind of ambiguity: you'll be able to say, "b is a DOUBLE. Even if it 
> is all nulls, go ahead and treat it as double."
>
> In the mean time, you can try Rahul's suggestion. Or, you can ensure 
> that the b column has actual data. Or, you can try casting the (NULL 
> VarChar) b columns to the desired type.
>
> Thanks,
> - Paul
>
>
>
>     On Tuesday, April 9, 2019, 4:32:13 PM PDT, Gayathri Selvaraj < 
> [email protected]> wrote:
>
>  Hi Team,
>
>
> Facing some issues with the following case:
>
> Json file (*sample.json*) is having the following content:
> {"a":2,"b":null} {"a":2,"b":null} {"a":3,"b":null} {"a":4,"b":null}
>
> *Query:*
>
> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop sample.json` group 
> by a;
>
> *Error:*
>
> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate 
> functions supported for VarChar type
>
> *Observation:*
>
> If we query without using group by, then it is working fine without 
> any error. If group by is used, then sum of null values is throwing 
> the above error.
>
>
>
> Can anyone please let us know the solution for this or if there are 
> any alternative. I have raised a JIRA ticket for the same - 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org
> _jira_browse_DRILL-2D7161&d=DwIBaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=SpeiLeBT
> ifecUrj1SErsTRw4nAqzMxT043sp_gndNeI&m=IH3U7J4Wr70dhjl2Nht2IfigsUXlPIIa
> b5nyYMJZ3wE&s=SQSZK0g-nvdnIa6wJIr-c8zYDmowCrLnkktKyrYXP1Y&e=
>
>
> Regards,
>
> Gayathri
>


This message may contain information that is confidential or privileged. If you 
are not the intended recipient, please advise the sender immediately and delete 
this message. See 
http://www.blackrock.com/corporate/compliance/email-disclaimers for further 
information.  Please refer to 
http://www.blackrock.com/corporate/compliance/privacy-policy for more 
information about BlackRock’s Privacy Policy.

For a list of BlackRock's office addresses worldwide, see 
http://www.blackrock.com/corporate/about-us/contacts-locations.

© 2019 BlackRock, Inc. All rights reserved.

Reply via email to