This is the basic problem with JSON and Schema Learning.. If all the values are null you don’t know what datatype the object is..
Forcing a CAST() is really the only option.. This also implies that you almost always should do a CAST() to INT or FLOAT if you except possible NULL values.. I think the best solution is to try to introduce: https://json-schema.org/ as an alternative to schema learning which is also expensive.. When type is JSON allow a JSON schema to be passed in as well... -----Original Message----- From: Aman Sinha <[email protected]> Sent: Tuesday, April 9, 2019 6:03 PM To: user <[email protected]> Subject: Re: Issue faced in Apache drill External Email: Use caution with links and attachments The last suggestion from Paul about CASTing to desired type should work: SELECT a, SUM(CAST(b as INT) ) FROM dfs.`C:\\Users\\user\\Desktop sample.json` group by a; I suggest filing a JIRA for the original query because for some reason if all values are NULLs, (and this is with group-by), Drill's aggregate function code generator is defaulting to a varchar specific function. Aman On Tue, Apr 9, 2019 at 4:58 PM Paul Rogers <[email protected]> wrote: > Hi Gayathri, > > If you only have the content shown, then note that your "b" columns > are always null. Drill has no way to know what type b is supposed to be. > Normally, Drill guesses Nullable Int. > > I wonder, have you turned on the "store.json.all_text_mode" session > option to tell JSON to read all columns as VarChar? If so, that would > explain why the error message says that b is VarChar. > > The team is in the process of adding a schema system to resolve this > kind of ambiguity: you'll be able to say, "b is a DOUBLE. Even if it > is all nulls, go ahead and treat it as double." > > In the mean time, you can try Rahul's suggestion. Or, you can ensure > that the b column has actual data. Or, you can try casting the (NULL > VarChar) b columns to the desired type. > > Thanks, > - Paul > > > > On Tuesday, April 9, 2019, 4:32:13 PM PDT, Gayathri Selvaraj < > [email protected]> wrote: > > Hi Team, > > > Facing some issues with the following case: > > Json file (*sample.json*) is having the following content: > {"a":2,"b":null} {"a":2,"b":null} {"a":3,"b":null} {"a":4,"b":null} > > *Query:* > > SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop sample.json` group > by a; > > *Error:* > > UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate > functions supported for VarChar type > > *Observation:* > > If we query without using group by, then it is working fine without > any error. If group by is used, then sum of null values is throwing > the above error. > > > > Can anyone please let us know the solution for this or if there are > any alternative. I have raised a JIRA ticket for the same - > https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org > _jira_browse_DRILL-2D7161&d=DwIBaQ&c=zUO0BtkCe66yJvAZ4cAvZg&r=SpeiLeBT > ifecUrj1SErsTRw4nAqzMxT043sp_gndNeI&m=IH3U7J4Wr70dhjl2Nht2IfigsUXlPIIa > b5nyYMJZ3wE&s=SQSZK0g-nvdnIa6wJIr-c8zYDmowCrLnkktKyrYXP1Y&e= > > > Regards, > > Gayathri > This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. See http://www.blackrock.com/corporate/compliance/email-disclaimers for further information. Please refer to http://www.blackrock.com/corporate/compliance/privacy-policy for more information about BlackRock’s Privacy Policy. For a list of BlackRock's office addresses worldwide, see http://www.blackrock.com/corporate/about-us/contacts-locations. © 2019 BlackRock, Inc. All rights reserved.
