Looks like the COALESCE function was the source of the problem. Passing a double (0.0) instead of an int (0) as the second expression solved it for me:
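To put that one-line change in context, here is a hedged sketch (the `dfs` workspace and relative path are illustrative, based on the HDFS location quoted below; the stack trace's `varCharToInt` call suggests the int default made Drill plan an implicit VARCHAR-to-INT cast):

```sql
-- Failing form: the int default 0 led Drill to insert an implicit
-- VARCHAR -> INT cast (varCharToInt), which throws
-- NumberFormatException on "140.755":
--   CAST(COALESCE(t_total, 0) AS double) AS t_total

-- Working form: a double default (0.0) avoids the integer coercion:
SELECT CAST(COALESCE(t_total, 0.0) AS double) AS t_total
FROM dfs.`caspr/csv/smalltest.csv`;
```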
~~~
CAST(COALESCE(t_total, 0.0) AS double)
~~~

On Fri, Mar 11, 2016 at 12:45 AM, Matt <[email protected]> wrote:

> ~~~
> 00-01 Project(date_tm=[CAST($23):TIMESTAMP(0)],
>   id_1=[CAST($11):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
>   id_2=[CAST($15):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
>   id_3=[CAST($33):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
>   b_total=[CAST(CASE(IS NOT NULL($4), $4, 0)):BIGINT], t_total=[CAST(CASE(IS NOT NULL($31), $31, 0)):DOUBLE], h_total=[CAST(CASE(IS NOT NULL($40), $40, 0)):BIGINT],
>   b_small=[CAST(CASE(IS NOT NULL($36), $36, 0)):BIGINT], t_small=[CAST(CASE(IS NOT NULL($14), $14, 0)):DOUBLE], h_small=[CAST(CASE(IS NOT NULL($38), $38, 0)):BIGINT],
>   b_18000=[CAST(CASE(IS NOT NULL($32), $32, 0)):BIGINT], t_18000=[CAST(CASE(IS NOT NULL($24), $24, 0)):DOUBLE], h_18000=[CAST(CASE(IS NOT NULL($27), $27, 0)):BIGINT],
>   b_12000=[CAST(CASE(IS NOT NULL($30), $30, 0)):BIGINT], t_12000=[CAST(CASE(IS NOT NULL($28), $28, 0)):DOUBLE], h_12000=[CAST(CASE(IS NOT NULL($20), $20, 0)):BIGINT],
>   b_6000=[CAST(CASE(IS NOT NULL($41), $41, 0)):BIGINT], t_6000=[CAST(CASE(IS NOT NULL($37), $37, 0)):DOUBLE], h_6000=[CAST(CASE(IS NOT NULL($29), $29, 0)):BIGINT],
>   b_3000=[CAST(CASE(IS NOT NULL($17), $17, 0)):BIGINT], t_3000=[CAST(CASE(IS NOT NULL($7), $7, 0)):DOUBLE], h_3000=[CAST(CASE(IS NOT NULL($1), $1, 0)):BIGINT],
>   b_2000=[CAST(CASE(IS NOT NULL($26), $26, 0)):BIGINT], t_2000=[CAST(CASE(IS NOT NULL($34), $34, 0)):DOUBLE], h_2000=[CAST(CASE(IS NOT NULL($10), $10, 0)):BIGINT],
>   b_1500=[CAST(CASE(IS NOT NULL($42), $42, 0)):BIGINT], t_1500=[CAST(CASE(IS NOT NULL($13), $13, 0)):DOUBLE], h_1500=[CAST(CASE(IS NOT NULL($3), $3, 0)):BIGINT],
>   b_1250=[CAST(CASE(IS NOT NULL($21), $21, 0)):BIGINT], t_1250=[CAST(CASE(IS NOT NULL($25), $25, 0)):DOUBLE], h_1250=[CAST(CASE(IS NOT NULL($16), $16, 0)):BIGINT],
>   b_1000=[CAST(CASE(IS NOT NULL($12), $12, 0)):BIGINT], t_1000=[CAST(CASE(IS NOT NULL($19), $19, 0)):DOUBLE], h_1000=[CAST(CASE(IS NOT NULL($6), $6, 0)):BIGINT],
>   b_750=[CAST(CASE(IS NOT NULL($9), $9, 0)):BIGINT], t_750=[CAST(CASE(IS NOT NULL($0), $0, 0)):DOUBLE], h_750=[CAST(CASE(IS NOT NULL($5), $5, 0)):BIGINT],
>   b_500=[CAST(CASE(IS NOT NULL($2), $2, 0)):BIGINT], t_500=[CAST(CASE(IS NOT NULL($8), $8, 0)):DOUBLE], h_500=[CAST(CASE(IS NOT NULL($39), $39, 0)):BIGINT],
>   b_0=[CAST(CASE(IS NOT NULL($18), $18, 0)):BIGINT], t_0=[CAST(CASE(IS NOT NULL($35), $35, 0)):DOUBLE], EXPR$42=[CAST(CASE(IS NOT NULL($22), $22, 0)):BIGINT])
> 00-02 Scan(groupscan=[EasyGroupScan [selectionRoot=hdfs://es05:54310/caspr/csv/smalltest.csv, numFiles=1,
>   columns=[`date_tm`, `id_1`, `id_2`, `id_3`, `b_total`, `t_total`, `h_total`, `b_small`, `t_small`, `h_small`, `b_18000`, `t_18000`, `h_18000`, `b_12000`, `t_12000`, `h_12000`, `b_6000`, `t_6000`, `h_6000`, `b_3000`, `t_3000`, `h_3000`, `b_2000`, `t_2000`, `h_2000`, `b_1500`, `t_1500`, `h_1500`, `b_1250`, `t_1250`, `h_1250`, `b_1000`, `t_1000`, `h_1000`, `b_750`, `t_750`, `h_750`, `b_500`, `t_500`, `h_500`, `b_0`, `t_0`, `h_0`],
>   files=[hdfs://es05:54310/caspr/csv/smalltest.csv]]])
> ~~~
>
> ~~~
> {
>   "head" : {
>     "version" : 1,
>     "generator" : { "type" : "ExplainHandler", "info" : "" },
>     "type" : "APACHE_DRILL_PHYSICAL",
>     "options" : [
>       { "kind" : "STRING", "type" : "SESSION", "name" : "store.format", "string_val" : "parquet" },
>       { "kind" : "BOOLEAN", "type" : "SESSION", "name" : "exec.errors.verbose", "bool_val" : true }
>     ],
>     "queue" : 0,
>     "resultMode" : "EXEC"
>   },
>   "graph" : [ {
>     "pop" : "fs-scan",
>     "@id" : 2,
>     "userName" : "hduser",
>     "files" : [ "hdfs://es05:54310/caspr/csv/smalltest.csv" ],
>     "storage" : {
>       "type" : "file",
>       "enabled" : true,
>       "connection" : "hdfs://es05:54310",
>       "workspaces" : {
>         "root" : { "location" : "/", "writable" : true, "defaultInputFormat" : null },
>         "tmp" : { "location" : "/tmp/", "writable" : true, "defaultInputFormat" : null },
>         "caspr" : { "location" : "/caspr", "writable" : true, "defaultInputFormat" : "csv" }
>       }
>     "ref" : "`t_3000`",
>     "expr" : "cast( ( ( if (isnotnull(`t_3000`) ) then (`t_3000` ) else (0 ) end ) ) as FLOAT8 )"
>   }, {
>     "ref" : "`h_3000`",
>     "expr" : "cast( ( ( if (isnotnull(`h_3000`) ) then (`h_3000` ) else (0 ) end ) ) as BIGINT )"
>   }, {
>     "ref" : "`b_2000`",
>     "expr" : "cast( ( ( if (isnotnull(`b_2000`) ) then (`b_2000` ) else (0 ) end ) ) as BIGINT )"
>   }, {
>     "ref" : "`t_2000`",
>     "expr" : "cast( ( ( if (isnotnull(`t_2000`) ) then (`t_2000` ) else (0 ) end ) ) as FLOAT8 )"
>   }, {
>     "ref" : "`h_2000`",
>     "expr" : "cast( ( ( if (isnotnull(`h_2000`) ) then (`h_2000` ) else (0 ) end ) ) as BIGINT )"
>   }, {
>     "ref" : "`b_1500`",
>     "expr" : "cast( ( ( if (isnotnull(`b_1500`) ) then (`b_1500` ) else (0 ) end ) ) as BIGINT )"
>   }, {
>     "ref" : "`t_1500`",
>     "expr" : "cast( ( ( if (isnotnull(`t_1500`) ) then (`t_1500` ) else (0 ) end ) ) as FLOAT8 )"
>   }, {
>     "ref" : "`h_1500`",
>     "expr" : "cast( ( ( if (isnotnull(`h_1500`) ) then (`h_1500` ) else (0 ) end ) ) as BIGINT )"
>   }, {
>     "ref" : "`b_1250`",
>     "expr" : "cast( ( ( if (isnotnull(`b_1250`) ) then (`b_1250` ) else (0 ) end ) ) as BIGINT )"
>   }, {
>     "ref" : "`t_1250`",
>     "expr" : "cast( ( ( if (isnotnull(`t_1250`) ) then (`t_1250` ) else (0 ) end ) ) as FLOAT8 )"
>   }, {
>     "ref" : "`h_1250`",
>     "expr" : "cast( ( ( if (isnotnull(`h_1250`) ) then (`h_1250` ) else (0 ) end ) ) as BIGINT )"
>   }, {
>     "ref" : "`b_1000`",
>     "expr" : "cast( ( ( if (isnotnull(`b_1000`) ) then (`b_1000` ) else (0 ) end ) ) as BIGINT )"
>   }, {
>     "ref" : "`t_1000`",
>     "expr" : "cast( ( ( if (isnotnull(`t_1000`) ) then (`t_1000` ) else (0 ) end ) ) as FLOAT8 )"
>   }
> ~~~
>
> On 10 Mar 2016, at 18:40, Hsuan Yi Chu wrote:
>
>> From the log, I see this one:
>>
>> StringFunctionHelpers.varCharToInt():
>>
>> It seems like there might be an implicit casting
>> (to integer) being inserted into your expression.
>>
>> Can you do EXPLAIN PLAN FOR "your query" (no double quotes needed)
>> and share what you see?
>>
>> On Thu, Mar 10, 2016 at 3:11 PM, Matt <[email protected]> wrote:
>>
>>> TRIM() did not help, and I visually verified there is no whitespace around
>>> the number. The entire test data file:
>>>
>>> ~~~
>>> date_tm,id_1,id_2,id_3,b_total,t_total,h_total,b_small,t_small,h_small,b_18000,t_18000,h_18000,b_12000,t_12000,h_12000,b_6000,t_6000,h_6000,b_3000,t_3000,h_3000,b_2000,t_2000,h_2000,b_1500,t_1500,h_1500,b_1250,t_1250,h_1250,b_1000,t_1000,h_1000,b_750,t_750,h_750,b_500,t_500,h_500,b_0,t_0,h_0
>>> 2015-10-17 00:00:00,f5e9v8u2,err,mi1,268918254,140.755,23519,268918254,140.755,23519,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
>>> ~~~
>>>
>>> Including a trim resulted in the same:
>>>
>>> ~~~
>>> Error: SYSTEM ERROR: NumberFormatException: 140.755
>>>
>>> Fragment 0:0
>>>
>>> [Error Id: 01d368ee-6c06-476c-a553-5eb5baea7c3f on es07:31010]
>>>
>>> (java.lang.NumberFormatException) 140.755
>>>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI():95
>>>     org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharToInt():120
>>>     org.apache.drill.exec.test.generated.ProjectorGen122.doEval():467
>>>     org.apache.drill.exec.test.generated.ProjectorGen122.projectRecords():62
>>>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():175
>>>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
>>>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
>>>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>>>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>>>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>>>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>>>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
>>>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
>>>     java.security.AccessController.doPrivileged():-2
>>>     javax.security.auth.Subject.doAs():415
>>>     org.apache.hadoop.security.UserGroupInformation.doAs():1657
>>>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
>>>     org.apache.drill.common.SelfCleaningRunnable.run():38
>>>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>>>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>>>     java.lang.Thread.run():745 (state=,code=0)
>>> ~~~
>>>
>>> On 10 Mar 2016, at 17:49, Jason Altekruse wrote:
>>>
>>>> Could you try throwing a trim() in before the cast? Might just be
>>>> whitespace.
>>>>
>>>> There shouldn't be a need to add the COALESCE statement, as we only ever
>>>> read defined values (which may be empty string) out of a CSV file. You
>>>> should instead write out a full case statement that checks for empty
>>>> string and provides your default value of 0 in that case.
>>>>
>>>> - Jason
>>>>
>>>> Jason Altekruse
>>>> Software Engineer at Dremio
>>>> Apache Drill Committer
>>>>
>>>> On Thu, Mar 10, 2016 at 2:32 PM, Matt <[email protected]> wrote:
>>>>
>>>>> Have some CSV data that Drill 1.5 selects as-is without any problems,
>>>>> until I attempt to CAST columns in a CTAS or plain SELECT:
>>>>>
>>>>> Error: SYSTEM ERROR: NumberFormatException: 140.755
>>>>>
>>>>> The data is unquoted CSV, and the column in question does have the value
>>>>> "140.755" (unquoted). As the column can be empty, I am using the
>>>>> following transform:
>>>>>
>>>>> CAST(COALESCE(t_total, 0) AS double) AS t_total
>>>>>
>>>>> And on the first data row:
>>>>>
>>>>> Fragment 1:0
>>>>>
>>>>> [Error Id: 4b4d83cf-a87d-451e-9cfa-4280e5adf64f on es08:31010]
>>>>>
>>>>> (java.lang.NumberFormatException) 140.755
>>>>>
>>>>> What could be causing this error?
--
Abdelhakim Deneche
Software Engineer
<http://www.mapr.com/>
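For completeness, Jason's empty-string CASE suggestion from earlier in the thread can be sketched as follows. This is an adaptation, not a query quoted in the thread: using the string '0' (rather than the int 0) keeps both CASE branches VARCHAR, so no implicit integer cast is planned; the `dfs` workspace and path are illustrative.

```sql
-- A CSV reader returns defined (possibly empty) strings, so check for
-- '' explicitly and supply the default, then cast once:
SELECT CAST(CASE WHEN t_total = '' THEN '0' ELSE t_total END AS double) AS t_total
FROM dfs.`caspr/csv/smalltest.csv`;
```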
