Any thoughts on this? I have tried so many variants of this query, but I keep hitting the same error!
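For what it's worth, each source is fine on its own - queries along the lines of the sketch below (table names exactly as configured in my storage plugins) return data with no error, so the failure only shows up when the two sides are joined:

select count(*) from hdfs.drill.TAGS_US;
select count(*) from redshift.reports.public.us_tags;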
Thanks,
Rohit

> On 06-Jan-2016, at 12:26 AM, Rohit Kulkarni <[email protected]> wrote:
>
> Thanks a bunch for replying! I quickly ran this - the TAGS_US data in HDFS is
> in parquet format.
>
> select distinct typeof(cvalue)
> from hdfs.drill.TAGS_US;
>
> +----------+
> |  EXPR$0  |
> +----------+
> | VARCHAR  |
> +----------+
>
> Same with the table in Redshift. I changed my query to specifically cast the
> columns to VARCHAR again -
>
> select count(*)
> from redshift.reports.public.us_tags as a
> join hdfs.drill.TAGS_US as b
> on cast(b.cvalue as varchar) = cast(a.tag_value as varchar);
>
> I see the same error again.
>
> Here is the explain plan for the query -
>
> select count(*)
> from hdfs.drill.TAGS_US as a
> join redshift.reports.public.us_tags as b
> on a.cvalue = b.tag_value;
>
> Error: SYSTEM ERROR: IllegalStateException: Already had POJO for id (java.lang.Integer)
> [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
>
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
> 00-02        UnionExchange
> 01-01          StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 01-02            Project($f0=[0])
> 01-03              HashJoin(condition=[=($0, $1)], joinType=[inner])
> 01-05                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=hdfs://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US]], selectionRoot=hdfs://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US, numFiles=1, usedMetadataFile=false, columns=[`cvalue`]]])
> 01-04                BroadcastExchange
> 02-01                  Project(tag_value=[$2])
> 02-02                    Jdbc(sql=[SELECT * FROM "reports"."public"."us_tags"])
> | {
>   "head" : {
>     "version" : 1,
>     "generator" : {
>       "type" : "ExplainHandler",
>       "info" : ""
>     },
>     "type" : "APACHE_DRILL_PHYSICAL",
>     "options" : [ ],
>     "queue" : 0,
>     "resultMode" : "EXEC"
>   },
>   "graph" : [ {
>     "pop" : "jdbc-scan",
>     "@id" : 0,
>     "sql" : "SELECT *\nFROM \"reports\".\"public\".\"us_tags\"",
>     "config" : {
>       "type" : "jdbc",
>       "driver" : "com.amazon.redshift.jdbc4.Driver",
>       "url" : "",
>       "username" : "",
>       "password" : "",
>       "enabled" : true
>     },
>     "userName" : "",
>     "cost" : 0.0
>   }, {
>     "pop" : "project",
>     "@id" : 131073,
>     "exprs" : [ {
>       "ref" : "`tag_value`",
>       "expr" : "`tag_value`"
>     } ],
>     "child" : 0,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 100.0
>   }, {
>     "pop" : "broadcast-exchange",
>     "@id" : 65540,
>     "child" : 131073,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 100.0
>   }, {
>     "pop" : "parquet-scan",
>     "@id" : 65541,
>     "userName" : "XXXX",
>     "entries" : [ {
>       "path" : "hdfs://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US"
>     } ],
>     "storage" : {
>       "type" : "file",
>       "enabled" : true,
>       "connection" : "hdfs://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/",
>       "workspaces" : {
>         "root" : {
>           "location" : "/",
>           "writable" : true,
>           "defaultInputFormat" : null
>         },
>         "tmp" : {
>           "location" : "/tmp",
>           "writable" : true,
>           "defaultInputFormat" : null
>         },
>         "drill" : {
>           "location" : "/drill",
>           "writable" : true,
>           "defaultInputFormat" : "tsv"
>         },
>         "drill2" : {
>           "location" : "/drill",
>           "writable" : true,
>           "defaultInputFormat" : "csv"
>         }
>       },
>       "formats" : {
>         "psv" : {
>           "type" : "text",
>           "extensions" : [ "tbl" ],
>           "delimiter" : "|"
>         },
>         "csv" : {
>           "type" : "text",
>           "extensions" : [ "csv" ],
>           "delimiter" : ","
>         },
>         "tsv" : {
>           "type" : "text",
>           "extensions" : [ "tsv" ],
>           "delimiter" : "\t"
>         },
>         "parquet" : {
>           "type" : "parquet"
>         },
>         "json" : {
>           "type" : "json"
>         },
>         "avro" : {
>           "type" : "avro"
>         },
>         "sequencefile" : {
>           "type" : "sequencefile",
>           "extensions" : [ "seq" ]
>         },
>         "csvh" : {
>           "type" : "text",
>           "extensions" : [ "csvh" ],
>           "extractHeader" : true,
>           "delimiter" : ","
>         }
>       }
>     },
>     "format" : {
>       "type" : "parquet"
>     },
>     "columns" : [ "`cvalue`" ],
>     "selectionRoot" : "hdfs://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US",
>     "fileSet" : [ "/drill/TAGS_US/0_0_1.parquet", "/drill/TAGS_US/0_0_0.parquet" ],
>     "cost" : 4.1667342E7
>   }, {
>     "pop" : "hash-join",
>     "@id" : 65539,
>     "left" : 65541,
>     "right" : 65540,
>     "conditions" : [ {
>       "relationship" : "EQUALS",
>       "left" : "`cvalue`",
>       "right" : "`tag_value`"
>     } ],
>     "joinType" : "INNER",
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 4.1667342E7
>   }, {
>     "pop" : "project",
>     "@id" : 65538,
>     "exprs" : [ {
>       "ref" : "`$f0`",
>       "expr" : "0"
>     } ],
>     "child" : 65539,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 4.1667342E7
>   }, {
>     "pop" : "streaming-aggregate",
>     "@id" : 65537,
>     "child" : 65538,
>     "keys" : [ ],
>     "exprs" : [ {
>       "ref" : "`EXPR$0`",
>       "expr" : "count(1) "
>     } ],
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 1.0
>   }, {
>     "pop" : "union-exchange",
>     "@id" : 2,
>     "child" : 65537,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 1.0
>   }, {
>     "pop" : "streaming-aggregate",
>     "@id" : 1,
>     "child" : 2,
>     "keys" : [ ],
>     "exprs" : [ {
>       "ref" : "`EXPR$0`",
>       "expr" : "$sum0(`EXPR$0`) "
>     } ],
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 1.0
>   }, {
>     "pop" : "screen",
>     "@id" : 0,
>     "child" : 1,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 1.0
>   } ]
> } |
> +------+------+
>
>> On Mon, Jan 4, 2016 at 9:42 PM, Andries Engelbrecht <[email protected]> wrote:
>>
>> Perhaps check the data type of all the fields being used for the join.
>>
>> Select cvalue, TYPEOF(cvalue) from hdfs...... limit 10
>>
>> and similar for tag_value on Redshift.
>>
>> You can then add a predicate to find records where the data type may be
>> different:
>> where typeof(<field>) not like '<data type of field>'
>>
>> I believe there was a nice write-up on the topic, but I can't find it now.
>>
>> --Andries
>>
>> > On Jan 3, 2016, at 8:45 PM, Rohit Kulkarni <[email protected]> wrote:
>> >
>> > Hello all,
>> >
>> > I am sure that some of you, if not all, must have seen this error at some
>> > point -
>> >
>> > Error: SYSTEM ERROR: IllegalStateException: Already had POJO for id (java.lang.Integer)
>> > [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
>> >
>> > I am trying to do a join between Redshift (JDBC) and HDFS like this -
>> >
>> > select count(*)
>> > from hdfs.drill.TAGS_US as a
>> > right join redshift.reports.public.us_tags as b
>> > on a.cvalue = b.tag_value;
>> >
>> > I don't see anything wrong with the query, and the two individual tables
>> > return proper data when queried separately. Is something missing, or am I
>> > doing something wrong?
>> >
>> > Would very much appreciate your help! Thanks!!
>> >
>> > --
>> > Warm Regards,
>> > Rohit Kulkarni
>
> --
> Warm Regards,
> Rohit Kulkarni
> Mo.: +91 89394 63593
