Re: "Already had POJO for id" Error In SQL - Join B/W Redshift and HDFS

Jacques Nadeau Wed, 06 Jan 2016 08:15:51 -0800

Can you turn on verbose errors and post the full stack trace of the error?
You can enable verbose errors per the instructions here:


https://drill.apache.org/docs/troubleshooting/#enable-verbose-errors

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Wed, Jan 6, 2016 at 6:10 AM, <[email protected]> wrote:

> Any thoughts on this? I tried so many variants of this query but same
> error!
>
> Thanks,
> Rohit
>
> > On 06-Jan-2016, at 12:26 AM, Rohit Kulkarni <[email protected]>
> wrote:
> >
> > Thanks a bunch for replying! I quickly ran this - the TAGS_US data in
> HDFS is in parquet format.
> >
> > select distinct typeof(cvalue)
> >
> > from hdfs.drill.TAGS_US;
> >
> > +----------+
> >
> > |  EXPR$0  |
> >
> > +----------+
> >
> > | VARCHAR  |
> >
> > +----------+
> >
> >
> >
> > Same with the table in Redshift. I changed my query to specifically cast
> the columns to VARCHAR again -
> >
> > select count(*)
> >
> > from redshift.reports.public.us_tags as a
> >
> > join hdfs.drill.TAGS_US as b
> >
> > on cast(b.cvalue as varchar) = cast(a.tag_value as varchar) ;
> >
> >
> > I see the same error again.
> >
> > Here is the explain plan for the query -
> >
> > select count(*)
> > from hdfs.drill.TAGS_US as a
> > join redshift.reports.public.us_tags as b
> > on a.cvalue = b.tag_value;
> >
> > Error: SYSTEM ERROR: IllegalStateException: Already had POJO for id
> (java.lang.Integer)
> [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
> >
> >
> > +------+------+
> > | text | json |
> > +------+------+
> > | 00-00    Screen
> > 00-01      StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
> > 00-02        UnionExchange
> > 01-01          StreamAgg(group=[{}], EXPR$0=[COUNT()])
> > 01-02            Project($f0=[0])
> > 01-03              HashJoin(condition=[=($0, $1)], joinType=[inner])
> > 01-05                Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=hdfs://
> ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US]],
> selectionRoot=hdfs://
> ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US, numFiles=1,
> usedMetadataFile=false, columns=[`cvalue`]]])
> > 01-04                BroadcastExchange
> > 02-01                  Project(tag_value=[$2])
> > 02-02                    Jdbc(sql=[SELECT *
> > FROM "reports"."public"."us_tags"])
> >  | {
> >   "head" : {
> >     "version" : 1,
> >     "generator" : {
> >       "type" : "ExplainHandler",
> >       "info" : ""
> >     },
> >     "type" : "APACHE_DRILL_PHYSICAL",
> >     "options" : [ ],
> >     "queue" : 0,
> >     "resultMode" : "EXEC"
> >   },
> >   "graph" : [ {
> >     "pop" : "jdbc-scan",
> >     "@id" : 0,
> >     "sql" : "SELECT *\nFROM \"reports\".\"public\".\"us_tags\"",
> >     "config" : {
> >       "type" : "jdbc",
> >       "driver" : "com.amazon.redshift.jdbc4.Driver",
> >       "url" : "",
> >       "username" : "",
> >       "password" : "",
> >       "enabled" : true
> >     },
> >     "userName" : "",
> >     "cost" : 0.0
> >   }, {
> >     "pop" : "project",
> >     "@id" : 131073,
> >     "exprs" : [ {
> >       "ref" : "`tag_value`",
> >       "expr" : "`tag_value`"
> >     } ],
> >     "child" : 0,
> >     "initialAllocation" : 1000000,
> >     "maxAllocation" : 10000000000,
> >     "cost" : 100.0
> >   }, {
> >     "pop" : "broadcast-exchange",
> >     "@id" : 65540,
> >     "child" : 131073,
> >     "initialAllocation" : 1000000,
> >     "maxAllocation" : 10000000000,
> >     "cost" : 100.0
> >   }, {
> >     "pop" : "parquet-scan",
> >     "@id" : 65541,
> >     "userName" : "XXXX",
> >     "entries" : [ {
> >       "path" : "hdfs://
> ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US"
> >     } ],
> >     "storage" : {
> >       "type" : "file",
> >       "enabled" : true,
> >       "connection" : "hdfs://
> ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/",
> >       "workspaces" : {
> >         "root" : {
> >           "location" : "/",
> >           "writable" : true,
> >           "defaultInputFormat" : null
> >         },
> >         "tmp" : {
> >           "location" : "/tmp",
> >           "writable" : true,
> >           "defaultInputFormat" : null
> >         },
> >         "drill" : {
> >           "location" : "/drill",
> >           "writable" : true,
> >           "defaultInputFormat" : "tsv"
> >         },
> >         "drill2" : {
> >           "location" : "/drill",
> >           "writable" : true,
> >           "defaultInputFormat" : "csv"
> >         }
> >       },
> >       "formats" : {
> >         "psv" : {
> >           "type" : "text",
> >           "extensions" : [ "tbl" ],
> >           "delimiter" : "|"
> >         },
> >         "csv" : {
> >           "type" : "text",
> >           "extensions" : [ "csv" ],
> >           "delimiter" : ","
> >         },
> >         "tsv" : {
> >           "type" : "text",
> >           "extensions" : [ "tsv" ],
> >           "delimiter" : "\t"
> >         },
> >         "parquet" : {
> >           "type" : "parquet"
> >         },
> >         "json" : {
> >           "type" : "json"
> >         },
> >         "avro" : {
> >           "type" : "avro"
> >         },
> >         "sequencefile" : {
> >           "type" : "sequencefile",
> >           "extensions" : [ "seq" ]
> >         },
> >         "csvh" : {
> >           "type" : "text",
> >           "extensions" : [ "csvh" ],
> >           "extractHeader" : true,
> >           "delimiter" : ","
> >         }
> >       }
> >     },
> >     "format" : {
> >       "type" : "parquet"
> >     },
> >     "columns" : [ "`cvalue`" ],
> >     "selectionRoot" : "hdfs://
> ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US",
> >     "fileSet" : [ "/drill/TAGS_US/0_0_1.parquet",
> "/drill/TAGS_US/0_0_0.parquet" ],
> >     "cost" : 4.1667342E7
> >   }, {
> >     "pop" : "hash-join",
> >     "@id" : 65539,
> >     "left" : 65541,
> >     "right" : 65540,
> >     "conditions" : [ {
> >       "relationship" : "EQUALS",
> >       "left" : "`cvalue`",
> >       "right" : "`tag_value`"
> >     } ],
> >     "joinType" : "INNER",
> >     "initialAllocation" : 1000000,
> >     "maxAllocation" : 10000000000,
> >     "cost" : 4.1667342E7
> >   }, {
> >     "pop" : "project",
> >     "@id" : 65538,
> >     "exprs" : [ {
> >       "ref" : "`$f0`",
> >       "expr" : "0"
> >     } ],
> >     "child" : 65539,
> >     "initialAllocation" : 1000000,
> >     "maxAllocation" : 10000000000,
> >     "cost" : 4.1667342E7
> >   }, {
> >     "pop" : "streaming-aggregate",
> >     "@id" : 65537,
> >     "child" : 65538,
> >     "keys" : [ ],
> >     "exprs" : [ {
> >       "ref" : "`EXPR$0`",
> >       "expr" : "count(1) "
> >     } ],
> >     "initialAllocation" : 1000000,
> >     "maxAllocation" : 10000000000,
> >     "cost" : 1.0
> >   }, {
> >     "pop" : "union-exchange",
> >     "@id" : 2,
> >     "child" : 65537,
> >     "initialAllocation" : 1000000,
> >     "maxAllocation" : 10000000000,
> >     "cost" : 1.0
> >   }, {
> >     "pop" : "streaming-aggregate",
> >     "@id" : 1,
> >     "child" : 2,
> >     "keys" : [ ],
> >     "exprs" : [ {
> >       "ref" : "`EXPR$0`",
> >       "expr" : "$sum0(`EXPR$0`) "
> >     } ],
> >     "initialAllocation" : 1000000,
> >     "maxAllocation" : 10000000000,
> >     "cost" : 1.0
> >   }, {
> >     "pop" : "screen",
> >     "@id" : 0,
> >     "child" : 1,
> >     "initialAllocation" : 1000000,
> >     "maxAllocation" : 10000000000,
> >     "cost" : 1.0
> >   } ]
> > } |
> > +------+------+
> >
> >
> >
> >
> >> On Mon, Jan 4, 2016 at 9:42 PM, Andries Engelbrecht <
> [email protected]> wrote:
> >> Perhaps check the data type of all the fields being used for the join.
> >>
> >> Select cvalue, TYPEOF(cvalue) from hdfs...... limit 10
> >>
> >> and similar for  tag_value on redshift.
> >>
> >> You can then do a predicate to find records where the data type may be
> different.
> >>  where typeof(<field>) not like '<data type of field>'
> >>
> >> I believe there was a nice write up on they topic, but can't find it
> now.
> >>
> >>
> >> --Andries
> >>
> >>
> >>
> >>
> >> > On Jan 3, 2016, at 8:45 PM, Rohit Kulkarni <[email protected]>
> wrote:
> >> >
> >> > Hello all,
> >> >
> >> > I am sure if not all of you, but some of you must have seen this
> error some
> >> > time -
> >> >
> >> > *Error: SYSTEM ERROR: IllegalStateException: Already had POJO for id
> >> > (java.lang.Integer)
> >> > [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]*
> >> >
> >> > I am trying to do a join between Redshift (JDBC) and HDFS like this
> -
> >> >
> >> >
> >> >
> >> >
> >> > *select count(*)from hdfs.drill.TAGS_US as aright join
> >> > redshift.reports.public.us_tags as bon a.cvalue = b.tag_value;*
> >> >
> >> >
> >> > I don't see anything wrong in the query. The two individual tables
> return
> >> > proper data when fired a query separately. Is something missing or am
> I
> >> > doing something wrong?
> >> >
> >> > Would very much appreciate your help! Thanks!!
> >> >
> >> > --
> >> > Warm Regards,
> >> > Rohit Kulkarni
> >
> >
> >
> > --
> > Warm Regards,
> > Rohit Kulkarni
> > Mo.: +91 89394 63593
>

Re: "Already had POJO for id" Error In SQL - Join B/W Redshift and HDFS

Reply via email to