Oh, this is brilliant. I will take a look at this and give it a try. Let me also take a moment to thank you all for this ambitious project you've undertaken. Thanks a lot! :)
Regards,
Rohit

On 16-Jan-2016, at 4:11 AM, Jacques Nadeau <[email protected]> wrote:

I have a fix, and we should merge it into master shortly. You can follow the progress here:

https://issues.apache.org/jira/browse/DRILL-4277

Note that, given the simplicity of the patch, if you are adventurous you could most likely apply it on top of the 1.4 release of Drill rather than waiting for the next official release.

Thanks for your patience.

Jacques

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Jan 14, 2016 at 5:07 PM, Jacques Nadeau <[email protected]> wrote:

Good catch. Reproduced now. Looking into it.

On Thu, Jan 14, 2016 at 3:19 PM, Jason Altekruse <[email protected]> wrote:

Jacques, not sure if you caught this, but the stack trace mentions the broadcast sender. Did the plan for your test query include a broadcast join?

    (com.fasterxml.jackson.databind.JsonMappingException) Already had POJO for id (java.lang.Integer) [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8] (through reference chain: org.apache.drill.exec.physical.config.BroadcastSender["destinations"])

On Thu, Jan 14, 2016 at 2:02 PM, Jacques Nadeau <[email protected]> wrote:

Hey Rohit,

I'm having trouble reproducing this in my environment (pointing at Derby + HDFS instead of Redshift/Postgres). Can you turn on debug logging and then run this query? You can enable the debug logging we are interested in by adding the following item to logback.xml:

    <logger name="org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler"
            additivity="false">
      <level value="debug" />
      <appender-ref ref="FILE" />
    </logger>

This is the query that completes successfully for me. Please confirm it is similar to your query:

    SELECT count(*)
    FROM dfs.`/data/tpch01/line/` a
    RIGHT JOIN derby.DRILL_DERBY_TEST.PERSON b
    ON a.cvalue = b.person_id

On Wed, Jan 13, 2016 at 7:36 PM, <[email protected]> wrote:

Thanks a bunch, Jacques!

On 14-Jan-2016, at 12:48 AM, Jacques Nadeau <[email protected]> wrote:

I think it is most likely trivial, but unfortunately I haven't had the time to look at it yet. It looks like, for some reason, we're having a failure when serializing the query to pass around between nodes. Let me try to take a look today.
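For context on the error itself: "Already had POJO for id" is Jackson's object-id collision, raised from SimpleObjectIdResolver.bindItem (visible in the stack traces below) when two distinct objects inside a single JSON document carry the same object id. One way to hit it, and a plausible match for the serialization failure Jacques describes, is a per-serialization id generator: pieces serialized independently each start numbering at 1, so stitching them into one plan document produces duplicate ids on the read side. A minimal, self-contained sketch against jackson-databind 2.x; the Op and Plan classes are hypothetical stand-ins, not Drill's actual plan model:

    import com.fasterxml.jackson.annotation.JsonIdentityInfo;
    import com.fasterxml.jackson.annotation.ObjectIdGenerators;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class ObjectIdCollision {

        // Hypothetical operator using an int-sequence object id; every
        // fresh serialization starts numbering at 1.
        @JsonIdentityInfo(generator = ObjectIdGenerators.IntSequenceGenerator.class,
                          property = "@id")
        public static class Op {
            public String name = "op";
        }

        // Hypothetical container standing in for a plan fragment.
        public static class Plan {
            public Op left;
            public Op right;
        }

        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();

            // Serialized independently, both operators are written with "@id" : 1.
            String left = mapper.writeValueAsString(new Op());
            String right = mapper.writeValueAsString(new Op());

            // Stitched into one document, the duplicate id collides on read.
            String plan = "{\"left\":" + left + ",\"right\":" + right + "}";
            mapper.readValue(plan, Plan.class);
        }
    }

Running main fails with the same "Already had POJO for id (java.lang.Integer)" message seen in the traces below, through reference chain Plan["right"] rather than BroadcastSender["destinations"].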
On Wed, Jan 13, 2016 at 3:17 AM, Rohit Kulkarni <[email protected]> wrote:

Is this trivial, or big?

Thanks,
Rohit

On Thu, Jan 7, 2016 at 11:29 PM, Boris Chmiel <[email protected]> wrote:

Hi everyone, I also get this error when trying to join an MSSQL data source with dfs parquet files. Here is the stack:

    org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: Already had POJO for id (java.lang.Integer) [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]

    Fragment 2:0

    [Error Id: 8431453e-94cb-459a-bc6c-5b5508c7ff84 on PC-PC:31010]

    (com.fasterxml.jackson.databind.JsonMappingException) Already had POJO for id (java.lang.Integer) [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8] (through reference chain: org.apache.drill.exec.physical.config.BroadcastSender["destinations"])
      com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath():210
      com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath():177
      com.fasterxml.jackson.databind.deser.BeanDeserializerBase.wrapAndThrow():1420
      com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased():351
      com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault():1056
      com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject():264
      com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithObjectId():1028
      com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther():154
      com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize():126
      com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId():113
      com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject():84
      com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType():132
      com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize():41
      com.fasterxml.jackson.databind.ObjectReader._bindAndClose():1269
      com.fasterxml.jackson.databind.ObjectReader.readValue():896
      org.apache.drill.exec.planner.PhysicalPlanReader.readFragmentOperator():94
      org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
      org.apache.drill.common.SelfCleaningRunnable.run():38
      java.util.concurrent.ThreadPoolExecutor.runWorker():1145
      java.util.concurrent.ThreadPoolExecutor$Worker.run():615
      java.lang.Thread.run():745

    Caused By (java.lang.IllegalStateException) Already had POJO for id (java.lang.Integer) [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
      com.fasterxml.jackson.annotation.SimpleObjectIdResolver.bindItem():20
      com.fasterxml.jackson.databind.deser.impl.ReadableObjectId.bindItem():66
      com.fasterxml.jackson.databind.deser.impl.PropertyValueBuffer.handleIdValue():117
      com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build():169
      com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased():349
      com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault():1056
      com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject():264
      com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithObjectId():1028
      com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther():154
      com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize():126
      com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId():113
      com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject():84
      com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType():132
      com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize():41
      com.fasterxml.jackson.databind.ObjectReader._bindAndClose():1269
      com.fasterxml.jackson.databind.ObjectReader.readValue():896
      org.apache.drill.exec.planner.PhysicalPlanReader.readFragmentOperator():94
      org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
      org.apache.drill.common.SelfCleaningRunnable.run():38
      java.util.concurrent.ThreadPoolExecutor.runWorker():1145
      java.util.concurrent.ThreadPoolExecutor$Worker.run():615
      java.lang.Thread.run():745

On Thursday, January 7, 2016 at 5:08 PM, Rohit Kulkarni <[email protected]> wrote:

Hi Jacques,

Here is the full stack trace, as you asked:

    Error: SYSTEM ERROR: IllegalStateException: Already had POJO for id (java.lang.Integer) [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]

    Fragment 2:0

    [Error Id: 57494209-04e8-4580-860d-461cf50b41f8 on ip-x-x-x-x.ec2.internal:31010]

    (com.fasterxml.jackson.databind.JsonMappingException) Already had POJO for id (java.lang.Integer) [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8] (through reference chain: org.apache.drill.exec.physical.config.BroadcastSender["destinations"])
      com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath():210
      com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath():177
      com.fasterxml.jackson.databind.deser.BeanDeserializerBase.wrapAndThrow():1420
      com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased():351
      com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault():1056
      com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject():264
      com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithObjectId():1028
      com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther():154
      com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize():126
      com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId():113
      com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject():84
      com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType():132
      com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize():41
      com.fasterxml.jackson.databind.ObjectReader._bindAndClose():1269
      com.fasterxml.jackson.databind.ObjectReader.readValue():896
      org.apache.drill.exec.planner.PhysicalPlanReader.readFragmentOperator():94
      org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
      org.apache.drill.common.SelfCleaningRunnable.run():38
      java.util.concurrent.ThreadPoolExecutor.runWorker():1145
      java.util.concurrent.ThreadPoolExecutor$Worker.run():615
      java.lang.Thread.run():745

    Caused By (java.lang.IllegalStateException) Already had POJO for id (java.lang.Integer) [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
      com.fasterxml.jackson.annotation.SimpleObjectIdResolver.bindItem():20
      com.fasterxml.jackson.databind.deser.impl.ReadableObjectId.bindItem():66
      com.fasterxml.jackson.databind.deser.impl.PropertyValueBuffer.handleIdValue():117
      com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build():169
      com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased():349
      com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault():1056
      com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject():264
      com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithObjectId():1028
      com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther():154
      com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize():126
      com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId():113
      com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject():84
      com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType():132
      com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize():41
      com.fasterxml.jackson.databind.ObjectReader._bindAndClose():1269
      com.fasterxml.jackson.databind.ObjectReader.readValue():896
      org.apache.drill.exec.planner.PhysicalPlanReader.readFragmentOperator():94
      org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
      org.apache.drill.common.SelfCleaningRunnable.run():38
      java.util.concurrent.ThreadPoolExecutor.runWorker():1145
      java.util.concurrent.ThreadPoolExecutor$Worker.run():615
      java.lang.Thread.run():745 (state=,code=0)

On Wed, Jan 6, 2016 at 9:44 PM, Jacques Nadeau <[email protected]> wrote:

Can you turn on verbose errors and post the full stack trace of the error?
You can enable verbose errors per the instructions here:
https://drill.apache.org/docs/troubleshooting/#enable-verbose-errors

On Wed, Jan 6, 2016 at 6:10 AM, <[email protected]> wrote:

Any thoughts on this? I have tried so many variants of this query, but I get the same error!

Thanks,
Rohit

On 06-Jan-2016, at 12:26 AM, Rohit Kulkarni <[email protected]> wrote:

Thanks a bunch for replying! I quickly ran this (the TAGS_US data in HDFS is in parquet format):

    select distinct typeof(cvalue)
    from hdfs.drill.TAGS_US;

    +----------+
    |  EXPR$0  |
    +----------+
    | VARCHAR  |
    +----------+

Same with the table in Redshift. I changed my query to explicitly cast the columns to VARCHAR again:

    select count(*)
    from redshift.reports.public.us_tags as a
    join hdfs.drill.TAGS_US as b
    on cast(b.cvalue as varchar) = cast(a.tag_value as varchar);

I see the same error again.

Here is the explain plan for the query:

    select count(*)
    from hdfs.drill.TAGS_US as a
    join redshift.reports.public.us_tags as b
    on a.cvalue = b.tag_value;

    Error: SYSTEM ERROR: IllegalStateException: Already had POJO for id (java.lang.Integer) [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]

    +------+------+
    | text | json |
    +------+------+
    | 00-00 Screen
    00-01   StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
    00-02     UnionExchange
    01-01       StreamAgg(group=[{}], EXPR$0=[COUNT()])
    01-02         Project($f0=[0])
    01-03           HashJoin(condition=[=($0, $1)], joinType=[inner])
    01-05             Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=hdfs://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US]], selectionRoot=hdfs://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US, numFiles=1, usedMetadataFile=false, columns=[`cvalue`]]])
    01-04             BroadcastExchange
    02-01               Project(tag_value=[$2])
    02-02                 Jdbc(sql=[SELECT * FROM "reports"."public"."us_tags"])
    | {
      "head" : {
        "version" : 1,
        "generator" : { "type" : "ExplainHandler", "info" : "" },
        "type" : "APACHE_DRILL_PHYSICAL",
        "options" : [ ],
        "queue" : 0,
        "resultMode" : "EXEC"
      },
      "graph" : [ {
        "pop" : "jdbc-scan",
        "@id" : 0,
        "sql" : "SELECT *\nFROM \"reports\".\"public\".\"us_tags\"",
        "config" : {
          "type" : "jdbc",
          "driver" : "com.amazon.redshift.jdbc4.Driver",
          "url" : "",
          "username" : "",
          "password" : "",
          "enabled" : true
        },
        "userName" : "",
        "cost" : 0.0
      }, {
        "pop" : "project",
        "@id" : 131073,
        "exprs" : [ { "ref" : "`tag_value`", "expr" : "`tag_value`" } ],
        "child" : 0,
        "initialAllocation" : 1000000,
        "maxAllocation" : 10000000000,
        "cost" : 100.0
      }, {
        "pop" : "broadcast-exchange",
        "@id" : 65540,
        "child" : 131073,
        "initialAllocation" : 1000000,
        "maxAllocation" : 10000000000,
        "cost" : 100.0
      }, {
        "pop" : "parquet-scan",
        "@id" : 65541,
        "userName" : "XXXX",
        "entries" : [ { "path" : "hdfs://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US" } ],
        "storage" : {
          "type" : "file",
          "enabled" : true,
          "connection" : "hdfs://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/",
          "workspaces" : {
            "root" : { "location" : "/", "writable" : true, "defaultInputFormat" : null },
            "tmp" : { "location" : "/tmp", "writable" : true, "defaultInputFormat" : null },
            "drill" : { "location" : "/drill", "writable" : true, "defaultInputFormat" : "tsv" },
            "drill2" : { "location" : "/drill", "writable" : true, "defaultInputFormat" : "csv" }
          },
          "formats" : {
            "psv" : { "type" : "text", "extensions" : [ "tbl" ], "delimiter" : "|" },
            "csv" : { "type" : "text", "extensions" : [ "csv" ], "delimiter" : "," },
            "tsv" : { "type" : "text", "extensions" : [ "tsv" ], "delimiter" : "\t" },
            "parquet" : { "type" : "parquet" },
            "json" : { "type" : "json" },
            "avro" : { "type" : "avro" },
            "sequencefile" : { "type" : "sequencefile", "extensions" : [ "seq" ] },
            "csvh" : { "type" : "text", "extensions" : [ "csvh" ], "extractHeader" : true, "delimiter" : "," }
          }
        },
        "format" : { "type" : "parquet" },
        "columns" : [ "`cvalue`" ],
        "selectionRoot" : "hdfs://ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US",
        "fileSet" : [ "/drill/TAGS_US/0_0_1.parquet", "/drill/TAGS_US/0_0_0.parquet" ],
        "cost" : 4.1667342E7
      }, {
        "pop" : "hash-join",
        "@id" : 65539,
        "left" : 65541,
        "right" : 65540,
        "conditions" : [ { "relationship" : "EQUALS", "left" : "`cvalue`", "right" : "`tag_value`" } ],
        "joinType" : "INNER",
        "initialAllocation" : 1000000,
        "maxAllocation" : 10000000000,
        "cost" : 4.1667342E7
      }, {
        "pop" : "project",
        "@id" : 65538,
        "exprs" : [ { "ref" : "`$f0`", "expr" : "0" } ],
        "child" : 65539,
        "initialAllocation" : 1000000,
        "maxAllocation" : 10000000000,
        "cost" : 4.1667342E7
      }, {
        "pop" : "streaming-aggregate",
        "@id" : 65537,
        "child" : 65538,
        "keys" : [ ],
        "exprs" : [ { "ref" : "`EXPR$0`", "expr" : "count(1) " } ],
        "initialAllocation" : 1000000,
        "maxAllocation" : 10000000000,
        "cost" : 1.0
      }, {
        "pop" : "union-exchange",
        "@id" : 2,
        "child" : 65537,
        "initialAllocation" : 1000000,
        "maxAllocation" : 10000000000,
        "cost" : 1.0
      }, {
        "pop" : "streaming-aggregate",
        "@id" : 1,
        "child" : 2,
        "keys" : [ ],
        "exprs" : [ { "ref" : "`EXPR$0`", "expr" : "$sum0(`EXPR$0`) " } ],
        "initialAllocation" : 1000000,
        "maxAllocation" : 10000000000,
        "cost" : 1.0
      }, {
        "pop" : "screen",
        "@id" : 0,
        "child" : 1,
        "initialAllocation" : 1000000,
        "maxAllocation" : 10000000000,
        "cost" : 1.0
      } ]
    } |
    +------+------+

On Mon, Jan 4, 2016 at 9:42 PM, Andries Engelbrecht <[email protected]> wrote:

Perhaps check the data type of all the fields being used for the join:

    Select cvalue, TYPEOF(cvalue) from hdfs...... limit 10

and similarly for tag_value on Redshift. You can then add a predicate to find records where the data type may differ:

    where typeof(<field>) not like '<data type of field>'

I believe there was a nice write-up on the topic, but I can't find it now.

--Andries

On Jan 3, 2016, at 8:45 PM, Rohit Kulkarni <[email protected]> wrote:

Hello all,

I am sure that some of you, if not all, have seen this error at some point:

    Error: SYSTEM ERROR: IllegalStateException: Already had POJO for id (java.lang.Integer) [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]

I am trying to do a join between Redshift (JDBC) and HDFS, like this:

    select count(*)
    from hdfs.drill.TAGS_US as a
    right join redshift.reports.public.us_tags as b
    on a.cvalue = b.tag_value;

I don't see anything wrong with the query, and both tables return proper data when queried individually. Is something missing, or am I doing something wrong?
Would very much appreciate your help! Thanks!!

--
Warm Regards,
Rohit Kulkarni
Mo.: +91 89394 63593
