Great! Thanks, Jacques; I expect to try your patch soon.
On Saturday, 16 January 2016 at 2:36 AM, "[email protected]"
<[email protected]> wrote:
Oh this is brilliant. I will take a look at this and give it a try.
Let me take a moment and thank you all for this ambitious project you've
undertaken. Thanks a lot! :)
Regards,
Rohit
Sent from my iPhone
> On 16-Jan-2016, at 4:11 AM, Jacques Nadeau <[email protected]> wrote:
>
> I have a fix and we should merge it shortly into master. You can see the
> progress here:
>
> https://issues.apache.org/jira/browse/DRILL-4277
>
> Note that given the simplicity of the patch, if you are adventurous, you
> could most likely apply this patch on top of the 1.4 version of Drill if
> you didn't want to wait until the next official release.
>
> Thanks for your patience.
>
> Jacques
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
>> On Thu, Jan 14, 2016 at 5:07 PM, Jacques Nadeau <[email protected]> wrote:
>>
>> Good catch. Reproduced now. Looking into it.
>>
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>>
>> On Thu, Jan 14, 2016 at 3:19 PM, Jason Altekruse <[email protected]>
>> wrote:
>>
>>> Jacques, not sure if you caught this, in the stacktrace it mentions
>>> broadcast sender. Did the plan for your test query include a broadcast
>>> join?
>>>
>>> * (com.fasterxml.jackson.databind.JsonMappingException) Already had POJO
>>> for id (java.lang.Integer)
>>> [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
>>> (through reference chain:
>>> org.apache.drill.exec.physical.config.BroadcastSender["destinations"])*
>>>
>>> On Thu, Jan 14, 2016 at 2:02 PM, Jacques Nadeau <[email protected]>
>>> wrote:
>>>
>>>> Hey Rohit,
>>>>
>>>> I'm having trouble reproducing this in my environment (pointing at
>>>> derby + hdfs instead of redshift/postgres). Can you turn on debug
>>>> logging and then run this query? You can enable the debug logging we
>>>> are interested in by adding the following item to logback.xml:
>>>>
>>>> <logger name="org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler"
>>>>         additivity="false">
>>>>   <level value="debug" />
>>>>   <appender-ref ref="FILE" />
>>>> </logger>
>>>>
>>>> This is the query that completes successfully for me. Please confirm it
>>>> is similar to your query:
>>>> SELECT count(*)
>>>> FROM dfs.`/data/tpch01/line/` a
>>>> RIGHT JOIN derby.DRILL_DERBY_TEST.PERSON b
>>>> ON a.cvalue = b.person_id
>>>>
>>>>
>>>>
>>>> --
>>>> Jacques Nadeau
>>>> CTO and Co-Founder, Dremio
>>>>
>>>>> On Wed, Jan 13, 2016 at 7:36 PM, <[email protected]> wrote:
>>>>>
>>>>> Thanks a bunch Jacques!
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>> On 14-Jan-2016, at 12:48 AM, Jacques Nadeau <[email protected]>
>>>> wrote:
>>>>>>
>>>>>> I think it is most likely trivial, but unfortunately I haven't had the
>>>>>> time to look at it yet. It looks like, for some reason, we're having a
>>>>>> failure when serializing the query to pass around between nodes. Let me
>>>>>> try to take a look today.
>>>>>>
>>>>>> --
>>>>>> Jacques Nadeau
>>>>>> CTO and Co-Founder, Dremio
>>>>>>
>>>>>> On Wed, Jan 13, 2016 at 3:17 AM, Rohit Kulkarni <
>>>>> [email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Is this trivial, or big?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rohit
>>>>>>>
>>>>>>> On Thu, Jan 7, 2016 at 11:29 PM, Boris Chmiel <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>> I also have this error, trying to join an MSSQL data source with dfs
>>>>>>>> parquet files. Here is the stack:
>>>>>>>>
>>>>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>>>>>>> IllegalStateException: Already had POJO for id (java.lang.Integer)
>>>>>>>> [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
>>>>>>>> Fragment 2:0 [Error Id: 8431453e-94cb-459a-bc6c-5b5508c7ff84 on
>>>>>>>> PC-PC:31010]
>>>>>>>> (com.fasterxml.jackson.databind.JsonMappingException) Already had POJO
>>>>>>>> for id (java.lang.Integer)
>>>>>>>> [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
>>>>>>>> (through reference chain:
>>>>>>>> org.apache.drill.exec.physical.config.BroadcastSender["destinations"])
>>>>>>>>   com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath():210
>>>>>>>>   com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath():177
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializerBase.wrapAndThrow():1420
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased():351
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault():1056
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject():264
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithObjectId():1028
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther():154
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize():126
>>>>>>>>   com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId():113
>>>>>>>>   com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject():84
>>>>>>>>   com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType():132
>>>>>>>>   com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize():41
>>>>>>>>   com.fasterxml.jackson.databind.ObjectReader._bindAndClose():1269
>>>>>>>>   com.fasterxml.jackson.databind.ObjectReader.readValue():896
>>>>>>>>   org.apache.drill.exec.planner.PhysicalPlanReader.readFragmentOperator():94
>>>>>>>>   org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>>>>>>>>   org.apache.drill.common.SelfCleaningRunnable.run():38
>>>>>>>>   java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>>>>>>>>   java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>>>>>>>>   java.lang.Thread.run():745
>>>>>>>> Caused By (java.lang.IllegalStateException) Already had POJO for id
>>>>>>>> (java.lang.Integer)
>>>>>>>> [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
>>>>>>>>   com.fasterxml.jackson.annotation.SimpleObjectIdResolver.bindItem():20
>>>>>>>>   com.fasterxml.jackson.databind.deser.impl.ReadableObjectId.bindItem():66
>>>>>>>>   com.fasterxml.jackson.databind.deser.impl.PropertyValueBuffer.handleIdValue():117
>>>>>>>>   com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build():169
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased():349
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault():1056
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject():264
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithObjectId():1028
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther():154
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize():126
>>>>>>>>   com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId():113
>>>>>>>>   com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject():84
>>>>>>>>   com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType():132
>>>>>>>>   com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize():41
>>>>>>>>   com.fasterxml.jackson.databind.ObjectReader._bindAndClose():1269
>>>>>>>>   com.fasterxml.jackson.databind.ObjectReader.readValue():896
>>>>>>>>   org.apache.drill.exec.planner.PhysicalPlanReader.readFragmentOperator():94
>>>>>>>>   org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>>>>>>>>   org.apache.drill.common.SelfCleaningRunnable.run():38
>>>>>>>>   java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>>>>>>>>   java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>>>>>>>>   java.lang.Thread.run():745
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thursday, 7 January 2016 at 5:08 PM, Rohit Kulkarni
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Jacques,
>>>>>>>>
>>>>>>>> Here is the full stack trace as you asked -
>>>>>>>>
>>>>>>>> Error: SYSTEM ERROR: IllegalStateException: Already had POJO for id
>>>>>>>> (java.lang.Integer)
>>>>>>>> [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
>>>>>>>>
>>>>>>>> Fragment 2:0
>>>>>>>>
>>>>>>>> [Error Id: 57494209-04e8-4580-860d-461cf50b41f8 on
>>>>>>>> ip-x-x-x-x.ec2.internal:31010]
>>>>>>>>
>>>>>>>> (com.fasterxml.jackson.databind.JsonMappingException) Already had POJO
>>>>>>>> for id (java.lang.Integer)
>>>>>>>> [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
>>>>>>>> (through reference chain:
>>>>>>>> org.apache.drill.exec.physical.config.BroadcastSender["destinations"])
>>>>>>>>   com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath():210
>>>>>>>>   com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath():177
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializerBase.wrapAndThrow():1420
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased():351
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault():1056
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject():264
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithObjectId():1028
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther():154
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize():126
>>>>>>>>   com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId():113
>>>>>>>>   com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject():84
>>>>>>>>   com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType():132
>>>>>>>>   com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize():41
>>>>>>>>   com.fasterxml.jackson.databind.ObjectReader._bindAndClose():1269
>>>>>>>>   com.fasterxml.jackson.databind.ObjectReader.readValue():896
>>>>>>>>   org.apache.drill.exec.planner.PhysicalPlanReader.readFragmentOperator():94
>>>>>>>>   org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>>>>>>>>   org.apache.drill.common.SelfCleaningRunnable.run():38
>>>>>>>>   java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>>>>>>>>   java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>>>>>>>>   java.lang.Thread.run():745
>>>>>>>> Caused By (java.lang.IllegalStateException) Already had POJO for id
>>>>>>>> (java.lang.Integer)
>>>>>>>> [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
>>>>>>>>   com.fasterxml.jackson.annotation.SimpleObjectIdResolver.bindItem():20
>>>>>>>>   com.fasterxml.jackson.databind.deser.impl.ReadableObjectId.bindItem():66
>>>>>>>>   com.fasterxml.jackson.databind.deser.impl.PropertyValueBuffer.handleIdValue():117
>>>>>>>>   com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build():169
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased():349
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault():1056
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject():264
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithObjectId():1028
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther():154
>>>>>>>>   com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize():126
>>>>>>>>   com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId():113
>>>>>>>>   com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject():84
>>>>>>>>   com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType():132
>>>>>>>>   com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize():41
>>>>>>>>   com.fasterxml.jackson.databind.ObjectReader._bindAndClose():1269
>>>>>>>>   com.fasterxml.jackson.databind.ObjectReader.readValue():896
>>>>>>>>   org.apache.drill.exec.planner.PhysicalPlanReader.readFragmentOperator():94
>>>>>>>>   org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>>>>>>>>   org.apache.drill.common.SelfCleaningRunnable.run():38
>>>>>>>>   java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>>>>>>>>   java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>>>>>>>>   java.lang.Thread.run():745 (state=,code=0)
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Wed, Jan 6, 2016 at 9:44 PM, Jacques Nadeau <
>>> [email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Can you turn on verbose errors and post the full stack trace of
>>> the
>>>>>>>> error?
>>>>>>>>> You can enable verbose errors per the instructions here:
>>>> https://drill.apache.org/docs/troubleshooting/#enable-verbose-errors
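For quick reference, the page linked above comes down to a single session option; assuming a sqlline session (or the Drill web console), it can be set like this:

```sql
-- Enable verbose error output (full stack traces) for the current session
ALTER SESSION SET `exec.errors.verbose` = true;
```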
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jacques Nadeau
>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>
>>>>>>>>>> On Wed, Jan 6, 2016 at 6:10 AM, <[email protected]>
>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Any thoughts on this? I tried so many variants of this query but
>>>> same
>>>>>>>>>> error!
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Rohit
>>>>>>>>>>
>>>>>>>>>>> On 06-Jan-2016, at 12:26 AM, Rohit Kulkarni <
>>>>>>>> [email protected]
>>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thanks a bunch for replying! I quickly ran this - the TAGS_US
>>> data
>>>>>>> in
>>>>>>>>>> HDFS is in parquet format.
>>>>>>>>>>>
>>>>>>>>>>> select distinct typeof(cvalue)
>>>>>>>>>>>
>>>>>>>>>>> from hdfs.drill.TAGS_US;
>>>>>>>>>>>
>>>>>>>>>>> +----------+
>>>>>>>>>>>
>>>>>>>>>>> | EXPR$0 |
>>>>>>>>>>>
>>>>>>>>>>> +----------+
>>>>>>>>>>>
>>>>>>>>>>> | VARCHAR |
>>>>>>>>>>>
>>>>>>>>>>> +----------+
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Same with the table in Redshift. I changed my query to
>>>> specifically
>>>>>>>>> cast
>>>>>>>>>> the columns to VARCHAR again -
>>>>>>>>>>>
>>>>>>>>>>> select count(*)
>>>>>>>>>>>
>>>>>>>>>>> from redshift.reports.public.us_tags as a
>>>>>>>>>>>
>>>>>>>>>>> join hdfs.drill.TAGS_US as b
>>>>>>>>>>>
>>>>>>>>>>> on cast(b.cvalue as varchar) = cast(a.tag_value as varchar) ;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I see the same error again.
>>>>>>>>>>>
>>>>>>>>>>> Here is the explain plan for the query -
>>>>>>>>>>>
>>>>>>>>>>> select count(*)
>>>>>>>>>>> from hdfs.drill.TAGS_US as a
>>>>>>>>>>> join redshift.reports.public.us_tags as b
>>>>>>>>>>> on a.cvalue = b.tag_value;
>>>>>>>>>>>
>>>>>>>>>>> Error: SYSTEM ERROR: IllegalStateException: Already had POJO
>>> for
>>>> id
>>>>>>>>>> (java.lang.Integer)
>>> [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8
>>>> ]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> +------+------+
>>>>>>>>>>> | text | json |
>>>>>>>>>>> +------+------+
>>>>>>>>>>> | 00-00 Screen
>>>>>>>>>>> 00-01 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
>>>>>>>>>>> 00-02 UnionExchange
>>>>>>>>>>> 01-01 StreamAgg(group=[{}], EXPR$0=[COUNT()])
>>>>>>>>>>> 01-02 Project($f0=[0])
>>>>>>>>>>> 01-03 HashJoin(condition=[=($0, $1)],
>>>>>>> joinType=[inner])
>>>>>>>>>>> 01-05 Scan(groupscan=[ParquetGroupScan
>>>>>>>>>> [entries=[ReadEntryWithPath [path=hdfs://
>>>>>>>>>> ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US]],
>>>>>>>>>> selectionRoot=hdfs://
>>>>>>>>>> ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US,
>>>>>>>> numFiles=1,
>>>>>>>>>> usedMetadataFile=false, columns=[`cvalue`]]])
>>>>>>>>>>> 01-04 BroadcastExchange
>>>>>>>>>>> 02-01 Project(tag_value=[$2])
>>>>>>>>>>> 02-02 Jdbc(sql=[SELECT *
>>>>>>>>>>> FROM "reports"."public"."us_tags"])
>>>>>>>>>>> | {
>>>>>>>>>>> "head" : {
>>>>>>>>>>> "version" : 1,
>>>>>>>>>>> "generator" : {
>>>>>>>>>>> "type" : "ExplainHandler",
>>>>>>>>>>> "info" : ""
>>>>>>>>>>> },
>>>>>>>>>>> "type" : "APACHE_DRILL_PHYSICAL",
>>>>>>>>>>> "options" : [ ],
>>>>>>>>>>> "queue" : 0,
>>>>>>>>>>> "resultMode" : "EXEC"
>>>>>>>>>>> },
>>>>>>>>>>> "graph" : [ {
>>>>>>>>>>> "pop" : "jdbc-scan",
>>>>>>>>>>> "@id" : 0,
>>>>>>>>>>> "sql" : "SELECT *\nFROM \"reports\".\"public\".\"us_tags\"",
>>>>>>>>>>> "config" : {
>>>>>>>>>>> "type" : "jdbc",
>>>>>>>>>>> "driver" : "com.amazon.redshift.jdbc4.Driver",
>>>>>>>>>>> "url" : "",
>>>>>>>>>>> "username" : "",
>>>>>>>>>>> "password" : "",
>>>>>>>>>>> "enabled" : true
>>>>>>>>>>> },
>>>>>>>>>>> "userName" : "",
>>>>>>>>>>> "cost" : 0.0
>>>>>>>>>>> }, {
>>>>>>>>>>> "pop" : "project",
>>>>>>>>>>> "@id" : 131073,
>>>>>>>>>>> "exprs" : [ {
>>>>>>>>>>> "ref" : "`tag_value`",
>>>>>>>>>>> "expr" : "`tag_value`"
>>>>>>>>>>> } ],
>>>>>>>>>>> "child" : 0,
>>>>>>>>>>> "initialAllocation" : 1000000,
>>>>>>>>>>> "maxAllocation" : 10000000000,
>>>>>>>>>>> "cost" : 100.0
>>>>>>>>>>> }, {
>>>>>>>>>>> "pop" : "broadcast-exchange",
>>>>>>>>>>> "@id" : 65540,
>>>>>>>>>>> "child" : 131073,
>>>>>>>>>>> "initialAllocation" : 1000000,
>>>>>>>>>>> "maxAllocation" : 10000000000,
>>>>>>>>>>> "cost" : 100.0
>>>>>>>>>>> }, {
>>>>>>>>>>> "pop" : "parquet-scan",
>>>>>>>>>>> "@id" : 65541,
>>>>>>>>>>> "userName" : "XXXX",
>>>>>>>>>>> "entries" : [ {
>>>>>>>>>>> "path" : "hdfs://
>>>>>>>>>> ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US"
>>>>>>>>>>> } ],
>>>>>>>>>>> "storage" : {
>>>>>>>>>>> "type" : "file",
>>>>>>>>>>> "enabled" : true,
>>>>>>>>>>> "connection" : "hdfs://
>>>>>>>>>> ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/",
>>>>>>>>>>> "workspaces" : {
>>>>>>>>>>> "root" : {
>>>>>>>>>>> "location" : "/",
>>>>>>>>>>> "writable" : true,
>>>>>>>>>>> "defaultInputFormat" : null
>>>>>>>>>>> },
>>>>>>>>>>> "tmp" : {
>>>>>>>>>>> "location" : "/tmp",
>>>>>>>>>>> "writable" : true,
>>>>>>>>>>> "defaultInputFormat" : null
>>>>>>>>>>> },
>>>>>>>>>>> "drill" : {
>>>>>>>>>>> "location" : "/drill",
>>>>>>>>>>> "writable" : true,
>>>>>>>>>>> "defaultInputFormat" : "tsv"
>>>>>>>>>>> },
>>>>>>>>>>> "drill2" : {
>>>>>>>>>>> "location" : "/drill",
>>>>>>>>>>> "writable" : true,
>>>>>>>>>>> "defaultInputFormat" : "csv"
>>>>>>>>>>> }
>>>>>>>>>>> },
>>>>>>>>>>> "formats" : {
>>>>>>>>>>> "psv" : {
>>>>>>>>>>> "type" : "text",
>>>>>>>>>>> "extensions" : [ "tbl" ],
>>>>>>>>>>> "delimiter" : "|"
>>>>>>>>>>> },
>>>>>>>>>>> "csv" : {
>>>>>>>>>>> "type" : "text",
>>>>>>>>>>> "extensions" : [ "csv" ],
>>>>>>>>>>> "delimiter" : ","
>>>>>>>>>>> },
>>>>>>>>>>> "tsv" : {
>>>>>>>>>>> "type" : "text",
>>>>>>>>>>> "extensions" : [ "tsv" ],
>>>>>>>>>>> "delimiter" : "\t"
>>>>>>>>>>> },
>>>>>>>>>>> "parquet" : {
>>>>>>>>>>> "type" : "parquet"
>>>>>>>>>>> },
>>>>>>>>>>> "json" : {
>>>>>>>>>>> "type" : "json"
>>>>>>>>>>> },
>>>>>>>>>>> "avro" : {
>>>>>>>>>>> "type" : "avro"
>>>>>>>>>>> },
>>>>>>>>>>> "sequencefile" : {
>>>>>>>>>>> "type" : "sequencefile",
>>>>>>>>>>> "extensions" : [ "seq" ]
>>>>>>>>>>> },
>>>>>>>>>>> "csvh" : {
>>>>>>>>>>> "type" : "text",
>>>>>>>>>>> "extensions" : [ "csvh" ],
>>>>>>>>>>> "extractHeader" : true,
>>>>>>>>>>> "delimiter" : ","
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>> },
>>>>>>>>>>> "format" : {
>>>>>>>>>>> "type" : "parquet"
>>>>>>>>>>> },
>>>>>>>>>>> "columns" : [ "`cvalue`" ],
>>>>>>>>>>> "selectionRoot" : "hdfs://
>>>>>>>>>> ec2-XX-XX-XX-XX.compute-1.amazonaws.com:8020/drill/TAGS_US",
>>>>>>>>>>> "fileSet" : [ "/drill/TAGS_US/0_0_1.parquet",
>>>>>>>>>> "/drill/TAGS_US/0_0_0.parquet" ],
>>>>>>>>>>> "cost" : 4.1667342E7
>>>>>>>>>>> }, {
>>>>>>>>>>> "pop" : "hash-join",
>>>>>>>>>>> "@id" : 65539,
>>>>>>>>>>> "left" : 65541,
>>>>>>>>>>> "right" : 65540,
>>>>>>>>>>> "conditions" : [ {
>>>>>>>>>>> "relationship" : "EQUALS",
>>>>>>>>>>> "left" : "`cvalue`",
>>>>>>>>>>> "right" : "`tag_value`"
>>>>>>>>>>> } ],
>>>>>>>>>>> "joinType" : "INNER",
>>>>>>>>>>> "initialAllocation" : 1000000,
>>>>>>>>>>> "maxAllocation" : 10000000000,
>>>>>>>>>>> "cost" : 4.1667342E7
>>>>>>>>>>> }, {
>>>>>>>>>>> "pop" : "project",
>>>>>>>>>>> "@id" : 65538,
>>>>>>>>>>> "exprs" : [ {
>>>>>>>>>>> "ref" : "`$f0`",
>>>>>>>>>>> "expr" : "0"
>>>>>>>>>>> } ],
>>>>>>>>>>> "child" : 65539,
>>>>>>>>>>> "initialAllocation" : 1000000,
>>>>>>>>>>> "maxAllocation" : 10000000000,
>>>>>>>>>>> "cost" : 4.1667342E7
>>>>>>>>>>> }, {
>>>>>>>>>>> "pop" : "streaming-aggregate",
>>>>>>>>>>> "@id" : 65537,
>>>>>>>>>>> "child" : 65538,
>>>>>>>>>>> "keys" : [ ],
>>>>>>>>>>> "exprs" : [ {
>>>>>>>>>>> "ref" : "`EXPR$0`",
>>>>>>>>>>> "expr" : "count(1) "
>>>>>>>>>>> } ],
>>>>>>>>>>> "initialAllocation" : 1000000,
>>>>>>>>>>> "maxAllocation" : 10000000000,
>>>>>>>>>>> "cost" : 1.0
>>>>>>>>>>> }, {
>>>>>>>>>>> "pop" : "union-exchange",
>>>>>>>>>>> "@id" : 2,
>>>>>>>>>>> "child" : 65537,
>>>>>>>>>>> "initialAllocation" : 1000000,
>>>>>>>>>>> "maxAllocation" : 10000000000,
>>>>>>>>>>> "cost" : 1.0
>>>>>>>>>>> }, {
>>>>>>>>>>> "pop" : "streaming-aggregate",
>>>>>>>>>>> "@id" : 1,
>>>>>>>>>>> "child" : 2,
>>>>>>>>>>> "keys" : [ ],
>>>>>>>>>>> "exprs" : [ {
>>>>>>>>>>> "ref" : "`EXPR$0`",
>>>>>>>>>>> "expr" : "$sum0(`EXPR$0`) "
>>>>>>>>>>> } ],
>>>>>>>>>>> "initialAllocation" : 1000000,
>>>>>>>>>>> "maxAllocation" : 10000000000,
>>>>>>>>>>> "cost" : 1.0
>>>>>>>>>>> }, {
>>>>>>>>>>> "pop" : "screen",
>>>>>>>>>>> "@id" : 0,
>>>>>>>>>>> "child" : 1,
>>>>>>>>>>> "initialAllocation" : 1000000,
>>>>>>>>>>> "maxAllocation" : 10000000000,
>>>>>>>>>>> "cost" : 1.0
>>>>>>>>>>> } ]
>>>>>>>>>>> } |
>>>>>>>>>>> +------+------+
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 4, 2016 at 9:42 PM, Andries Engelbrecht <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>> Perhaps check the data type of all the fields being used for
>>> the
>>>>>>>> join.
>>>>>>>>>>>>
>>>>>>>>>>>> Select cvalue, TYPEOF(cvalue) from hdfs...... limit 10
>>>>>>>>>>>>
>>>>>>>>>>>> and similar for tag_value on redshift.
>>>>>>>>>>>>
>>>>>>>>>>>> You can then use a predicate to find records where the data type
>>>>>>>>>>>> may be different:
>>>>>>>>>>>> where typeof(<field>) not like '<data type of field>'
>>>>>>>>>>>>
>>>>>>>>>>>> I believe there was a nice write-up on this topic, but I can't
>>>>>>>>>>>> find it now.
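As a concrete sketch of Andries's suggestion, using the table and column names from Rohit's earlier queries (TAGS_US/cvalue, with VARCHAR as the expected type):

```sql
-- Inspect the runtime type of the join key on the HDFS side
SELECT cvalue, typeof(cvalue)
FROM hdfs.drill.TAGS_US
LIMIT 10;

-- Then look for rows whose runtime type differs from the expected one
SELECT cvalue
FROM hdfs.drill.TAGS_US
WHERE typeof(cvalue) NOT LIKE 'VARCHAR';
```

The same check could be run against redshift.reports.public.us_tags and tag_value on the Redshift side.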
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --Andries
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On Jan 3, 2016, at 8:45 PM, Rohit Kulkarni <
>>>>>>>>> [email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am sure some of you, if not all, must have seen this error at
>>>>>>>>>>>>> some time -
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Error: SYSTEM ERROR: IllegalStateException: Already had POJO
>>>>>>> for
>>>>>>>> id
>>>>>>>>>>>>> (java.lang.Integer)
>>>>>>> [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8
>>>>>>>>> ]*
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am trying to do a join between Redshift (JDBC) and HDFS like
>>>>>>>>>>>>> this -
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> select count(*)
>>>>>>>>>>>>> from hdfs.drill.TAGS_US as a
>>>>>>>>>>>>> right join redshift.reports.public.us_tags as b
>>>>>>>>>>>>> on a.cvalue = b.tag_value;
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't see anything wrong in the query. The two individual
>>>>>>>>>>>>> tables return proper data when queried separately. Is something
>>>>>>>>>>>>> missing, or am I doing something wrong?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Would very much appreciate your help! Thanks!!
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Warm Regards,
>>>>>>>>>>>>> Rohit Kulkarni
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Warm Regards,
>>>>>>>>>>> Rohit Kulkarni
>>>>>>>>>>> Mo.: +91 89394 63593
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Warm Regards,
>>>>>>>> Rohit Kulkarni
>>>>>>>> Mo.: +91 89394 63593
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Warm Regards,
>>>>>>> Rohit Kulkarni
>>>>>>> Mo.: +91 89394 63593
>>
>>