It seems likely that there is some sort of bug related to the reuse of
array objects that are returned by UDFs.  Can you open a JIRA?

I'll also note that the sql method on HiveContext does run HiveQL
(configurable via spark.sql.dialect), and that the hql method has been
deprecated since 1.1 (it will probably be removed in 1.3).  The errors are
probably because array and collect_set are Hive UDFs and thus not available
in a SQLContext.
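For what it's worth, the object-reuse pitfall suspected above can be sketched in plain Python, independent of Spark: if a function returns the same mutable list on every call and the caller stores the results without copying, every stored row ends up aliasing the last value.  This is only an illustration of the suspected mechanism (the names are made up for the example), not Spark code.

```python
# Sketch of the suspected object-reuse bug: a "UDF" that reuses one
# mutable buffer across calls, versus one that allocates a fresh list.

def make_reusing_udf():
    buf = []  # single buffer reused for every row

    def udf(x):
        buf.clear()
        buf.append(x)
        return buf  # caller receives the SAME list object each time

    return udf

def fresh_udf(x):
    return [x]  # new list per row; safe to store

reusing = make_reusing_udf()
bad_rows = [reusing(x) for x in ("a", "b", "c")]
good_rows = [fresh_udf(x) for x in ("a", "b", "c")]

print(bad_rows)   # all three rows alias the last value: [['c'], ['c'], ['c']]
print(good_rows)  # [['a'], ['b'], ['c']]
```

If the arrays produced by Hive UDFs are reused across rows in a similar way, any downstream operation that holds on to them (such as a self join keyed on that column) would see corrupted values, which would be consistent with the symptoms in the gist.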

On Mon, Jan 26, 2015 at 5:44 AM, Dean Wampler <deanwamp...@gmail.com> wrote:

> You are creating a HiveContext, then using the sql method instead of hql.
> Is that deliberate?
>
> The code doesn't work if you replace HiveContext with SQLContext. Lots of
> exceptions are thrown, but I don't have time to investigate now.
>
> dean
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Mon, Jan 26, 2015 at 7:17 AM, Pierre B <
> pierre.borckm...@realimpactanalytics.com> wrote:
>
>> Using Spark 1.2.0, we are facing some weird behaviour when performing a
>> self join on a table with an ArrayType field (potential bug?).
>>
>> I have set up a minimal non-working example here:
>> https://gist.github.com/pierre-borckmans/4853cd6d0b2f2388bf4f
>> In a nutshell, if the ArrayType column used for the pivot is created
>> manually in the StructType definition, everything works as expected.
>> However, if the ArrayType pivot column is obtained from a SQL query (be
>> it using an "array" wrapper or a collect_list operator, for instance),
>> then the results are completely off.
>>
>> Could anyone have a look, as this really is a blocking issue?
>>
>> Thanks!
>>
>> Cheers
>>
>> P.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/SQL-Self-join-with-ArrayType-columns-problems-tp21364.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
