Re: [spark-sql] What is the right way to represent an “Any” type in Spark SQL?

2015-03-29 Thread Eran Medan
Thanks Michael!
Can you please point me to the docs / source location for that automatic
casting? I'm just using it to extract the data and put it in a Map[String,
Any] (long story on the reason...), so I think the casting rules won't
"know" what to cast it to, right? I guess I could have the JSON / Parquet
data store it as a string along with metadata on the "real" type, but that
feels a little wrong. Is that the only way to handle it, or is there a way
to support an "Any" type after all? Is it just not implemented, or is it a
Hive limitation? (I've never used Hive other than here, so sorry for the
silly question.)
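For what it's worth, the "store as string plus type metadata" idea could look
roughly like this in plain Scala (a minimal sketch; the encode/decode helpers
are made up for illustration, not anything Spark provides):

```scala
// Encode each Any value as (stringValue, typeTag) so the table schema can
// be all strings; the tag lets us recover the original type later.
def encode(v: Any): (String, String) = v match {
  case i: Int     => (i.toString, "int")
  case b: Boolean => (b.toString, "boolean")
  case s: String  => (s, "string")
  case other      => (other.toString, "string")
}

// Decode using the stored tag to round-trip back to the original type.
def decode(value: String, tag: String): Any = tag match {
  case "int"     => value.toInt
  case "boolean" => value.toBoolean
  case _         => value
}

val props: Map[String, Any] = Map("age" -> 30, "active" -> true, "name" -> "eran")
// Flatten to something a string-only schema can hold.
val encoded: Map[String, (String, String)] = props.map { case (k, v) => k -> encode(v) }
// Round-trip back to Map[String, Any] using the stored tags.
val restored: Map[String, Any] = encoded.map { case (k, (v, t)) => k -> decode(v, t) }
```

It does feel a bit like reinventing a poor man's union type, which is why I'm
asking whether there's a better-supported approach.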

P.S. I fixed the PR based on the code review, but the tests failed due to
GitHub's ongoing DDoS attack. Is there a way to restart the tests? :) (Or
should I just push a new commit with a whitespace change to trigger them?)

Thanks again, you guys are great!

On Sat, Mar 28, 2015 at 11:29 PM, Michael Armbrust 
wrote:

> In this case I'd probably just store it as a String.  Our casting rules
> (which come from Hive) are such that when you use a string as a number or
> boolean, it will be cast to the desired type.
>
> Thanks for the PR btw :)
>
> On Fri, Mar 27, 2015 at 2:31 PM, Eran Medan 
> wrote:
>
>> Hi everyone,
>>
>> I had a lot of questions today, sorry if I'm spamming the list, but I
>> thought it's better than posting all questions in one thread. Let me know
>> if I should throttle my posts ;)
>>
>> Here is my question:
>>
>> When I have a case class with an Any field (e.g. a property map whose
>> values can be String, Int, or Boolean; since we don't have union types,
>> Any is the closest thing) and I try to register such an RDD as a table
>> in 1.2.1 (or convert it to a DataFrame in 1.3 and then register it as a
>> table), I get this weird exception:
>>
>> Exception in thread "main" scala.MatchError: Any (of class
>> scala.reflect.internal.Types$ClassNoArgsTypeRef) at
>> org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:112)
>>
>> Which, from my interpretation, simply means that Any is not a valid type
>> that Spark SQL can support in its schema.
>>
>> I already sent a pull request to solve the cryptic exception, but my
>> question is: *is there a way to support an "Any" type in Spark SQL?*
>>
>> disclaimer - also posted at
>> http://stackoverflow.com/questions/29310405/what-is-the-right-way-to-represent-an-any-type-in-spark-sql
>>
>
>


Re: [spark-sql] What is the right way to represent an “Any” type in Spark SQL?

2015-03-28 Thread Michael Armbrust
In this case I'd probably just store it as a String.  Our casting rules
(which come from Hive) are such that when you use a string as a number or
boolean, it will be cast to the desired type.

Thanks for the PR btw :)
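To make that concrete, the Hive-style implicit cast behavior can be sketched
in a few lines of plain Scala (an illustrative toy only, not Spark's actual
cast implementation; castTo is a made-up helper):

```scala
// Toy model of Hive-style casts: a string value used where a numeric or
// boolean type is expected gets converted on the fly, so storing
// heterogeneous values as String still works in comparisons.
def castTo(value: String, target: String): Any = target match {
  case "int"     => value.toInt
  case "double"  => value.toDouble
  case "boolean" => value.toBoolean
  case _         => value
}

// The string "42" behaves like the number 42 once the cast kicks in:
val n = castTo("42", "int").asInstanceOf[Int] + 1  // 43
```

In Spark SQL the analogous thing happens inside the analyzer when a string
column is compared against a numeric or boolean expression, so a predicate
like `value = 5` over a string column still does what you'd expect.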

On Fri, Mar 27, 2015 at 2:31 PM, Eran Medan  wrote:

> Hi everyone,
>
> I had a lot of questions today, sorry if I'm spamming the list, but I
> thought it's better than posting all questions in one thread. Let me know
> if I should throttle my posts ;)
>
> Here is my question:
>
> When I have a case class with an Any field (e.g. a property map whose
> values can be String, Int, or Boolean; since we don't have union types,
> Any is the closest thing) and I try to register such an RDD as a table
> in 1.2.1 (or convert it to a DataFrame in 1.3 and then register it as a
> table), I get this weird exception:
>
> Exception in thread "main" scala.MatchError: Any (of class
> scala.reflect.internal.Types$ClassNoArgsTypeRef) at
> org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:112)
>
> Which, from my interpretation, simply means that Any is not a valid type
> that Spark SQL can support in its schema.
>
> I already sent a pull request to solve the cryptic exception, but my
> question is: *is there a way to support an "Any" type in Spark SQL?*
>
> disclaimer - also posted at
> http://stackoverflow.com/questions/29310405/what-is-the-right-way-to-represent-an-any-type-in-spark-sql
>


[spark-sql] What is the right way to represent an “Any” type in Spark SQL?

2015-03-27 Thread Eran Medan
Hi everyone,

I had a lot of questions today, sorry if I'm spamming the list, but I
thought it's better than posting all questions in one thread. Let me know
if I should throttle my posts ;)

Here is my question:

When I have a case class with an Any field (e.g. a property map whose
values can be String, Int, or Boolean; since we don't have union types,
Any is the closest thing) and I try to register such an RDD as a table in
1.2.1 (or convert it to a DataFrame in 1.3 and then register it as a
table), I get this weird exception:

Exception in thread "main" scala.MatchError: Any (of class
scala.reflect.internal.Types$ClassNoArgsTypeRef) at
org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:112)

Which, from my interpretation, simply means that Any is not a valid type
that Spark SQL can support in its schema.
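For the curious, the failure mode can be mimicked in a few lines of plain
Scala (a toy version for illustration, not Spark's actual schemaFor code):
the reflection-based match covers concrete types but has no case for Any, so
it falls through to a MatchError.

```scala
import scala.reflect.runtime.universe._

// Toy mimic of Catalyst's type mapping: each supported Scala type maps to
// a Spark SQL DataType name; anything else (like Any) has no matching case.
def schemaFor(tpe: Type): String = tpe match {
  case t if t =:= typeOf[Int]     => "IntegerType"
  case t if t =:= typeOf[String]  => "StringType"
  case t if t =:= typeOf[Boolean] => "BooleanType"
  // no case for Any => scala.MatchError at runtime
}

schemaFor(typeOf[Int])     // "IntegerType"
// schemaFor(typeOf[Any])  // would throw scala.MatchError
```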

I already sent a pull request to solve the cryptic exception, but my
question is: *is there a way to support an "Any" type in Spark SQL?*

disclaimer - also posted at
http://stackoverflow.com/questions/29310405/what-is-the-right-way-to-represent-an-any-type-in-spark-sql