Re: Inferring schema from GenericRowWithSchema

2016-05-17 Thread Andy Grove
Hmm. I see. Yes, I guess that won't work then.

I don't understand what you are proposing about UDFRegistration. I only see
register overloads for functions of various arities (1 .. 22).
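
Or do you mean the Java-style overloads (UDF1 .. UDF22) that take an
explicit return DataType? A rough, untested sketch of what I think that
would look like for my example:

import org.apache.spark.sql.Row
import org.apache.spark.sql.api.java.UDF2
import org.apache.spark.sql.types._

val addressType = StructType(List(
  StructField("state", DataTypes.StringType),
  StructField("zipcode", DataTypes.IntegerType)
))

// The return type is passed explicitly here, so nothing needs to be
// inferred from the TypeTag of the closure's result type.
sqlContext.udf.register("Address",
  new UDF2[String, Int, Row] {
    def call(state: String, zipcode: Int): Row = Row(state, zipcode)
  },
  addressType)

sqlContext.sql("SELECT Address('NY', 12345)").show()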

On Tue, May 17, 2016 at 1:00 PM, Michael Armbrust wrote:

> I don't think that you will be able to do that.  ScalaReflection is based
> on the TypeTag of the object, and thus the schema of any particular object
> won't be available to it.
>
> Instead I think you want to use the register functions in UDFRegistration
> that take a schema. Does that make sense?
>
> On Tue, May 17, 2016 at 11:48 AM, Andy Grove wrote:
>
>>
>> Hi,
>>
>> I have a requirement to create types dynamically in Spark and then
>> instantiate those types from Spark SQL via a UDF.
>>
>> I tried doing the following:
>>
>> val addressType = StructType(List(
>>   new StructField("state", DataTypes.StringType),
>>   new StructField("zipcode", DataTypes.IntegerType)
>> ))
>>
>> sqlContext.udf.register("Address", (args: Seq[Any]) => new
>> GenericRowWithSchema(args.toArray, addressType))
>>
>> sqlContext.sql("SELECT Address('NY', 12345)").show(10)
>>
>> This seems reasonable to me but this fails with:
>>
>> Exception in thread "main" java.lang.UnsupportedOperationException:
>> Schema for type
>> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema is not
>> supported
>> at
>> org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:755)
>> at
>> org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:685)
>> at
>> org.apache.spark.sql.UDFRegistration.register(UDFRegistration.scala:130)
>>
>> It looks like it would be simple to update ScalaReflection to be able to
>> infer the schema from a GenericRowWithSchema, but before I file a JIRA and
>> submit a patch I wanted to see if there is already a way of achieving this.
>>
>> Thanks,
>>
>> Andy.


Re: Inferring schema from GenericRowWithSchema

2016-05-17 Thread Michael Armbrust
I don't think that you will be able to do that.  ScalaReflection is based
on the TypeTag of the object, and thus the schema of any particular object
won't be available to it.
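
To make that concrete (treating ScalaReflection as an illustration only,
since it is an internal class rather than a public API):

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.ScalaReflection

case class Address(state: String, zipcode: Int)

// A case class carries its structure in its compile-time TypeTag, so a
// StructType can be derived from the type alone.
ScalaReflection.schemaFor[Address].dataType

// The static type of a GenericRowWithSchema value is just Row, which has
// no field information, so there is nothing for schemaFor to work with:
// ScalaReflection.schemaFor[Row]  // UnsupportedOperationException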

Instead I think you want to use the register functions in UDFRegistration
that take a schema. Does that make sense?

On Tue, May 17, 2016 at 11:48 AM, Andy Grove wrote:

>
> Hi,
>
> I have a requirement to create types dynamically in Spark and then
> instantiate those types from Spark SQL via a UDF.
>
> I tried doing the following:
>
> val addressType = StructType(List(
>   new StructField("state", DataTypes.StringType),
>   new StructField("zipcode", DataTypes.IntegerType)
> ))
>
> sqlContext.udf.register("Address", (args: Seq[Any]) => new
> GenericRowWithSchema(args.toArray, addressType))
>
> sqlContext.sql("SELECT Address('NY', 12345)").show(10)
>
> This seems reasonable to me but this fails with:
>
> Exception in thread "main" java.lang.UnsupportedOperationException: Schema
> for type org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema is
> not supported
> at
> org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:755)
> at
> org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:685)
> at org.apache.spark.sql.UDFRegistration.register(UDFRegistration.scala:130)
>
> It looks like it would be simple to update ScalaReflection to be able to
> infer the schema from a GenericRowWithSchema, but before I file a JIRA and
> submit a patch I wanted to see if there is already a way of achieving this.
>
> Thanks,
>
> Andy.