CatalystTypeConverters.scala has all kinds of utility methods to convert
from Scala types to Rows and vice versa.
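
For illustration, a minimal sketch of the Scala-to-Row direction using the
public Row API (the sample map and values are just placeholders for the
example):

import org.apache.spark.sql.Row

// A hypothetical map of column values, keyed by column name.
val values = Map("Amean" -> 20.3, "Asize" -> 12)

// Row.fromSeq builds a Row from any ordered sequence of Scala values;
// going back to Scala is just positional access on the Row.
val row: Row = Row.fromSeq(values.values.toSeq)
val first: Any = row.get(0)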

On Fri, Feb 12, 2016 at 12:21 AM, Rishabh Wadhawan <rishabh...@gmail.com>
wrote:

> I had the same issue. I resolved it in Java, but I am pretty sure it would
> work with Scala too. It's kind of a gross hack. Say I had a table in MySQL
> with 1000 columns: what I did was run a JDBC query to extract the table's
> schema. I stored that schema and wrote a map function to create StructFields
> using StructType and RowFactory. Then I loaded the table as a DataFrame,
> which had a schema, and converted that DataFrame into an RDD; this is when
> it lost the schema. I performed my processing on that RDD and then converted
> the RDD back using the StructFields.
> If your source is a structured type, it would be better to load it directly
> as a DataFrame so you can preserve the schema. In your case, however, you
> should do something like this:
> List<StructField> fields = new ArrayList<StructField>();
> for (String key : map.keySet()) {
>     fields.add(DataTypes.createStructField(key, DataTypes.StringType, true));
> }
>
> StructType schemaOfDataFrame = DataTypes.createStructType(fields);
>
> DataFrame df = sqlContext.createDataFrame(rdd, schemaOfDataFrame);
>
> This is how I would do it in Java; I'm not sure about the Scala syntax, but
> a rough Scala sketch of the same round trip follows below.
> Please tell me if that helped.
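>
> A minimal Scala sketch of that round trip (the names sqlContext and
> sourceDf, and the identity transformation, are placeholders for
> illustration):
>
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.{DataFrame, Row, SQLContext}
> import org.apache.spark.sql.types.StructType
>
> def roundTrip(sqlContext: SQLContext, sourceDf: DataFrame): DataFrame = {
>   val schema: StructType = sourceDf.schema          // keep the schema before dropping to an RDD
>   val rowRdd: RDD[Row] = sourceDf.rdd               // the schema is lost at this point
>   val transformed: RDD[Row] = rowRdd.map(identity)  // stand-in for the real processing
>   sqlContext.createDataFrame(transformed, schema)   // reattach the schema
> }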
>
> On Feb 11, 2016, at 7:20 AM, Fabian Böhnlein <fabian.boehnl...@gmail.com>
> wrote:
>
> Hi all,
>
> is there a way to create a Spark SQL Row schema based on Scala data types
> without creating a manual mapping?
>
> That's the only example I can find which doesn't already require a
> spark.sql.types.DataType as input, but it requires defining the types as
> strings.
>
> val struct = (new StructType)
>   .add("a", "int")
>   .add("b", "long")
>   .add("c", "string")
>
>
>
> Specifically I have an RDD where each element is a Map of 100s of
> variables with different data types which I want to transform to a DataFrame
> where the keys should end up as the column names:
>
> Map ("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> ....)
>
>
> Is there an alternative to building a mapping from the values' .getClass to
> the Spark SQL DataTypes?
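>
> I.e., an alternative to something like this sketch (the match arms are just
> for illustration and not exhaustive):
>
> import org.apache.spark.sql.types._
>
> // Derive one StructField per key by matching on the runtime class of the value.
> def schemaFor(sample: Map[String, Any]): StructType =
>   StructType(sample.toSeq.map { case (name, value) =>
>     val dataType = value match {
>       case _: Double => DoubleType
>       case _: Int    => IntegerType
>       case _: Long   => LongType
>       case _: String => StringType
>       case _         => StringType   // fallback; extend per type as needed
>     }
>     StructField(name, dataType, nullable = true)
>   })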
>
>
> Thanks,
> Fabian
