Interesting, thanks.
The only publicly accessible method seems to be convertToCatalyst:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L425
It seems to be missing some types like Integer, Short, Long... I'll give it
a try.
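Roughly what I'm planning to try (untested, and CatalystTypeConverters is an
internal Catalyst API, so no guarantees this keeps working across versions):

    import org.apache.spark.sql.catalyst.CatalystTypeConverters

    // convert a plain Scala value to its Catalyst-internal representation
    val converted = CatalystTypeConverters.convertToCatalyst(Seq(20.3, "foo", 12))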
Thanks,
Fabian
On 12/02/16 05:53, Yogesh Mahajan wrote:
Right, Thanks Ted.
On Fri, Feb 12, 2016 at 10:21 AM, Ted Yu <yuzhih...@gmail.com> wrote:
Minor correction: the class is CatalystTypeConverters.scala
On Thu, Feb 11, 2016 at 8:46 PM, Yogesh Mahajan <ymaha...@snappydata.io> wrote:
CatatlystTypeConverters.scala has all kinds of utility methods
to convert from Scala types to Row and vice versa.
On Fri, Feb 12, 2016 at 12:21 AM, Rishabh Wadhawan <rishabh...@gmail.com> wrote:
I had the same issue. I resolved it in Java, but I am
pretty sure it would work with Scala too. It's kind of a
gross hack. Say I had a table in MySQL with 1000 columns:
what I did was run a JDBC query to extract the schema of
the table. I stored that schema and wrote a map function
to create StructFields using StructType and RowFactory.
Then I loaded that table as a DataFrame (which did have a
schema) and converted it into an RDD; that is when it lost
the schema. I performed some transformations on that RDD
and then converted it back to a DataFrame using the stored
StructFields.
If your source is a structured type, it would be better to
load it directly as a DataFrame so you can preserve the
schema. However, in your case you could do something like
this:
List<StructField> fields = new ArrayList<StructField>();
for (String key : map.keySet()) {
    fields.add(DataTypes.createStructField(key, DataTypes.StringType, true));
}
StructType schemaOfDataFrame = DataTypes.createStructType(fields);
sqlContext.createDataFrame(rdd, schemaOfDataFrame);
This is how I would do it in Java; I'm not sure about the
Scala syntax. Please tell me if that helped.
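In Scala I'd guess the equivalent looks roughly like this (untested sketch;
keys, rdd and sqlContext are placeholders for your map's key set, your
RDD[Row] and your SQLContext):

    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    // build one StringType field per key; the keys become the column names
    val schemaOfDataFrame = StructType(keys.toSeq.map(k => StructField(k, StringType, nullable = true)))
    // rdd must be an RDD[Row] whose values line up with the schema
    val df = sqlContext.createDataFrame(rdd, schemaOfDataFrame)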
On Feb 11, 2016, at 7:20 AM, Fabian Böhnlein <fabian.boehnl...@gmail.com> wrote:
Hi all,
is there a way to create a Spark SQL Row schema based on
Scala data types without creating a manual mapping?
This is the only example I can find which doesn't already require
spark.sql.types.DataType as input, but it requires defining the
types as Strings:
val struct = (new StructType)
  .add("a", "int")
  .add("b", "long")
  .add("c", "string")
Specifically, I have an RDD where each element is a Map of
hundreds of variables with different data types, which I want
to transform into a DataFrame where the keys end up as the
column names:
Map ("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> ....)
Is there an alternative to building a mapping from the
values' .getClass to the Spark SQL DataTypes?
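I.e. do I really need to hand-roll something like this (rough, untested
sketch; dataTypeFor is just a name I made up, and only a few types are shown):

    import org.apache.spark.sql.types._

    // manual mapping from a runtime value to a Spark SQL DataType
    def dataTypeFor(value: Any): DataType = value match {
      case _: Int    => IntegerType
      case _: Long   => LongType
      case _: Double => DoubleType
      case _: String => StringType
      case other     => throw new IllegalArgumentException(s"unsupported type: ${other.getClass}")
    }

    // derive the schema from one example element; the keys become the column names
    val example = Map("Amean" -> 20.3, "Asize" -> 12)
    val schema = StructType(example.toSeq.map { case (k, v) => StructField(k, dataTypeFor(v), nullable = true) })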
Thanks,
Fabian