Re: DataFrame column structure change
I have a pretty complex nested structure with several levels, so to create it I use the SQLContext.createDataFrame method and provide specific Rows with specific StructTypes, both of which I build myself.

To build a Row, I iterate over my values and literally build a Row:

    List<Object> row = new LinkedList<>();
    for (Attribute attributeNode : attributeNodes()) {
        final String name = attributeNode.getName();
        if (name.equals("attr-simple-1")) {
            row.add(obj.getValue());
        } else if (name.equals("attr-nested-1")) {
            List<Object> rowAttributes = new LinkedList<>();
            for (Attribute node : attributeNode.getAttributes()) {
                String nodeName = node.getName();
                if (obj.getSimpleAttributeNames().contains(nodeName)) {
                    rowAttributes.add(value);
                } else if (nested) {
                    rowAttributes.add(/* recursion */);
                } else {
                    rowAttributes.add(null);
                }
            }
            // Nested attributes become a Row inside the parent Row
            row.add(new GenericRow(rowAttributes.toArray(new Object[rowAttributes.size()])));
        } else {
            row.add(null);
        }
    }
    return new GenericRow(row.toArray(new Object[row.size()]));

To build the StructType, I create a list of StructFields:

    List<StructField> structFields = ...
    if (attribute.isSingleValue()) {
        structFields.add(DataTypes.createStructField(attribute.getName(), dataType(attribute), true));
    } else {
        // Multi-valued attributes are stored as arrays of the element type
        structFields.add(DataTypes.createStructField(attribute.getName(),
                DataTypes.createArrayType(dataType(attribute)), true));
    }

and then:

    DataTypes.createStructType(structFields);

dataType() is a method that returns the corresponding o.a.spark.sql.types.DataType.

If you have to create a Row with another structure, you can simply map the original Row into one with the new structure and build the corresponding StructType. That said, if you find a simpler way, I'd really like to know about it.

On 07 Aug 2015, at 12:43, Rishabh Bhardwaj wrote:
> I am doing it by creating a new data frame out of the fields to be nested and
> then join with the original DF.
> Looking for some optimized solution here.

Eugene Morozov
fathers...@list.ru
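A minimal Scala sketch of the "map the original Row into one with the new structure and build the corresponding StructType" approach, applied to the schema from the original question. It assumes Spark 2.x or later (SparkSession instead of SQLContext) and is written in Scala rather than the Java used above; the object and method names (RemapRowsExample, remap) are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical example object; assumes Spark 2.x+ on the classpath.
object RemapRowsExample {
  // Hand-built target schema: a, b, c stay flat; d and e move into "newCol".
  val nestedSchema: StructType = StructType(Seq(
    StructField("a", StringType, nullable = true),
    StructField("b", StringType, nullable = true),
    StructField("c", StringType, nullable = true),
    StructField("newCol", StructType(Seq(
      StructField("d", StringType, nullable = true),
      StructField("e", StringType, nullable = true)
    )), nullable = true)
  ))

  // Map each flat Row(a, b, c, d, e, f) to Row(a, b, c, Row(d, e)), dropping f,
  // then rebuild a DataFrame from the remapped rows and the hand-built schema.
  def remap(spark: SparkSession, df: DataFrame): DataFrame = {
    val rows = df.rdd.map { r =>
      Row(
        r.getAs[String]("a"),
        r.getAs[String]("b"),
        r.getAs[String]("c"),
        Row(r.getAs[String]("d"), r.getAs[String]("e"))
      )
    }
    spark.createDataFrame(rows, nestedSchema)
  }
}
```

Because the schema is supplied explicitly, this route gives full control over field names and nullability, at the cost of dropping to the RDD level and keeping the Row layout and the StructType in sync by hand.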
Re: DataFrame column structure change
You can use the struct function from org.apache.spark.sql.functions to combine two columns into a struct column. Something like:

    import org.apache.spark.sql.functions.struct

    // Wrap d and e in a struct and name the result newCol
    val nestedCol = struct(df("d"), df("e")).as("newCol")
    df.select(df("a"), df("b"), df("c"), nestedCol)

On Aug 7, 2015 3:14 PM, "Rishabh Bhardwaj" wrote:
> I am doing it by creating a new data frame out of the fields to be nested
> and then join with the original DF.
> Looking for some optimized solution here.
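A self-contained sketch of the struct() answer, runnable against a local session. It assumes Spark 2.x or later (SparkSession and spark.implicits._); the object name NestColumnsExample and the sample values are hypothetical. No join is needed: the struct is built from columns of the same DataFrame in a single select.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.struct

// Hypothetical example object; assumes Spark 2.x+ on the classpath.
object NestColumnsExample {
  // Keep a, b, c, wrap d and e into a struct column named newCol,
  // and drop f simply by not selecting it.
  def nest(df: DataFrame): DataFrame =
    df.select(df("a"), df("b"), df("c"), struct(df("d"), df("e")).as("newCol"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nest-columns-example")
      .getOrCreate()
    import spark.implicits._

    // Sample data with the flat, all-string schema from the question.
    val df = Seq(("a1", "b1", "c1", "d1", "e1", "f1")).toDF("a", "b", "c", "d", "e", "f")

    nest(df).printSchema()
    spark.stop()
  }
}
```

The struct fields take their names from the input columns (d and e). The nullability flag Spark reports for newCol itself may differ from the target schema in the question; if an exact schema is required, building the StructType by hand is the more controllable route.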
Re: DataFrame column structure change
I am doing it by creating a new data frame out of the fields to be nested and then joining it with the original DF. I'm looking for a more optimized solution here.

On Fri, Aug 7, 2015 at 2:06 PM, Rishabh Bhardwaj wrote:
> Hi all,
>
> I want to create a nested structure from the existing columns of
> the DataFrame.
> For that, I am trying to transform a DF in the following way, but couldn't
> do it.
>
> scala> df.printSchema
> root
>  |-- a: string (nullable = true)
>  |-- b: string (nullable = true)
>  |-- c: string (nullable = true)
>  |-- d: string (nullable = true)
>  |-- e: string (nullable = true)
>  |-- f: string (nullable = true)
>
> To
>
> scala> newDF.printSchema
> root
>  |-- a: string (nullable = true)
>  |-- b: string (nullable = true)
>  |-- c: string (nullable = true)
>  |-- newCol: struct (nullable = true)
>  |    |-- d: string (nullable = true)
>  |    |-- e: string (nullable = true)
>
> Please help me.
>
> Regards,
> Rishabh.
DataFrame column structure change
Hi all,

I want to create a nested structure from the existing columns of the DataFrame. For that, I am trying to transform a DF in the following way, but couldn't do it.

scala> df.printSchema
root
 |-- a: string (nullable = true)
 |-- b: string (nullable = true)
 |-- c: string (nullable = true)
 |-- d: string (nullable = true)
 |-- e: string (nullable = true)
 |-- f: string (nullable = true)

To

scala> newDF.printSchema
root
 |-- a: string (nullable = true)
 |-- b: string (nullable = true)
 |-- c: string (nullable = true)
 |-- newCol: struct (nullable = true)
 |    |-- d: string (nullable = true)
 |    |-- e: string (nullable = true)

Please help me.

Regards,
Rishabh.
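The transformation asked about here can be generalized to any set of columns with a small helper built on the same struct() function. This is a sketch assuming Spark 2.x or later; nestColumns and NestHelper are hypothetical names, and the helper appends the struct column after the kept columns rather than preserving its original position.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct}

// Hypothetical helper; assumes Spark 2.x+ on the classpath.
object NestHelper {
  // Moves the columns named in `toNest` into one struct column `structName`,
  // removes the columns in `toDrop`, and keeps all other columns in order.
  def nestColumns(df: DataFrame,
                  toNest: Seq[String],
                  structName: String,
                  toDrop: Seq[String] = Seq.empty): DataFrame = {
    val kept = df.columns.toSeq
      .filterNot(c => toNest.contains(c) || toDrop.contains(c))
      .map(c => col(c))
    val nested = struct(toNest.map(c => col(c)): _*).as(structName)
    df.select((kept :+ nested): _*)
  }
}
```

For the schema in the question, the call would be NestHelper.nestColumns(df, Seq("d", "e"), "newCol", toDrop = Seq("f")).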