Re: DataFrame column structure change

2015-08-13 Thread Eugene Morozov
I have a fairly complex nested structure with several levels. To create it, I
use the SQLContext.createDataFrame method and pass in Rows and a matching
StructType, both of which I build myself.

To build a Row, I iterate over my values and build the Row by hand:
// GenericRow here is org.apache.spark.sql.catalyst.expressions.GenericRow
List<Object> row = new LinkedList<>();
for (Attribute attributeNode : attributeNodes()) {
    final String name = attributeNode.getName();
    if (name.equals("attr-simple-1")) {
        row.add(obj.getValue());
    } else if (name.equals("attr-nested-1")) {
        // nested attribute: build an inner Row from its child attributes
        List<Object> rowAttributes = new LinkedList<>();
        for (Attribute node : attributeNode.getAttributes()) {
            String nodeName = node.getName();
            if (obj.getSimpleAttributeNames().contains(nodeName)) {
                rowAttributes.add(value);
            } else if (nested) {
                // recursion; buildNestedRow is a hypothetical name for that step
                rowAttributes.add(buildNestedRow(node));
            } else {
                rowAttributes.add(null);
            }
        }
        row.add(new GenericRow(rowAttributes.toArray(new Object[rowAttributes.size()])));
    } else {
        // keep the field position with a null so the Row matches the StructType
        row.add(null);
    }
}
return new GenericRow(row.toArray(new Object[row.size()]));
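
The loop above depends on Eugene's own Attribute model, so it cannot run on its own. As a hedged illustration of the same recursion, here is a minimal Spark-free sketch: the Field class, the buildRow method, and the Map-based value source are all hypothetical, and each nested struct is an Object[] standing in for the values array that new GenericRow(Object[]) wraps. Missing attributes become null, so every row keeps exactly one position per schema field.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NestedRowSketch {
    // Hypothetical schema node: a leaf when it has no children, a struct otherwise.
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, List<Field> children) {
            this.name = name;
            this.children = children;
        }

        static Field leaf(String name) {
            return new Field(name, Collections.<Field>emptyList());
        }

        static Field struct(String name, Field... children) {
            return new Field(name, Arrays.asList(children));
        }
    }

    // Builds one row in schema order. Nested structs become nested Object[] arrays
    // (stand-ins for GenericRow values); absent attributes become null.
    @SuppressWarnings("unchecked")
    static Object[] buildRow(List<Field> fields, Map<String, Object> values) {
        Object[] row = new Object[fields.size()];
        for (int i = 0; i < fields.size(); i++) {
            Field field = fields.get(i);
            Object value = values.get(field.name);
            if (field.children.isEmpty()) {
                row[i] = value; // simple attribute: copy the value (null if absent)
            } else if (value instanceof Map) {
                // nested attribute: recurse into its children
                row[i] = buildRow(field.children, (Map<String, Object>) value);
            } else {
                row[i] = null; // nested attribute with no value
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<Field> schema = Arrays.asList(
                Field.leaf("a"),
                Field.struct("newCol", Field.leaf("d"), Field.leaf("e")),
                Field.struct("missing", Field.leaf("x")));

        Map<String, Object> nested = new HashMap<>();
        nested.put("d", "d1");
        nested.put("e", "e1");
        Map<String, Object> values = new HashMap<>();
        values.put("a", "a1");
        values.put("newCol", nested);

        System.out.println(Arrays.deepToString(buildRow(schema, values)));
        // prints [a1, [d1, e1], null]
    }
}
```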

To build the StructType, I create a list of StructFields:
List<StructField> structFields = ...
if (attribute.isSingleValue()) {
    structFields.add(DataTypes.createStructField(attribute.getName(),
            dataType(attribute), true));
} else {
    // multi-valued attribute: wrap its type in an ArrayType
    structFields.add(DataTypes.createStructField(attribute.getName(),
            DataTypes.createArrayType(dataType(attribute)), true));
}

and then:
StructType schema = DataTypes.createStructType(structFields);

dataType() is a helper method that returns the corresponding
org.apache.spark.sql.types.DataType for an attribute.


If you need a Row with a different structure, you can simply map the original
Row into one with the new structure and build the corresponding StructType.
That said, if you find a simpler way, I'd really like to know about it.
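
As a minimal sketch of that mapping step for the schema in the original question, assuming plain Object[] arrays in place of Spark Rows (RestructureSketch and restructure are hypothetical names), the function below turns a flat (a, b, c, d, e, f) row into (a, b, c, newCol(d, e)) and drops f. In Spark, a function like this would be applied to every row and paired with a StructType whose fourth field is a struct of d and e.

```java
import java.util.Arrays;

class RestructureSketch {
    // Maps a flat row (a, b, c, d, e, f) to (a, b, c, newCol(d, e)), dropping f.
    // Object[] stands in for Row values; the nested Object[] stands in for the struct.
    static Object[] restructure(Object[] flat) {
        if (flat.length != 6) {
            throw new IllegalArgumentException("expected 6 columns, got " + flat.length);
        }
        Object[] newCol = {flat[3], flat[4]}; // d and e become one nested struct
        return new Object[] {flat[0], flat[1], flat[2], newCol};
    }

    public static void main(String[] args) {
        Object[] flat = {"a1", "b1", "c1", "d1", "e1", "f1"};
        System.out.println(Arrays.deepToString(restructure(flat)));
        // prints [a1, b1, c1, [d1, e1]]
    }
}
```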

On 07 Aug 2015, at 12:43, Rishabh Bhardwaj  wrote:

Eugene Morozov
fathers...@list.ru


Re: DataFrame column structure change

2015-08-08 Thread Raghavendra Pandey
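You can use the struct function from org.apache.spark.sql.functions to
combine the two columns into a single struct column. No join is needed,
since everything happens in one select. Something like:
import org.apache.spark.sql.functions.struct
// combine d and e into one struct column and name it newCol
val nestedCol = struct(df("d"), df("e")).as("newCol")
val newDF = df.select(df("a"), df("b"), df("c"), nestedCol)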
On Aug 7, 2015 3:14 PM, "Rishabh Bhardwaj"  wrote:


Re: DataFrame column structure change

2015-08-07 Thread Rishabh Bhardwaj
I am doing it by creating a new DataFrame out of the fields to be nested
and then joining it with the original DF.
I'm looking for a more efficient solution.

On Fri, Aug 7, 2015 at 2:06 PM, Rishabh Bhardwaj  wrote:


DataFrame column structure change

2015-08-07 Thread Rishabh Bhardwaj
Hi all,

I want to build a nested structure from some of the existing columns of
the DataFrame.
For that, I am trying to transform a DF in the following way, but couldn't
do it.

scala> df.printSchema
root
 |-- a: string (nullable = true)
 |-- b: string (nullable = true)
 |-- c: string (nullable = true)
 |-- d: string (nullable = true)
 |-- e: string (nullable = true)
 |-- f: string (nullable = true)

To

scala> newDF.printSchema
root
 |-- a: string (nullable = true)
 |-- b: string (nullable = true)
 |-- c: string (nullable = true)
 |-- newCol: struct (nullable = true)
 |    |-- d: string (nullable = true)
 |    |-- e: string (nullable = true)


Please help.

Regards,
Rishabh.