Is there a reduceByKey functionality in DataFrame API?

2016-08-10 Thread luismattor
Hi everyone,

Consider the following code:

import org.apache.spark.sql.functions.min

val result = df.groupBy("col1").agg(min("col2"))

I know that rdd.reduceByKey(func) produces the same RDD as
rdd.groupByKey().mapValues(values => values.reduce(func)). However, reduceByKey
is more efficient: it avoids shipping every individual value to the reducer
that performs the aggregation, shipping partial (map-side) aggregations instead.
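
For illustration, a minimal sketch with hypothetical (key, value) pairs; both
formulations yield the same result, but reduceByKey pre-aggregates within each
partition before shuffling:

val pairs = sc.parallelize(Seq(("a", 2), ("a", 1), ("b", 3)))

// Ships every value across the network, then reduces per key.
val viaGroup = pairs.groupByKey().mapValues(_.reduce((a, b) => math.min(a, b)))

// Combines values map-side first, shuffling only partial results.
val viaReduce = pairs.reduceByKey((a, b) => math.min(a, b))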

I wonder whether the DataFrame API optimizes this aggregation similarly, i.e.
whether it performs partial (map-side) aggregation the way RDD.reduceByKey does.
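
One way to check (a sketch, assuming a Spark 1.6 spark-shell where df has
columns col1 and col2) is to inspect the physical plan of the aggregation above:

result.explain()

// If Catalyst performs partial aggregation, the plan typically shows two
// paired TungstenAggregate operators: one with mode=Partial (map-side)
// and one with mode=Final after the shuffle.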

I am using Spark 1.6.2. 

Regards,
Luis



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Is-there-a-reduceByKey-functionality-in-DataFrame-API-tp27508.html



How to set nullable field when create DataFrame using case class

2016-08-04 Thread luismattor
Hi all,

Consider the following case:

import java.sql.Timestamp
import sqlContext.implicits._  // already in scope in spark-shell; needed for .toDF()

case class MyProduct(t: Timestamp, a: Float)

val df = sc.parallelize(List(MyProduct(new Timestamp(0), 10))).toDF()
df.printSchema()

The output is:
root
 |-- t: timestamp (nullable = true)
 |-- a: float (nullable = false)

How can I set the timestamp column to be NOT nullable?
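
For reference, one possible workaround (a sketch, assuming the df val above and
a Spark 1.6 sqlContext): rebuild the schema with the desired nullability and
re-create the DataFrame from its RDD[Row]. Note this only changes the schema
metadata; it does not add any validation of the existing rows:

import org.apache.spark.sql.types.{StructField, StructType}

// Copy the schema, flipping nullable to false for column "t" only.
val newSchema = StructType(df.schema.map {
  case StructField("t", dataType, _, metadata) =>
    StructField("t", dataType, nullable = false, metadata)
  case field => field
})

val dfNotNull = sqlContext.createDataFrame(df.rdd, newSchema)
dfNotNull.printSchema()  // t: timestamp (nullable = false)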

Regards,
Luis



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-set-nullable-field-when-create-DataFrame-using-case-class-tp27479.html