Thanks, Davies. HiveContext seems neat to use :)
On Thu, Aug 20, 2015 at 3:02 PM, Davies Liu wrote:
> As Aram said, there are two options in Spark 1.4:
>
> 1) Use the HiveContext; then you get datediff from Hive:
> df.selectExpr("datediff(d2, d1)")
> 2) Use a Python UDF:
> [...]
As Aram said, there are two options in Spark 1.4:
1) Use the HiveContext; then you get datediff from Hive (a setup sketch for this route follows the UDF example below):
df.selectExpr("datediff(d2, d1)")
2) Use a Python UDF:
```
>>> from datetime import date
>>> df = sqlContext.createDataFrame([(date(2008, 8, 18), date(2008, 9, 26))],
...                                 ['d1', 'd2'])
>>> from pyspark.sql.functions import udf
>>> from pyspark.sql.types import IntegerType
>>> diff = udf(lambda start, end: (end - start).days, IntegerType())
>>> df.select(diff(df.d1, df.d2).alias('diff')).show()
```
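For reference, a minimal sketch of the HiveContext route from option 1 (this assumes a Hive-enabled Spark build; `sc` is the SparkContext already available in the shell):

```
>>> from datetime import date
>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)  # makes Hive UDFs such as datediff available
>>> df = sqlContext.createDataFrame([(date(2008, 8, 18), date(2008, 9, 26))],
...                                 ['d1', 'd2'])
>>> df.selectExpr("datediff(d2, d1)").show()
```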
Hi,
hope this will help you:

import org.apache.spark.sql.functions._
import sqlContext.implicits._
import java.sql.Timestamp
import org.joda.time.{DateTime, Days}

// date1 and date2 are java.sql.Timestamp values
val df = sc.parallelize(Array((date1, date2))).toDF("day1", "day2")
// Joda-Time's Days.daysBetween counts whole days between the two instants
val dateDiff = udf[Long, Timestamp, Timestamp]((value1, value2) =>
  Days.daysBetween(new DateTime(value1), new DateTime(value2)).getDays.toLong)
df.withColumn("diff", dateDiff($"day1", $"day2")).show()
One more update on this question: I am using Spark 1.4.1.
I was just reading the documentation for Spark 1.5 (still in development), and I
think there will be a new function, *datediff*, that will solve the issue. So
please let me know if there is any work-around until Spark 1.5 is out :).
pyspark.sql.functions.datediff(end, start)
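Once 1.5 is out, the call should look roughly like this (a sketch based on the development docs; d1/d2 stand in for whatever the two date columns are named):

```
>>> from pyspark.sql.functions import datediff
>>> df.select(datediff(df.d2, df.d1).alias('diff')).show()
```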
Apologies, sent too early accidentally. Actual message is below.
A dataframe has 2 date columns (datetime type), and I would like to add
another column holding the difference between these two dates. A dataframe
snippet is below.
new_df.show()

+-----------+----------+--------------+
|      PATID|   SVCDATE|next_diag_date|
+-----------+----------+--------------+
|12345655545|2012-02-13|    2012-02-13|
|12345655545|2012-02-13|    2012-02-13|
|12345655545|2012-02-13|    2012-02-13|
+-----------+----------+--------------+

new_df.withColumn('SVCDATE2',
    (new_df.next_diag_date - new_df.SVCDATE).days).show()
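For what it's worth, a minimal work-around sketch on 1.4.1 for this exact schema, mirroring the UDF approach suggested above (assuming both columns are DateType; the day_diff name is just illustrative):

```
>>> from pyspark.sql.functions import udf
>>> from pyspark.sql.types import IntegerType
>>> # day_diff is a hypothetical helper: whole days between two dates
>>> day_diff = udf(lambda start, end: (end - start).days, IntegerType())
>>> new_df.withColumn('SVCDATE2',
...                   day_diff(new_df.SVCDATE, new_df.next_diag_date)).show()
```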