Thanks, Davies. HiveContext seems neat to use :)
On Thu, Aug 20, 2015 at 3:02 PM, Davies Liu wrote:
> As Aram said, there are two options in Spark 1.4:
>
> 1) Use the HiveContext; then you get datediff from Hive:
> df.selectExpr("datediff(d2, d1)")
> 2) Use a Python UDF:
> [...]
As Aram said, there are two options in Spark 1.4:
1) Use the HiveContext; then you get datediff from Hive (a setup sketch for this route follows the UDF example below):
df.selectExpr("datediff(d2, d1)")
2) Use a Python UDF:
```
>>> from datetime import date
>>> df = sqlContext.createDataFrame([(date(2008, 8, 18), date(2008, 9, 26))],
...                                 ['d1', 'd2'])
>>> from pyspark.sql.functions import udf
>>> from pyspark.sql.types import IntegerType
>>> diff = udf(lambda start, end: (end - start).days, IntegerType())
>>> df.select(diff(df.d1, df.d2).alias('diff')).show()
```
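For reference, a minimal sketch of the HiveContext route from option 1 (this assumes a Hive-enabled Spark build; `sc` is the SparkContext already available in the shell):

```
>>> from datetime import date
>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)  # makes Hive UDFs such as datediff available
>>> df = sqlContext.createDataFrame([(date(2008, 8, 18), date(2008, 9, 26))],
...                                 ['d1', 'd2'])
>>> df.selectExpr("datediff(d2, d1)").show()
```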
Hi,
hope this will help you:

import org.apache.spark.sql.functions._
import sqlContext.implicits._
import java.sql.Timestamp
import org.joda.time.{DateTime, Days}

// date1 and date2 are java.sql.Timestamp values
val df = sc.parallelize(Array((date1, date2))).toDF("day1", "day2")
// Joda-Time's Days.daysBetween counts whole days between the two instants
val dateDiff = udf[Long, Timestamp, Timestamp]((value1, value2) =>
  Days.daysBetween(new DateTime(value1), new DateTime(value2)).getDays.toLong)
df.withColumn("diff", dateDiff($"day1", $"day2")).show()
One more update on this question: I am using Spark 1.4.1.
I was just reading the documentation for Spark 1.5 (still in development), and I
think there will be a new function, *datediff*, that will solve the issue. So
please let me know if there is any work-around until Spark 1.5 is out :).
pyspark.sql.functions.datediff(end, start)
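Once 1.5 is out, the call should look roughly like this (a sketch based on the development docs; d1/d2 stand in for whatever the two date columns are named):

```
>>> from pyspark.sql.functions import datediff
>>> df.select(datediff(df.d2, df.d1).alias('diff')).show()
```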
Apologies, sent too early accidentally. Actual message is below.
A dataframe has 2 date columns (datetime type), and I would like to add
another column holding the difference between these two dates. A dataframe
snippet is below.
new_df.show()

+-----------+----------+--------------+
|      PATID|   SVCDATE|next_diag_date|
+-----------+----------+--------------+
|12345655545|2012-02-13|    2012-02-13|
|12345655545|2012-02-13|    2012-02-13|
|12345655545|2012-02-13|    2012-02-13|
+-----------+----------+--------------+

new_df.withColumn('SVCDATE2',
    (new_df.next_diag_date - new_df.SVCDATE).days).show()
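For what it's worth, a minimal work-around sketch on 1.4.1 for this exact schema, mirroring the UDF approach suggested above (assuming both columns are DateType; the day_diff name is just illustrative):

```
>>> from pyspark.sql.functions import udf
>>> from pyspark.sql.types import IntegerType
>>> # day_diff is a hypothetical helper: whole days between two dates
>>> day_diff = udf(lambda start, end: (end - start).days, IntegerType())
>>> new_df.withColumn('SVCDATE2',
...                   day_diff(new_df.SVCDATE, new_df.next_diag_date)).show()
```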