Re: how to add new column using regular expression within pyspark dataframe

2017-04-24 Thread Yan Facai
Don't use a UDF: `minute` and `unix_timestamp` are native functions in spark.sql.
scala> df.withColumn("minute", minute(unix_timestamp($"str", "HH'h'mm'm'").cast("timestamp"))).show
On Tue, Apr 25, 2017 at 7:55 AM, Zeming Yu wrote:
> I tried this, but doesn't seem to
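Since the thread is about PySpark, here is a rough Python counterpart of that Scala line. It is only a sketch: the function name `add_minute_column` and its parameters are made up for illustration, and the DataFrame is assumed to have a string column named `str`, as in the Scala example. The Spark import sits inside the function so the snippet loads even without Spark installed.

```python
# Time-of-day pattern taken from the Scala example: hours, literal "h",
# minutes, literal "m".
TIME_PATTERN = "HH'h'mm'm'"


def add_minute_column(df, source="str", target="minute"):
    """Add an integer minute column parsed from strings such as '15h10m'.

    Hypothetical helper; `df` is assumed to be a pyspark.sql.DataFrame.
    """
    # Imported lazily so this module can be loaded without a Spark install.
    from pyspark.sql.functions import col, minute, unix_timestamp

    # Parse the string as a time of day, then read back the minute field.
    parsed = unix_timestamp(col(source), TIME_PATTERN).cast("timestamp")
    return df.withColumn(target, minute(parsed))
```

Usage would be something like `add_minute_column(flight, source="duration").show()`. Values such as 17h0m or 26h10m are worth checking explicitly, since how strictly a time-of-day pattern is applied may depend on the Spark version.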

Re: how to add new column using regular expression within pyspark dataframe

2017-04-22 Thread Zeming Yu
Thanks a lot! Just another question: how can I extract the minutes as a number?
I can use:
.withColumn('duration_m', split(flight.duration, 'h').getItem(1))
to get strings like '10m', but how do I drop the character "m" at the end? I can use substr(), but what's the function to get the length of
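One way to get the minutes as a number without measuring string lengths is to capture the digits with a regular expression and cast the result. The sketch below assumes the `flight` DataFrame and its `duration` column from the question; the names `DURATION_PATTERN`, `parse_duration` and `add_duration_columns` are hypothetical. `parse_duration` is a plain-Python check of the pattern, and `add_duration_columns` applies the same pattern with `regexp_extract`.

```python
import re

# One or more digits, "h", one or more digits, "m"; for example "15h10m".
DURATION_PATTERN = r"(\d+)h(\d+)m"


def parse_duration(text):
    """Return (hours, minutes) as ints, or None if the pattern is absent."""
    match = re.search(DURATION_PATTERN, text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def add_duration_columns(flight):
    """Add integer columns duration_h and duration_m to the DataFrame.

    Hypothetical helper; `flight` is assumed to be a pyspark.sql.DataFrame
    with a string column named `duration`.
    """
    # Imported lazily so the pattern check above runs without Spark.
    from pyspark.sql.functions import regexp_extract

    hours = regexp_extract(flight.duration, DURATION_PATTERN, 1).cast("int")
    minutes = regexp_extract(flight.duration, DURATION_PATTERN, 2).cast("int")
    return flight.withColumn("duration_h", hours).withColumn("duration_m", minutes)
```

Because the pattern only looks at digits, it does not treat the value as a clock time, so durations of 24 hours or more go through the same path as shorter ones.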

Re: how to add new column using regular expression within pyspark dataframe

2017-04-20 Thread Pushkar.Gujar
Can be as simple as:
from pyspark.sql.functions import split
flight.withColumn('hour', split(flight.duration, 'h').getItem(0))
Thank you,
*Pushkar Gujar*
On Thu, Apr 20, 2017 at 4:35 AM, Zeming Yu wrote:
> Any examples?
>
> On 20 Apr. 2017 3:44 pm, "颜发才(Yan Facai)"
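The `split` idea can be stretched to cover minutes too, because the second argument of `split` is a regular expression: splitting on either letter leaves the two numbers as the first two items. A minimal sketch, assuming the same `flight` DataFrame with a `duration` column; `SEPARATORS`, `split_duration` and `add_split_columns` are hypothetical names, and `split_duration` only mirrors the logic in plain Python.

```python
import re

# Character class matching either separator letter.
SEPARATORS = "[hm]"


def split_duration(text):
    """Plain-Python mirror of the split logic: '15h10m' gives (15, 10)."""
    parts = re.split(SEPARATORS, text.strip())
    return int(parts[0]), int(parts[1])


def add_split_columns(flight):
    """Add integer columns `hour` and `minute` using split().

    Hypothetical helper; `flight` is assumed to be a pyspark.sql.DataFrame
    with a string column named `duration`.
    """
    # Imported lazily so the plain-Python mirror above runs without Spark.
    from pyspark.sql.functions import split

    parts = split(flight.duration, SEPARATORS)
    return (
        flight
        .withColumn("hour", parts.getItem(0).cast("int"))
        .withColumn("minute", parts.getItem(1).cast("int"))
    )
```

The cast turns the string pieces into numbers, so the trailing "m" never has to be stripped by hand.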

Re: how to add new column using regular expression within pyspark dataframe

2017-04-20 Thread Zeming Yu
Any examples?
On 20 Apr. 2017 3:44 pm, "颜发才(Yan Facai)" wrote:
> How about using `withColumn` and UDF?
>
> example:
> + https://gist.github.com/zoltanctoth/2deccd69e3d1cde1dd78
>
> +

Re: how to add new column using regular expression within pyspark dataframe

2017-04-19 Thread Yan Facai
How about using `withColumn` and UDF?
example:
+ https://gist.github.com/zoltanctoth/2deccd69e3d1cde1dd78
+ https://ragrawal.wordpress.com/2015/10/02/spark-custom-udf-example/
On Mon, Apr 17, 2017 at 8:25 PM, Zeming Yu

Re: how to add new column using regular expression within pyspark dataframe

2017-04-17 Thread Павел
On Mon, Apr 17, 2017 at 3:25 PM, Zeming Yu wrote:
> I've got a dataframe with a column looking like this:
>
> display(flight.select("duration").show())
>
> +--------+
> |duration|
> +--------+
> |  15h10m|
> |   17h0m|
> |  21h25m|
> |  14h25m|
> |  14h30m|
> +--------+
> only

how to add new column using regular expression within pyspark dataframe

2017-04-17 Thread Zeming Yu
I've got a dataframe with a column looking like this:
display(flight.select("duration").show())
+--------+
|duration|
+--------+
|  15h10m|
|   17h0m|
|  21h25m|
|  14h30m|
|  24h50m|
|  26h10m|
|  14h30m|
|   23h5m|
|  21h30m|
|  11h50m|
|  16h10m|
|  15h15m|
|  21h25m|
|  14h25m|
|  14h40m|
|