Hi,
If you need a DataFrame-specific solution, you can try the one below:
df.select(from_unixtime(col("max(utcTimestamp)")/1000))
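For the record, `from_unixtime` interprets its argument as *seconds* since the epoch, which is why the millisecond column is divided by 1000 first (the `max(utcTimestamp)` column name comes from the snippet above). A minimal plain-Python sketch of the same arithmetic, using the example instant from later in this thread:

```python
from datetime import datetime, timezone

# Milliseconds since 1970-01-01T00:00:00Z; dividing by 1000 gives
# seconds, which is what epoch-to-timestamp conversions expect.
ms = 1456050620000
dt = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
print(dt)  # 2016-02-21 10:30:20+00:00
```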
On Tue, 2 Feb 2016 at 09:44 Ted Yu <[email protected]> wrote:
> See related thread on using Joda DateTime:
> http://search-hadoop.com/m/q3RTtSfi342nveex1&subj=RE+NPE+
> when+using+Joda+DateTime
>
> On Mon, Feb 1, 2016 at 7:44 PM, Kevin Mellott <[email protected]>
> wrote:
>
>> I've had pretty good success using Joda-Time
>> <http://www.joda.org/joda-time/index.html> for date/time manipulations
>> within Spark applications. You may be able to use the *DateTime* constructor
>> below if you are starting with milliseconds.
>>
>> DateTime
>>
>> public DateTime(long instant)
>>
>> Constructs an instance set to the milliseconds from 1970-01-01T00:00:00Z
>> using ISOChronology in the default time zone.
>> Parameters:instant - the milliseconds from 1970-01-01T00:00:00Z
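Joda-Time is a Java library; as a rough plain-Python analogue of the same millisecond-instant semantics (a sketch rendered in UTC rather than Joda's default time zone, not code from this thread):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_instant(millis):
    """Analogue of Joda's DateTime(long instant): an instant set to
    the given milliseconds from 1970-01-01T00:00:00Z."""
    return EPOCH + timedelta(milliseconds=millis)

print(from_instant(1456050620000))  # 2016-02-21 10:30:20+00:00
```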
>>
>> On Mon, Feb 1, 2016 at 5:51 PM, Andy Davidson <
>> [email protected]> wrote:
>>
>>> What little I know about working with timestamps is based on
>>> https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html
>>>
>>> Using the example of dates formatted as human-friendly strings ->
>>> timestamps, I was able to figure out how to convert epoch times (in
>>> seconds) to timestamps. The same trick did not work for millisecond times.
>>>
>>> Any suggestions would be greatly appreciated.
>>>
>>>
>>> Andy
>>>
>>> Working with epoch times
>>>
>>> ref: http://www.epochconverter.com/
>>>
>>> Epoch timestamp: 1456050620
>>>
>>> Timestamp in milliseconds: 1456050620000
>>>
>>> Human time (GMT): Sun, 21 Feb 2016 10:30:20 GMT
>>>
>>> Human time (your time zone): 2/21/2016, 2:30:20 AM
>>>
>>>
>>> # Epoch timestamp example
>>> data = [
>>>     ("1456050620", "1456050621", 1),
>>>     ("1456050622", "14560506203", 2),
>>>     ("14560506204", "14560506205", 3)]
>>> df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
>>>
>>> # convert epoch time strings into longs
>>> df = df.select(
>>>     df.start_time.cast("long").alias("start_time"),
>>>     df.end_time.cast("long").alias("end_time"),
>>>     df.id)
>>> df.printSchema()
>>> df.show(truncate=False)
>>>
>>> # convert longs to Spark timestamps
>>> df = df.select(
>>>     df.start_time.cast("timestamp").alias("start_time"),
>>>     df.end_time.cast("timestamp").alias("end_time"),
>>>     df.id)
>>> df.printSchema()
>>> df.show(truncate=False)
>>>
>>>
>>> root
>>> |-- start_time: long (nullable = true)
>>> |-- end_time: long (nullable = true)
>>> |-- id: long (nullable = true)
>>>
>>> +-----------+-----------+---+
>>> |start_time |end_time |id |
>>> +-----------+-----------+---+
>>> |1456050620 |1456050621 |1 |
>>> |1456050622 |14560506203|2 |
>>> |14560506204|14560506205|3 |
>>> +-----------+-----------+---+
>>>
>>> root
>>> |-- start_time: timestamp (nullable = true)
>>> |-- end_time: timestamp (nullable = true)
>>> |-- id: long (nullable = true)
>>>
>>> +---------------------+---------------------+---+
>>> |start_time |end_time |id |
>>> +---------------------+---------------------+---+
>>> |2016-02-21 02:30:20.0|2016-02-21 02:30:21.0|1 |
>>> |2016-02-21 02:30:22.0|2431-05-28 02:03:23.0|2 |
>>> |2431-05-28 02:03:24.0|2431-05-28 02:03:25.0|3 |
>>> +---------------------+---------------------+---+
>>>
>>>
>>> # working with millisecond times
>>> data = [("1456050620000", "1456050620000", 1)]
>>> df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
>>>
>>> # convert epoch time strings into longs
>>> df = df.select(
>>>     df.start_time.cast("long").alias("start_time"),
>>>     df.end_time.cast("long").alias("end_time"),
>>>     df.id)
>>> df.printSchema()
>>> df.show(truncate=False)
>>>
>>> # convert longs to Spark timestamps
>>> df = df.select(
>>>     df.start_time.cast("timestamp").alias("start_time"),
>>>     df.end_time.cast("timestamp").alias("end_time"),
>>>     df.id)
>>> df.printSchema()
>>> df.show(truncate=False)
>>>
>>> root
>>> |-- start_time: long (nullable = true)
>>> |-- end_time: long (nullable = true)
>>> |-- id: long (nullable = true)
>>>
>>> +-------------+-------------+---+
>>> |start_time |end_time |id |
>>> +-------------+-------------+---+
>>> |1456050620000|1456050620000|1 |
>>> +-------------+-------------+---+
>>>
>>> root
>>> |-- start_time: timestamp (nullable = true)
>>> |-- end_time: timestamp (nullable = true)
>>> |-- id: long (nullable = true)
>>>
>>> +----------------------+----------------------+---+
>>> |start_time |end_time |id |
>>> +----------------------+----------------------+---+
>>> |48110-05-29 10:33:20.0|48110-05-29 10:33:20.0|1 |
>>> +----------------------+----------------------+---+
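The year-48110 rows fall out of `cast("timestamp")` treating the millisecond value as seconds: 1,456,050,620,000 seconds is roughly 46,000 years past 1970. A small plain-Python sketch of the arithmetic (in Spark the analogous fix would be to divide the long column by 1000 before casting to timestamp, as in the `from_unixtime` reply at the top of this thread):

```python
from datetime import datetime, timezone

ms = 1456050620000

# Treated as *seconds* (what the cast does), the value lands about
# 46,000 years in the future -- hence the year-48110 rows above.
years_ahead = ms / (365.2425 * 24 * 3600)
print(round(1970 + years_ahead))  # ~48110

# Dividing by 1000 first recovers the intended instant.
print(datetime.fromtimestamp(ms / 1000, tz=timezone.utc))
```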
>>>
>>>
>>>
>>
>