Re: unix_timestamp() time zone problem

2016-03-19 Thread Davies Liu
Could you try to cast the timestamp as long?

Internally, timestamp are stored as microseconds in UTC, you will got
seconds in UTC if you cast it to long.

On Thu, Mar 17, 2016 at 1:28 PM, Andy Davidson <
a...@santacruzintegration.com> wrote:

> I am using python spark 1.6 and the --packages
> datastax:spark-cassandra-connector:1.6.0-M1-s_2.10
>
> I need to convert a time stamp string into a unix epoch time stamp. The
> function unix_timestamp() function assume current time zone. How ever my
> string data is UTC and encodes the time zone as zero. I have not been able
> to find a way to get the unix time calculated correctly. simpleDateFormat
> does not have good time zone support. Any suggestions?
>
> I could write a UDF and to adjust for time zones how ever this seems like
>  a hack
>
> I tried using to_utc_timestamp(created, 'gmt’) how ever this creates a
> timestamp. I have not been able to figure out how to convert this to a unix
> time sample I.e a long representing epoch
>
> Any suggestions?
>
> stmnt = "select \
> row_key, created, count, unix_timestamp(created) as
> unixTimeStamp, \
> unix_timestamp(created, '-MM-dd HH:mm:ss.z') as etc \
>  from \
> rawTable \
>  where \
>  (created > '{0}') and (created <= '{1}') \
>  and \
>  (row_key = ‘blue' \
> or row_key = ‘red' \
> )".format('2016-03-12 00:30:00+', '2016-03-12
> 04:30:00+’)
>
>
> Sample out put
>
> root
>  |-- row_key: string (nullable = true)
>  |-- created: timestamp (nullable = true)
>  |-- count: long (nullable = true)
>  |-- unixTimeStamp: long (nullable = true)
>  |-- etc: long (nullable = true)
>
> 2016-03-12 00:30:30.0 should be 1457742630 not 1457771430
>
> +-+-+-+-+--+
> |row_key  |created|count|unixTimeStamp|utc|
> +-+-+-+-+--+
> |red|2016-03-12 00:30:30.0|2|1457771430   |1457771430|
> |red|2016-03-12 00:30:45.0|1|1457771445   |1457771445|
>
>
>
> static Column
> 
>  *unix_timestamp
> *
> (Column
> 
>  s)
> Converts time string in format -MM-dd HH:mm:ss to Unix timestamp (in
> seconds), using the default timezone and the default locale, return null if
> fail.
> static Column
> 
>  *unix_timestamp
> *
> (Column
> 
>  s,
> java.lang.String p)
> Convert time string with given pattern (see [
> http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html])
> to Unix time stamp (in seconds), return null if fail.
>


unix_timestamp() time zone problem

2016-03-19 Thread Andy Davidson
I am using python spark 1.6 and the --packages
datastax:spark-cassandra-connector:1.6.0-M1-s_2.10

I need to convert a time stamp string into a unix epoch time stamp. The
function unix_timestamp() function assume current time zone. How ever my
string data is UTC and encodes the time zone as zero. I have not been able
to find a way to get the unix time calculated correctly. simpleDateFormat
does not have good time zone support. Any suggestions?

I could write a UDF and to adjust for time zones how ever this seems like  a
hack

I tried using to_utc_timestamp(created, 'gmt¹) how ever this creates a
timestamp. I have not been able to figure out how to convert this to a unix
time sample I.e a long representing epoch

Any suggestions?

stmnt = "select \
row_key, created, count, unix_timestamp(created) as
unixTimeStamp, \
unix_timestamp(created, '-MM-dd HH:mm:ss.z') as etc \
 from \
rawTable \
 where \
 (created > '{0}') and (created <= '{1}') \
 and \
 (row_key = Œblue' \
or row_key = Œred' \
)".format('2016-03-12 00:30:00+', '2016-03-12
04:30:00+¹)


Sample out put

root
 |-- row_key: string (nullable = true)
 |-- created: timestamp (nullable = true)
 |-- count: long (nullable = true)
 |-- unixTimeStamp: long (nullable = true)
 |-- etc: long (nullable = true)

2016-03-12 00:30:30.0 should be 1457742630 not 1457771430

+-+-+-+-+--+
|row_key  |created|count|unixTimeStamp|utc|
+-+-+-+-+--+
|red|2016-03-12 00:30:30.0|2|1457771430   |1457771430|
|red|2016-03-12 00:30:45.0|1|1457771445   |1457771445|


static Column 
 unix_timestamp 
 (Column
  s)Converts time string in format -MM-dd HH:mm:ss to Unix timestamp
(in seconds), using the default timezone and the default locale, return null
if fail.
static Column 
 unix_timestamp 
 (Column
  s, java.lang.String p)Convert time string with given pattern (see
[http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html])
to Unix time stamp (in seconds), return null if fail.