Hi,
A while back I was looking for a functional-programming way to filter out
transactions more than n months old.
This turned out to be pretty easy.
I get today's date as follows:
val today = sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'yyyy-MM-dd')").collect.apply(0).getString(0)
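For comparison, here is a minimal sketch of building the same yyyy-MM-dd string on the driver without a SQL round trip. It assumes Java 8's java.time is available; the name todayLocal is only illustrative, and the result follows the driver JVM's default time zone, which may not match what the SQL expression uses.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Illustrative name, not part of the code above.
// LocalDate.now() uses the JVM's default time zone.
val todayLocal: String = LocalDate.now().format(DateTimeFormatter.ISO_LOCAL_DATE)
println(todayLocal)
```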
The CSV data is stored in an underlying Hive table (actually created and
populated as an ORC table by Spark):
HiveContext.sql("use accounts")
var n = HiveContext.table("nw_10124772")
scala> n.printSchema
root
|-- transactiondate: date (nullable = true)
|-- transactiontype: string (nullable = true)
|-- description: string (nullable = true)
|-- value: double (nullable = true)
|-- balance: double (nullable = true)
|-- accountname: string (nullable = true)
|-- accountnumber: integer (nullable = true)
//
// Check for historical transactions > 60 months old
//
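import org.apache.spark.sql.functions.{add_months, col, lit}
val old = 60
// Keep rows where transactiondate + 60 months is still before today
n.filter(add_months(col("transactiondate"), old) < lit(today)).
  select(lit(today), col("transactiondate"), add_months(col("transactiondate"), old)).
  collect.foreach(println)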
[2016-03-27,2011-03-22,2016-03-22]
[2016-03-27,2011-03-22,2016-03-22]
[2016-03-27,2011-03-22,2016-03-22]
[2016-03-27,2011-03-22,2016-03-22]
[2016-03-27,2011-03-23,2016-03-23]
[2016-03-27,2011-03-23,2016-03-23]
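To sanity-check the predicate outside Spark, here is a small sketch of the same comparison in plain Scala with java.time (Java 8 assumed). The helper name isOlderThan, the value asOf, and the sample dates are mine, for illustration only. plusMonths stands in for add_months here and may not agree with it for end-of-month dates.

```scala
import java.time.LocalDate

// Hypothetical helper: true when date + months is still before asOf,
// i.e. the transaction is more than `months` months old.
def isOlderThan(date: LocalDate, months: Int, asOf: LocalDate): Boolean =
  date.plusMonths(months.toLong).isBefore(asOf)

val asOf = LocalDate.parse("2016-03-27")
val samples = Seq("2011-03-22", "2011-03-23", "2011-03-27", "2011-04-01")
  .map(s => LocalDate.parse(s))

// Only the first two dates pass: 2011-03-27 + 60 months equals asOf,
// so it is not strictly before it.
val kept = samples.filter(d => isOlderThan(d, 60, asOf))
kept.foreach(println)
```

Note that the comparison is strict, so a transaction exactly 60 months old is not selected.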
This seems to work. Any other suggestions will be appreciated.
Thanks
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com