Hi Jacek,
I was wondering if I could use this very approach.
It is basically a CSV read in as follows:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .load("/data/stg/table2")
val current_date = sqlContext.sql(
  "SELECT from_unixtime(unix_timestamp(), 'dd/MM/yyyy')").collect.apply(0).getString(0)
// "dd/MM/yyyy" -> "yyyy-MM-dd"
def ChangeDate(word: String): String =
  word.substring(6, 10) + "-" + word.substring(3, 5) + "-" + word.substring(0, 2)
//
// Register it as a custom UDF
//
sqlContext.udf.register("ChangeDate", ChangeDate(_:String))
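As an aside, the substring slicing above will silently produce garbage for malformed rows. A minimal sketch of the same conversion in plain Scala using SimpleDateFormat (helper name mine) that rejects bad input instead:

```scala
import java.text.SimpleDateFormat

// Hypothetical stricter variant of ChangeDate: parse "dd/MM/yyyy" and
// re-format as "yyyy-MM-dd". setLenient(false) makes parse() throw on
// malformed input rather than slicing it blindly.
def changeDateStrict(word: String): String = {
  val in = new SimpleDateFormat("dd/MM/yyyy")
  in.setLenient(false)
  val out = new SimpleDateFormat("yyyy-MM-dd")
  out.format(in.parse(word))
}
```

(SimpleDateFormat is not thread-safe, which is why both formatters are created inside the function rather than shared.)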
The DF has the following schema
scala> df.printSchema
root
|-- Invoice Number: string (nullable = true)
|-- Payment date: string (nullable = true)
|-- Net: string (nullable = true)
|-- VAT: string (nullable = true)
|-- Total: string (nullable = true)
Now, logically, I want to filter out all "Payment date" values more than 6
months old, i.e.

current_date - "Payment date" > 6 months

for example months_between(current, "Payment date") > 6.

However, I need to convert "Payment date" from format "dd/MM/yyyy" to
"yyyy-MM-dd" first, hence the UDF.

The question is: will this approach work?
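On the comparison itself: the built-in function is months_between (available since Spark 1.5), so months_between(current_date(), to_date(converted)) > 6 should cover it. Alternatively, since "yyyy-MM-dd" strings sort in date order, a plain string comparison against a cutoff date also works. A sketch of computing that cutoff in plain Scala (helper name mine, Spark usage untested):

```scala
import java.text.SimpleDateFormat
import java.util.Calendar

// Hypothetical helper: the "yyyy-MM-dd" date exactly six months before
// `today` (also "yyyy-MM-dd"). Any converted "Payment date" that sorts
// below this cutoff is more than 6 months old.
def sixMonthCutoff(today: String): String = {
  val fmt = new SimpleDateFormat("yyyy-MM-dd")
  val cal = Calendar.getInstance()
  cal.setTime(fmt.parse(today))
  cal.add(Calendar.MONTH, -6)
  fmt.format(cal.getTime)
}

// Then, e.g. (untested Spark 1.6 sketch, using the UDF registered above):
//   df.filter(callUDF("ChangeDate", col("Payment date")) < lit(sixMonthCutoff(cutoffDate)))
```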
Thanks
Dr Mich Talebzadeh
LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 23 March 2016 at 21:26, Jacek Laskowski <[email protected]> wrote:
> Hi,
>
> Why don't you use Datasets? You'd cut the number of getStrings and
> it'd read nicer to your eyes. Also, doing such transformations would
> *likely* be easier.
>
> p.s. Please gist your example to fix it.
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Wed, Mar 23, 2016 at 10:20 PM, Mich Talebzadeh
> <[email protected]> wrote:
> >
> > How can I convert the following from String to datetime
> >
> > scala> df.map(x => (x.getString(1), ChangeDate(x.getString(1)))).take(1)
> > res60: Array[(String, String)] = Array((10/02/2014,2014-02-10))
> >
> > Please note that the custom UDF ChangeDate() has reversed the string value
> > from "dd/MM/yyyy" to "yyyy-MM-dd"
> >
> > Now, how do I convert ChangeDate(x.getString(1)) from String to a datetime?
> >
> > scala> df.map(x => (x.getString(1),
> > ChangeDate(x.getString(1)).toDate)).take(1)
> > <console>:25: error: value toDate is not a member of String
> > df.map(x => (x.getString(1),
> > ChangeDate(x.getString(1)).toDate)).take(1)
> >
> > Or
> >
> > scala> df.map(x => (x.getString(1),
> > ChangeDate(x.getString(1)).cast("date"))).take(1)
> > <console>:25: error: value cast is not a member of String
> > df.map(x => (x.getString(1),
> > ChangeDate(x.getString(1)).cast("date"))).take(1)
> >
> >
> > Thanks,
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn
> >
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>