I gave myself a project to start actually writing Spark programs, using Scala and Spark 2.2.0. The project required some grouping and filtering by dates, and it was awful and took forever. I was trying to use DataFrames and SQL as much as possible. I can see that there are date functions in the DataFrame API, but using them was frustrating, and even following code samples was a headache because the code apparently differs depending on which version of Spark you're using. I was hoping for a rich set of date functions like you'd find in T-SQL, but I never really found them.
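To make the question concrete, here is a stripped-down sketch of the kind of thing I'm trying to write (the table and column names here are made up for illustration, not from my actual project):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DateExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DateExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical trip data with a start time stored as a string.
    val trips = Seq(
      ("2017-08-29 08:15:00", 42),
      ("2017-08-29 17:45:00", 7)
    ).toDF("start_date", "duration")

    // to_timestamp() only exists as of Spark 2.2.0; on earlier versions
    // you apparently need unix_timestamp(...).cast("timestamp") instead,
    // which is exactly the version-dependence I'm complaining about.
    val withTs = trips.withColumn("start_ts",
      to_timestamp($"start_date", "yyyy-MM-dd HH:mm:ss"))

    // Pull components out of the timestamp and group on them.
    withTs
      .withColumn("year", year($"start_ts"))
      .withColumn("hour", hour($"start_ts"))
      .groupBy($"year", $"hour")
      .count()
      .orderBy($"hour")
      .show()

    spark.stop()
  }
}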
Is there a best practice for dealing with dates and times in Spark? Taking a date/time string, converting it to a timestamp object, and then manipulating data based on the components of that timestamp (hour, day, year, etc.) feels like it should be a heck of a lot easier than what I'm finding, so perhaps I'm just not looking in the right place. You can see my work here:
https://github.com/BobLovesData/Apache-Spark-In-24-Hours/blob/master/src/net/massstreet/hour10/BayAreaBikeAnalysis.scala

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.massstreet.net
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData