Most likely you are missing the import of org.apache.spark.sql.functions. In any case, you can write your own floor function and register it as a UDF.
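A minimal sketch of what such a function could look like, in plain Java so it can be tried without a Spark cluster. The class and method names (TimeBuckets, bucketIndex, bucketStart) are made up for illustration and are not part of Spark. Note that if the timestamps really are in milliseconds, a 5-minute bucket is 5 * 60 * 1000 = 300000 ms wide, not the 72000 used in the query below.

```java
// Hypothetical helper, not part of Spark: integer "floor" bucketing for
// epoch timestamps given in milliseconds.
final class TimeBuckets {
    // Width of a 5-minute bucket in milliseconds (5 * 60 * 1000 = 300000).
    static final long FIVE_MINUTES_MS = 5L * 60L * 1000L;

    private TimeBuckets() {}

    // Returns the index of the bucket that contains timestampMs.
    // Math.floorDiv rounds toward negative infinity, so timestamps before
    // the epoch still land in the correct bucket.
    static long bucketIndex(long timestampMs, long widthMs) {
        if (widthMs <= 0) {
            throw new IllegalArgumentException("widthMs must be positive");
        }
        return Math.floorDiv(timestampMs, widthMs);
    }

    // Returns the timestamp (in ms) at which the containing bucket starts.
    static long bucketStart(long timestampMs, long widthMs) {
        return bucketIndex(timestampMs, widthMs) * widthMs;
    }
}
```

To call this from the SQL string, the logic would still have to be registered under a name, for example through sqlContext.udf().register(...) with a UDF1 and DataTypes.LongType. I have not run that against your Spark version, so treat the exact registration call as an assumption. Once registered, the same expression should be usable in both the select and the group by clause.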
On Fri, Mar 4, 2016 at 7:34 PM, ashokkumar rajendran <ashokkumar.rajend...@gmail.com> wrote:

> Hi,
>
> I load json file that has timestamp (as long in milliseconds) and several
> other attributes. I would like to group them by 5 minutes and store them as
> separate file.
>
> I am facing couple of problems here..
> 1. Using Floor function at select clause (to bucket by 5mins) gives me
> error saying "java.util.NoSuchElementException: key not found: floor". How
> do I use floor function in select clause? I see that floor method is
> available in org.apache.spark.sql.functions clause but not sure why its not
> working here.
> 2. Can I use the same in Group by clause?
> 3. How do I store them as separate file after grouping them?
>
> String logPath = "my-json.gz";
> DataFrame logdf = sqlContext.read().json(logPath);
> logdf.registerTempTable("logs");
> DataFrame bucketLogs = sqlContext.sql("Select `user.timestamp` as
> rawTimeStamp, `user.requestId` as requestId,
> *floor(`user.timestamp`/72000*) as timeBucket FROM logs");
> bucketLogs.toJSON().saveAsTextFile("target_file");
>
> Regards
> Ashok
>

--
Best Regards,
Ayan Guha