Most likely you are missing the import of org.apache.spark.sql.functions.

In any case, you can write your own floor function and use it as a UDF.
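
To make the "own floor function" idea concrete, here is a minimal sketch in plain Java of the bucketing arithmetic such a UDF could wrap. The class name TimeBucket and the methods bucketIndex and bucketStart are hypothetical names chosen for illustration, not part of Spark, and registering the function with Spark is left out because it depends on your Spark version. Note that five minutes is 5 * 60 * 1000 = 300000 milliseconds, so the divisor 72000 in the quoted query may not give the bucket width you intend.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: plain Java, no Spark dependency.
final class TimeBucket {

    // Five minutes expressed in milliseconds (300000).
    static final long FIVE_MINUTES_MS = TimeUnit.MINUTES.toMillis(5);

    // Index of the bucket that contains timestampMs.
    // Math.floorDiv rounds toward negative infinity, so it behaves
    // like floor(timestamp / bucket) even for negative timestamps.
    static long bucketIndex(long timestampMs, long bucketMs) {
        if (bucketMs <= 0) {
            throw new IllegalArgumentException("bucketMs must be positive");
        }
        return Math.floorDiv(timestampMs, bucketMs);
    }

    // Start of the bucket, in milliseconds, for use as a grouping key.
    static long bucketStart(long timestampMs, long bucketMs) {
        return bucketIndex(timestampMs, bucketMs) * bucketMs;
    }

    public static void main(String[] args) {
        long sampleTs = 1457100840000L; // made-up sample timestamp
        System.out.println(bucketIndex(sampleTs, FIVE_MINUTES_MS));
        System.out.println(bucketStart(sampleTs, FIVE_MINUTES_MS));
    }
}
```

Using the bucket start rather than the bare index as the key keeps the value readable as a timestamp, which can help when naming the output files per bucket.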

On Fri, Mar 4, 2016 at 7:34 PM, ashokkumar rajendran <
ashokkumar.rajend...@gmail.com> wrote:

> Hi,
>
> I load a JSON file that has a timestamp (a long, in milliseconds) and several
> other attributes. I would like to group the records into 5-minute buckets and
> store each bucket as a separate file.
>
> I am facing a couple of problems here:
> 1. Using the floor function in the select clause (to bucket by 5 minutes)
> gives me the error "java.util.NoSuchElementException: key not found: floor".
> How do I use the floor function in the select clause? I see that a floor
> method is available in org.apache.spark.sql.functions, but I am not sure why
> it is not working here.
> 2. Can I use the same function in the GROUP BY clause?
> 3. How do I store the groups as separate files after grouping them?
>
>         String logPath = "my-json.gz";
>         DataFrame logdf = sqlContext.read().json(logPath);
>         logdf.registerTempTable("logs");
>         DataFrame bucketLogs = sqlContext.sql(
>                 "Select `user.timestamp` as rawTimeStamp, "
>                 + "`user.requestId` as requestId, "
>                 + "floor(`user.timestamp`/72000) as timeBucket FROM logs");
>         bucketLogs.toJSON().saveAsTextFile("target_file");
>
> Regards
> Ashok
>



-- 
Best Regards,
Ayan Guha
