Hi,

I load a JSON file that has a timestamp (a long, in milliseconds) and several
other attributes. I would like to group the records into 5-minute buckets and
store each bucket as a separate file.

I am facing a couple of problems here:
1. Using the floor function in the SELECT clause (to bucket by 5 minutes) gives
me an error saying "java.util.NoSuchElementException: key not found: floor".
How do I use the floor function in the SELECT clause? I see that a floor method
is available in the org.apache.spark.sql.functions class, but I am not sure why
it is not working here.
2. Can I use the same function in the GROUP BY clause?
3. How do I store each bucket as a separate file after grouping? (I have
sketched below what I think this should look like.)

        String logPath = "my-json.gz";
        DataFrame logdf = sqlContext.read().json(logPath);
        logdf.registerTempTable("logs");
        DataFrame bucketLogs = sqlContext.sql(
                "SELECT `user.timestamp` AS rawTimeStamp, `user.requestId` AS requestId, "
                + "floor(`user.timestamp` / 72000) AS timeBucket FROM logs");
        bucketLogs.toJSON().saveAsTextFile("target_file");
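
For reference, here is roughly what I think I am after, sketched with the
DataFrame API instead of plain SQL. I am assuming Spark 1.4+ so that
org.apache.spark.sql.functions.floor and write().partitionBy() are available;
the output path and the count() aggregation are just placeholders, and I kept
the backtick-quoted column name from my query above:

        import org.apache.spark.sql.DataFrame;
        import static org.apache.spark.sql.functions.col;
        import static org.apache.spark.sql.functions.floor;

        // 5 minutes expressed in milliseconds.
        final long FIVE_MINUTES_MS = 5 * 60 * 1000;

        DataFrame logdf = sqlContext.read().json("my-json.gz");

        // (1) Derive the bucket with the floor() Column function rather than SQL text.
        DataFrame bucketLogs = logdf.withColumn(
                "timeBucket",
                floor(col("`user.timestamp`").divide(FIVE_MINUTES_MS)));

        // (2) The derived column can also be used for aggregation, e.g. a count per bucket.
        DataFrame perBucket = bucketLogs.groupBy("timeBucket").count();

        // (3) partitionBy writes one sub-directory of JSON files per timeBucket value.
        bucketLogs.write().partitionBy("timeBucket").json("target_dir");

My understanding is that partitionBy would give one sub-directory of files per
timeBucket value, which is what I mean by separate files.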

Regards
Ashok
