Re: Content based window operation on Time-series data

2015-12-17 Thread Sandy Ryza
Hi Arun,
A Java API was recently added to the library. It will be available in the next release.
-Sandy
On Thu, Dec 10, 2015 at 12:16 AM, Arun Verma wrote:
> Thank you for your reply. It is a Scala and Python library. Does a similar
> library exist for Java?

Re: Content based window operation on Time-series data

2015-12-17 Thread Davies Liu
Could you try this?
df.groupBy(((col("timeStamp") - start) / bucketLengthSec).cast(IntegerType)).agg(max("timestamp"), max("value")).collect()
On Wed, Dec 9, 2015 at 8:54 AM, Arun Verma wrote:
> Hi all,
>
> We have RDD(main) of sorted time-series data. We want to split it
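The bucketing idea in the reply above (group each row by the integer part of (timestamp - start) / bucket length, then aggregate per group) can be illustrated without Spark. The following is only a plain-Python sketch, not the poster's code: the function names `bucket_index` and `aggregate_by_window` and the sample rows are hypothetical, and it assumes timestamps are in seconds and never earlier than `start`.

```python
from collections import defaultdict


def bucket_index(ts, start, bucket_length_sec):
    """Zero-based index of the fixed-width window that contains ts.

    Assumes ts >= start, so floor division matches an integer cast.
    """
    return int((ts - start) // bucket_length_sec)


def aggregate_by_window(rows, start, bucket_length_sec):
    """Group (timestamp, value) rows by window; return max timestamp and max value.

    Hypothetical helper mirroring groupBy(bucket).agg(max(ts), max(value)).
    """
    groups = defaultdict(list)
    for ts, value in rows:
        groups[bucket_index(ts, start, bucket_length_sec)].append((ts, value))
    return {
        bucket: (max(t for t, _ in items), max(v for _, v in items))
        for bucket, items in sorted(groups.items())
    }


# Made-up sample data: (timestamp in seconds, value)
rows = [(100, 1.0), (104, 3.5), (110, 2.0), (119, 0.5), (125, 7.0)]
result = aggregate_by_window(rows, start=100, bucket_length_sec=10)
for bucket, (max_ts, max_value) in result.items():
    print(bucket, max_ts, max_value)
```

Because every row carries its own window index, no per-window RDD has to be materialized; one grouped aggregation covers all windows at once.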

Content based window operation on Time-series data

2015-12-09 Thread Arun Verma
Hi all, *We have an RDD (main) of sorted time-series data. We want to split it into separate RDDs according to a window size and then perform an aggregation such as max or min over each RDD in parallel.* If the window size is w, then the ith RDD has data from (startTime + (i-1)*w) to (startTime +

Re: Content based window operation on Time-series data

2015-12-09 Thread Sean Owen
CC Sandy, as his https://github.com/cloudera/spark-timeseries might be of use here.
On Wed, Dec 9, 2015 at 4:54 PM, Arun Verma wrote:
> Hi all,
>
> We have RDD(main) of sorted time-series data. We want to split it into
> different RDDs according to window size and then

Re: Content based window operation on Time-series data

2015-12-09 Thread Arun Verma
Thank you for your reply. It is a Scala and Python library. Does a similar library exist for Java?
On Wed, Dec 9, 2015 at 10:26 PM, Sean Owen wrote:
> CC Sandy as his https://github.com/cloudera/spark-timeseries might be
> of use here.
>
> On Wed, Dec 9, 2015 at 4:54 PM, Arun