Re: Time window on Processing Time

2017-08-30 Thread madhu phatak
e
>
> import org.apache.spark.sql.functions._
>
> ds.withColumn("processingTime", current_timestamp())
>   .groupBy(window(col("processingTime"), "1 minute"))
>   .count()
>
>
> On Mon, Aug 28, 2017 at 5:46 AM, madhu phatak <phatak@gmail.com>
> wrote:

Time window on Processing Time

2017-08-28 Thread madhu phatak
Hi, While experimenting with Structured Streaming, I observed that the window function always requires a time column in the input data, which means it operates on event time. Is it possible to use the old Spark Streaming style window function based on processing time? I don't see any documentation on this. -- Regards,

Review of ML PR

2017-08-14 Thread madhu phatak
Hi, I submitted a PR around 2 months ago to improve the performance of decision trees by allowing a flexible, user-provided storage class for intermediate data. I have posted a few questions about handling backward compatibility, but there have been no answers for a long time. Can anybody help me to move this

Re: RandomForest caching

2017-05-12 Thread madhu phatak
Hi, I opened a JIRA: https://issues.apache.org/jira/browse/SPARK-20723 Can someone have a look? On Fri, Apr 28, 2017 at 1:34 PM, madhu phatak <phatak@gmail.com> wrote: > Hi, > > I am testing RandomForestClassification with 50gb of data which is cached > in memory.

RandomForest caching

2017-04-28 Thread madhu phatak
Hi, I am testing RandomForestClassification with 50 GB of data, which is cached in memory. I have 64 GB of RAM, of which 28 GB is used for caching the original dataset. When I run random forest, it caches around 300 GB of intermediate data, which evicts the original dataset from the cache. This caching is triggered

Re: Contributing Documentation Changes

2015-04-24 Thread madhu phatak
wrote: I think that your own tutorials and such should live on your blog. The goal isn't to pull in a bunch of external docs to the site. On Fri, Apr 24, 2015 at 12:57 AM, madhu phatak phatak@gmail.com wrote: Hi, As I was reading the Contributing to Spark wiki, it was mentioned that we

Contributing Documentation Changes

2015-04-23 Thread madhu phatak
Hi, As I was reading the Contributing to Spark wiki, I saw it mentioned that we can contribute external links to Spark tutorials. I have written many of them on my blog (http://blog.madhukaraphatak.com/categories/spark/). It would be great if someone could add them to the Spark website. Regards, Madhukara

Help needed to publish SizeEstimator as separate library

2014-11-19 Thread madhu phatak
Hi, As I was going through the Spark source code, SizeEstimator (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala) caught my eye. It's a very useful tool for estimating object sizes on the JVM, which helps in use cases like memory-bounded caches. It