Re: Parquet-like partitioning support in Spark SQL's in-memory columnar cache

2016-11-28 Thread Nitin Goyal
s what you were referring to originally? Thanks -Nitin

On Fri, Nov 25, 2016 at 11:29 AM, Reynold Xin <r...@databricks.com> wrote:
> It's already there, isn't it? The in-memory columnar cache format.
>
> On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal <nitin2go...@gmail.com>

Parquet-like partitioning support in Spark SQL's in-memory columnar cache

2016-11-24 Thread Nitin Goyal
Hi,

Do we have any plan to support Parquet-like partitioning in the Spark SQL in-memory cache? Something like one RDD[CachedBatch] per in-memory cache partition.

-Nitin
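
For context, a minimal sketch of the Parquet-style partition pruning the question refers to, next to today's cache path; the path, column names and the Spark 2.x session are assumptions for illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("partitioning-sketch").getOrCreate()
    import spark.implicits._

    val events = Seq(("2016-11-24", 1), ("2016-11-25", 2)).toDF("date", "value")

    // Parquet lays data out in one directory per key, so a filter on the
    // partition column prunes whole directories at read time.
    events.write.mode("overwrite").partitionBy("date").parquet("/tmp/events")
    spark.read.parquet("/tmp/events").filter($"date" === "2016-11-25").show()

    // The in-memory cache has no such key-based layout: caching produces
    // batches per partition, with no partitioning column to prune against.
    events.cache()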

Continuous warning while consuming using new kafka-spark010 API

2016-09-19 Thread Nitin Goyal
new API? Is this the expected behaviour, or am I missing something here?

--
Regards
Nitin Goyal
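
For reference, a minimal sketch of the spark-streaming-kafka-0-10 direct stream the thread is about; the broker address, topic and group id are placeholder assumptions:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val ssc = new StreamingContext(new SparkConf().setAppName("kafka010-sketch"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",             // assumed broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group"                        // assumed group id
    )

    // The "new" API: executors consume directly from Kafka, one RDD partition
    // per Kafka topic partition.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    stream.map(_.value).print()
    ssc.start()
    ssc.awaitTermination()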

Re: Ever increasing physical memory for a Spark Application in YARN

2016-05-03 Thread Nitin Goyal
default
> spark.memory.fraction should be 0.66, so that it works out with the default
> JVM flags.
>
> On Mon, Jul 27, 2015 at 6:08 PM, Nitin Goyal <nitin2go...@gmail.com>
> wrote:
>
>> I am running a spark application in YARN having 2 executors with Xms/Xmx
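
A one-line sketch of that suggestion (app name assumed): with the JVM's default NewRatio of 2, the old generation is about two thirds of the heap, so capping Spark's managed memory at 0.66 keeps long-lived cached blocks within the old generation under the default JVM flags:

    import org.apache.spark.SparkConf

    // Assumed app name; 0.66 matches the ~2/3 old-generation share that the
    // default JVM flags give.
    val conf = new SparkConf()
      .setAppName("memory-fraction-sketch")
      .set("spark.memory.fraction", "0.66")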

Re: Secondary Indexing of RDDs?

2015-12-14 Thread Nitin Goyal
Spark SQL's in-memory cache stores statistics per column, which in turn are used to skip batches (default size 10000) within a partition: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala#L25

Hope this helps.

Thanks
-Nitin
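
A short sketch of that batch skipping (the session variable spark and the data are assumptions): per-column min/max stats are kept for every cached batch, and filters are checked against them so non-matching batches are never scanned:

    import spark.implicits._

    // Each group of up to batchSize rows becomes one cached batch with stats.
    spark.conf.set("spark.sql.inMemoryColumnarStorage.batchSize", "10000")

    val df = spark.range(0, 1000000).toDF("id")
    df.cache()
    df.count()                          // materialize the cache
    df.filter($"id" > 999000).count()   // batches whose max(id) <= 999000 are skipped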

Re: Operations with cached RDD

2015-10-11 Thread Nitin Goyal
Index is executed again. It does not seem to be
> reasonable, because the rdd is cached, and zipWithIndex is already executed
> previously.
>
> Could you explain why, if I perform an operation followed by an action on a
> cached RDD, the last operation in the lineage of the cached RDD is
> shown to be executed in the Spark UI?
>
> Best regards,
> Alexander

--
Regards
Nitin Goyal
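
A minimal sketch of the scenario in the question (a SparkContext sc is assumed): the UI names a stage after the call site of the last transformation in its lineage, so zipWithIndex keeps showing up even though later actions are served from the cache:

    val rdd = sc.parallelize(1 to 1000).zipWithIndex().cache()
    rdd.count()          // first action: computes the lineage and populates the cache
    rdd.map(_._2).sum()  // reads cached blocks; the UI still labels the work by
                         // the last transformation's call site, here zipWithIndex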

Re: [ compress in-memory column storage used in Spark SQL cache table ]

2015-09-02 Thread Nitin Goyal
I think Spark SQL's in-memory columnar cache already does compression. Check out the classes under the following path: https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/compression The compression ratio is not as good as Parquet's, though.

Thanks
-Nitin
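
A minimal sketch (assumed table name, Spark 1.x SQLContext): the flag below governs that per-column compression and already defaults to true; the encoders under the linked path pick run-length, dictionary or delta encoding per column based on collected stats:

    // Defaults to true; shown explicitly for clarity.
    sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")
    sqlContext.cacheTable("events")   // assumed table name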

Ever increasing physical memory for a Spark Application in YARN

2015-07-27 Thread Nitin Goyal
I am running a Spark application in YARN having 2 executors with Xms/Xmx as 32 GB and spark.yarn.executor.memoryOverhead as 6 GB. I am seeing that the app's physical memory is ever-increasing and finally gets killed by the node manager:

2015-07-25 15:07:05,354 WARN
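
For reference, a sketch of that configuration (app name assumed); in Spark 1.x, spark.yarn.executor.memoryOverhead is given in megabytes, so 6 GB is 6144:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("yarn-memory-sketch")                   // assumed app name
      .set("spark.executor.memory", "32g")                // executor JVM heap (Xms/Xmx)
      .set("spark.executor.instances", "2")
      .set("spark.yarn.executor.memoryOverhead", "6144")  // off-heap allowance, in MB
    val sc = new SparkContext(conf)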

Re: ClosureCleaner slowing down Spark SQL queries

2015-05-27 Thread Nitin Goyal
Hi Ted,

Thanks a lot for replying. First of all, moving to 1.4.0 RC2 is not easy for us, as the migration cost is big since a lot has changed in Spark SQL since 1.2. Regarding SPARK-7233, I had already looked at it a few hours back, and it solves the problem for concurrent queries, but my problem is just