s what you were referring to originally?
Thanks
-Nitin
On Fri, Nov 25, 2016 at 11:29 AM, Reynold Xin <r...@databricks.com> wrote:
> It's already there, isn't it? The in-memory columnar cache format.
>
>
> On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal <nitin2go...@gmail.com> wrote:
Hi,
Do we have any plan to support parquet-like partitioning in Spark SQL's
in-memory cache? Something like one RDD[CachedBatch] per in-memory cache
partition.
-Nitin
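To make the idea concrete, here is a toy sketch (not Spark's actual API, and the helper names are invented for illustration) of what "one set of cached batches per partition-key value" buys you: a filter on the partition key can skip whole partitions without scanning them.

```python
from collections import defaultdict

def build_partitioned_cache(rows, key_fn, batch_size=2):
    """Group rows by a partition key, then chunk each group into batches:
    a toy analogue of keeping one RDD[CachedBatch] per partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    return {
        key: [group[i:i + batch_size] for i in range(0, len(group), batch_size)]
        for key, group in groups.items()
    }

def scan(cache, key):
    """Partition pruning: only batches under the matching key are read."""
    for batch in cache.get(key, []):
        yield from batch

rows = [("2016-11-24", 1), ("2016-11-25", 2), ("2016-11-24", 3)]
cache = build_partitioned_cache(rows, key_fn=lambda r: r[0])
print(list(scan(cache, "2016-11-24")))  # -> [('2016-11-24', 1), ('2016-11-24', 3)]
```

With parquet-style partitioning on disk, the same pruning happens at the directory level; the question above asks for the equivalent inside the in-memory cache.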
ew API? Is this the expected
behaviour or am I missing something here?
--
Regards
Nitin Goyal
> default
> spark.memory.fraction should be 0.66, so that it works out with the default
> JVM flags.
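The 0.66 figure lines up with default JVM generation sizing; a back-of-the-envelope check, assuming HotSpot's default -XX:NewRatio=2:

```python
# With default HotSpot flags, -XX:NewRatio=2, so the old generation
# occupies NewRatio / (NewRatio + 1) of the heap.
new_ratio = 2
old_gen_fraction = new_ratio / (new_ratio + 1)
# Long-lived cached blocks get tenured into the old generation, so keeping
# the Spark-managed fraction at or below ~0.66 avoids overflowing it.
print(round(old_gen_fraction, 3))  # -> 0.667
```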
>
> On Mon, Jul 27, 2015 at 6:08 PM, Nitin Goyal <nitin2go...@gmail.com>
> wrote:
>
>> I am running a Spark application on YARN having 2 executors with Xms/Xmx
Spark SQL's in-memory cache stores statistics per column, which in turn are
used to skip batches (spark.sql.inMemoryColumnarStorage.batchSize, default
10000 rows) within a partition:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala#L25
Hope this helps.
Thanks
-Nitin
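A minimal sketch of that batch-skipping idea (the function names here are made up; Spark's real implementation lives in ColumnStats and the in-memory scan operator): each batch records min/max per column, and a predicate that cannot match anything in a batch's range lets the whole batch be skipped.

```python
def build_batches(values, batch_size):
    """Split a column into batches and record per-batch min/max stats,
    a toy analogue of Spark's per-column ColumnStats."""
    batches = []
    for i in range(0, len(values), batch_size):
        chunk = values[i:i + batch_size]
        batches.append({"data": chunk, "min": min(chunk), "max": max(chunk)})
    return batches

def filter_gt(batches, threshold):
    """Evaluate `col > threshold`, skipping batches whose max rules them out."""
    out = []
    for b in batches:
        if b["max"] <= threshold:  # no row in this batch can match: skip it
            continue
        out.extend(v for v in b["data"] if v > threshold)
    return out

batches = build_batches([1, 2, 3, 10, 11, 12], batch_size=3)
print(filter_gt(batches, 5))  # -> [10, 11, 12]
```

The first batch (max 3) is never scanned for the predicate `col > 5`; only its statistics are consulted.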
On
> Index is executed again. It does not seem to be
> reasonable, because the rdd is cached, and zipWithIndex is already executed
> previously.
>
>
>
> Could you explain why if I perform an operation followed by an action on a
> cached RDD, then the last operation in the lineage of the cached RDD is
> shown to be executed in the Spark UI?
>
> Best regards, Alexander
>
--
Regards
Nitin Goyal
I think Spark SQL's in-memory columnar cache already does compression. Check
out the classes under the following path:
https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/compression
The compression ratio is not as good as Parquet's, though.
Thanks
-Nitin
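One of the lightweight schemes in that package is run-length encoding (alongside dictionary and delta encodings). A self-contained sketch of the idea, not Spark's actual implementation:

```python
def rle_encode(values):
    """Run-length encode a column: consecutive equal values collapse
    into (value, count) runs, which works well for sorted/low-cardinality data."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs):
    """Expand (value, count) runs back into the original column."""
    return [v for v, n in runs for _ in range(n)]

col = ["US", "US", "US", "UK", "UK", "US"]
encoded = rle_encode(col)
print(encoded)  # -> [['US', 3], ['UK', 2], ['US', 1]]
assert rle_decode(encoded) == col
```

Schemes like this trade some CPU on the scan path for memory, which is why the in-memory cache compresses less aggressively than Parquet's on-disk encodings.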
--
I am running a Spark application on YARN having 2 executors with Xms/Xmx of
32 GB and spark.yarn.executor.memoryOverhead of 6 GB.
I am seeing that the app's physical memory keeps increasing until it finally
gets killed by the node manager:
2015-07-25 15:07:05,354 WARN
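For reference, YARN's NodeManager kills a container once its physical memory exceeds the container limit, which for an executor is heap plus overhead (note the property is set in MiB in Spark 1.x, e.g. 6144; gigabytes are used here only to keep the arithmetic readable):

```python
executor_heap_gb = 32       # executor -Xms/-Xmx from the report above
overhead_gb = 6             # spark.yarn.executor.memoryOverhead
# Off-heap usage (thread stacks, direct buffers, JVM metadata, native libs)
# must fit inside the overhead; if it doesn't, the container is killed.
container_limit_gb = executor_heap_gb + overhead_gb
print(container_limit_gb)   # -> 38
```

So the WARN above means the process's physical footprint crossed that 38 GB limit, i.e. off-heap memory grew past the 6 GB overhead budget.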
Hi Ted,
Thanks a lot for replying. First of all, moving to 1.4.0 RC2 is not easy for
us, as the migration cost is big since a lot has changed in Spark SQL since 1.2.
Regarding SPARK-7233, I had already looked at it a few hours back; it
solves the problem for concurrent queries, but my problem is just