Thanks for sharing.
On Sat, May 18, 2019, 00:52 Matt Cheah wrote:
> Hi everyone,
>
> I would like to share the experiences my organization has had with
> deploying Kubernetes and migrating our Spark applications over from YARN
> to Kubernetes. We are publishing a series of blog posts that describe what
> we have learned and what we have built.
In order to create an application that executes code on Spark, we have a
long-lived process. It periodically runs jobs programmatically on a Spark
cluster, meaning it does not use spark-submit. The jobs it executes have
varying memory requirements, so we want to have the Spark Driver run in
the c
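One way to do this kind of programmatic submission is Spark's SparkLauncher
API. The following is a minimal sketch, assuming the jobs are packaged as an
application JAR; the JAR path, main class, and memory value are hypothetical
placeholders:

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    object JobRunner {
      // Launch one Spark application whose driver memory is chosen per job.
      // The JAR path and main class are hypothetical placeholders.
      def runJob(driverMemory: String): SparkAppHandle = {
        new SparkLauncher()
          .setAppResource("/opt/jobs/etl-job.jar")
          .setMainClass("com.example.EtlJob")
          .setMaster("yarn")
          .setDeployMode("cluster")                        // driver runs inside the cluster
          .setConf(SparkLauncher.DRIVER_MEMORY, driverMemory)
          .startApplication()                              // non-blocking; returns a handle
      }

      def main(args: Array[String]): Unit = {
        val handle = runJob("4g")
        while (!handle.getState.isFinal) Thread.sleep(1000) // poll until the job finishes
        println(s"Final state: ${handle.getState}")
      }
    }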
Hi everyone,
I would like to share the experiences my organization has had with deploying
Kubernetes and migrating our Spark applications over from YARN to Kubernetes.
We are publishing a series of blog posts that describe what we have learned and
what we have built.
Our introduction post
A cached DataFrame isn't supposed to change, by definition.
You can re-read the data each time, or consider setting up a streaming
source on the table, which provides a result that updates as new data
comes in.
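A minimal sketch of the streaming approach (the Delta path and grouping come
from the quoted question; the memory sink and query name are hypothetical
choices for illustration):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder.appName("live-counts").getOrCreate()

    // Read the Delta table as a stream instead of caching a static snapshot.
    val liveCounts = spark.readStream
      .format("delta")
      .load("/data")
      .groupBy(col("event_hour"))
      .count()

    // Keep the aggregate up to date in memory and query it at any time.
    val query = liveCounts.writeStream
      .outputMode("complete")          // streaming aggregations need complete/update mode
      .format("memory")                // in-memory table, handy for interactive inspection
      .queryName("event_hour_counts")
      .start()

    spark.sql("SELECT * FROM event_hour_counts").show()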
On Fri, May 17, 2019 at 1:44 PM Tomas Bartalos wrote:
>
> Hello,
>
> I have a cached dataframe:
>
Hello,
I have a cached dataframe:
spark.read.format("delta").load("/data").groupBy(col("event_hour")).count.cache
I would like to access the "live" data for this dataframe without deleting
the cache (using unpersist()). Whatever I do, I always get the cached data
on subsequent queries. Even addi
Hi All,
I am getting an Out Of Memory error (GC overhead limit exceeded) while
reading a table from Hive in Spark, like this:
spark.sql("SELECT * FROM some.table WHERE date='2019-05-14' LIMIT 10").show()
When I run the above command in spark-shell, it starts processing *1780
tasks* and goes OOM at a specific
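One thing to try, sketched below under the assumption that the table is wide
and partitioned by date (the selected columns are hypothetical placeholders):
select only the columns you need, so each task does not materialize full rows
the way SELECT * does:

    import org.apache.spark.sql.functions.col

    // Same read, but with explicit column pruning and a partition filter.
    // "some.table" and the date value come from the question above; the
    // selected columns are hypothetical.
    val df = spark.table("some.table")
      .where(col("date") === "2019-05-14")  // prunes partitions if the table is partitioned by date
      .select("id", "payload")              // avoid SELECT * on wide tables
      .limit(10)

    df.show()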
Hello,
I've a question regarding a use case.
I have an ETL job using Spark, and it works great.
I use CephFS mounted on all Spark nodes to store data.
However, one problem I have is that bzip2 compression + transfer from the
source to Spark storage takes really long.
I would like to be able to process the file as
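If the goal is to start processing while files are still arriving, one sketch
is Structured Streaming's file source, which picks up each completed file as
it lands (assuming newline-delimited text; all paths below are hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("incremental-etl").getOrCreate()

    // Treat the landing directory as a stream: each file is processed as
    // soon as it appears, instead of waiting for the whole batch transfer.
    val lines = spark.readStream
      .format("text")
      .load("/mnt/cephfs/landing")       // hypothetical CephFS mount point

    val query = lines.writeStream
      .format("parquet")
      .option("path", "/mnt/cephfs/output")            // hypothetical output path
      .option("checkpointLocation", "/mnt/cephfs/chk") // required by file sinks
      .start()

    query.awaitTermination()

Note that Spark can also read bzip2-compressed text directly (and bzip2 is a
splittable codec), so decompressing before ingestion may be unnecessary.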
Yes, that's exactly what happens, but I would think that a data node being
unavailable, or data being unavailable on one of the nodes, should not cause
an indefinite wait. Are there any properties we can set to avoid an
indefinite/non-deterministic outcome for a Spark application?
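There are a few properties that bound how long Spark waits and retries; a
sketch with illustrative values (not a tuned recommendation):

    import org.apache.spark.sql.SparkSession

    // Illustrative values; tune them for your cluster. These cap network
    // waits and task retries instead of letting a stage hang indefinitely.
    val spark = SparkSession.builder
      .appName("bounded-waits")
      .config("spark.network.timeout", "120s")  // timeout for network interactions
      .config("spark.task.maxFailures", "4")    // fail the stage after N failures of a task
      .config("spark.speculation", "true")      // re-launch suspiciously slow tasks elsewhere
      .getOrCreate()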
On Thu, May 16,
Hi,
https://stackoverflow.com/questions/56181135/design-can-kafka-producer-written-as-spark-job
Thank you,
Shyam
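On the question behind that link (whether a Kafka producer can be written as
a Spark job): Spark itself can act as the producer by writing a DataFrame
with the built-in Kafka sink (the spark-sql-kafka-0-10 package). A minimal
sketch; the broker address, topic, and sample records are hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("kafka-producer-job").getOrCreate()
    import spark.implicits._

    // Sample records; a real job would read these from a source table.
    val records = Seq(("k1", "v1"), ("k2", "v2")).toDF("key", "value")

    // Batch-write to Kafka: the sink expects string or binary key/value
    // columns. Broker and topic below are hypothetical.
    records
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("topic", "events")
      .save()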