Thanks for sharing.
On Sat, May 18, 2019, 00:52 Matt Cheah wrote:
> Hi everyone,
>
> I would like to share the experiences my organization has had with
> deploying Kubernetes and migrating our Spark applications over from YARN
> to Kubernetes. We are publishing a series of blog posts that describe what
> we have learned and what we have built.
In order to create an application that executes code on Spark, we have a
long-lived process. It periodically runs jobs programmatically on a Spark
cluster, meaning it does not use spark-submit. The jobs it executes have
varying memory requirements, so we want to have the Spark Driver run in
the c
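One way to do this kind of programmatic submission is Spark's SparkLauncher
API. The following is a minimal sketch, assuming the jobs are packaged as an
application JAR; the JAR path, main class, and memory value are hypothetical
placeholders:

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    object JobRunner {
      // Launch one Spark application whose driver memory is chosen per job.
      // The JAR path and main class are hypothetical placeholders.
      def runJob(driverMemory: String): SparkAppHandle = {
        new SparkLauncher()
          .setAppResource("/opt/jobs/etl-job.jar")
          .setMainClass("com.example.EtlJob")
          .setMaster("yarn")
          .setDeployMode("cluster")                        // driver runs inside the cluster
          .setConf(SparkLauncher.DRIVER_MEMORY, driverMemory)
          .startApplication()                              // non-blocking; returns a handle
      }

      def main(args: Array[String]): Unit = {
        val handle = runJob("4g")
        while (!handle.getState.isFinal) Thread.sleep(1000) // poll until the job finishes
        println(s"Final state: ${handle.getState}")
      }
    }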
Hi everyone,
I would like to share the experiences my organization has had with deploying
Kubernetes and migrating our Spark applications over from YARN to Kubernetes.
We are publishing a series of blog posts that describe what we have learned and
what we have built.
Our introduction post
A cached DataFrame isn't supposed to change, by definition.
You can re-read the data each time, or consider setting up a streaming
source on the table, which provides a result that updates as new data
comes in.
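A minimal sketch of the streaming approach (the Delta path and grouping come
from the quoted question; the memory sink and query name are hypothetical
choices for illustration):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder.appName("live-counts").getOrCreate()

    // Read the Delta table as a stream instead of caching a static snapshot.
    val liveCounts = spark.readStream
      .format("delta")
      .load("/data")
      .groupBy(col("event_hour"))
      .count()

    // Keep the aggregate up to date in memory and query it at any time.
    val query = liveCounts.writeStream
      .outputMode("complete")          // streaming aggregations need complete/update mode
      .format("memory")                // in-memory table, handy for interactive inspection
      .queryName("event_hour_counts")
      .start()

    spark.sql("SELECT * FROM event_hour_counts").show()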
On Fri, May 17, 2019 at 1:44 PM Tomas Bartalos wrote:
>
> Hello,
>
> I have a cached dataframe:
>
Hello,
I have a cached dataframe:
spark.read.format("delta").load("/data").groupBy(col("event_hour")).count.cache
I would like to access the "live" data for this dataframe without deleting
the cache (using unpersist()). Whatever I do, I always get the cached data
on subsequent queries. Even addi
Hi All,
I am getting an Out Of Memory error (GC overhead limit exceeded) while
reading a table from Hive in Spark, like this:
spark.sql("SELECT * FROM some.table WHERE date='2019-05-14' LIMIT 10").show()
When I run the above command in spark-shell, it starts processing *1780
tasks* and goes OOM at a specific
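One thing to try, sketched below under the assumption that the table is wide
and partitioned by date (the selected columns are hypothetical placeholders):
select only the columns you need, so each task does not materialize full rows
the way SELECT * does:

    import org.apache.spark.sql.functions.col

    // Same read, but with explicit column pruning and a partition filter.
    // "some.table" and the date value come from the question above; the
    // selected columns are hypothetical.
    val df = spark.table("some.table")
      .where(col("date") === "2019-05-14")  // prunes partitions if the table is partitioned by date
      .select("id", "payload")              // avoid SELECT * on wide tables
      .limit(10)

    df.show()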
Hello,
I've a question regarding a use case.
I have an ETL job using Spark, and it works great.
I use CephFS mounted on all Spark nodes to store data.
However, one problem I have is that bzip2 compression + transfer from the
source to Spark storage takes really long.
I would like to be able to process the file as
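If the goal is to start processing while files are still arriving, one sketch
is Structured Streaming's file source, which picks up each completed file as
it lands (assuming newline-delimited text; all paths below are hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("incremental-etl").getOrCreate()

    // Treat the landing directory as a stream: each file is processed as
    // soon as it appears, instead of waiting for the whole batch transfer.
    val lines = spark.readStream
      .format("text")
      .load("/mnt/cephfs/landing")       // hypothetical CephFS mount point

    val query = lines.writeStream
      .format("parquet")
      .option("path", "/mnt/cephfs/output")            // hypothetical output path
      .option("checkpointLocation", "/mnt/cephfs/chk") // required by file sinks
      .start()

    query.awaitTermination()

Note that Spark can also read bzip2-compressed text directly (and bzip2 is a
splittable codec), so decompressing before ingestion may be unnecessary.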
Yes, that's exactly what happens, but I would think that a data node being
unavailable, or data being unavailable on one of the nodes, should not cause
an indefinite wait. Are there any properties we can set to avoid an
indefinite/non-deterministic outcome for a Spark application?
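There are a few properties that bound how long Spark waits and retries; a
sketch with illustrative values (not a tuned recommendation):

    import org.apache.spark.sql.SparkSession

    // Illustrative values; tune them for your cluster. These cap network
    // waits and task retries instead of letting a stage hang indefinitely.
    val spark = SparkSession.builder
      .appName("bounded-waits")
      .config("spark.network.timeout", "120s")  // timeout for network interactions
      .config("spark.task.maxFailures", "4")    // fail the stage after N failures of a task
      .config("spark.speculation", "true")      // re-launch suspiciously slow tasks elsewhere
      .getOrCreate()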
On Thu, May 16,
Hi,
https://stackoverflow.com/questions/56181135/design-can-kafka-producer-written-as-spark-job
Thank you,
Shyam
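On the question behind that link (whether a Kafka producer can be written as
a Spark job): Spark itself can act as the producer by writing a DataFrame
with the built-in Kafka sink (the spark-sql-kafka-0-10 package). A minimal
sketch; the broker address, topic, and sample records are hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("kafka-producer-job").getOrCreate()
    import spark.implicits._

    // Sample records; a real job would read these from a source table.
    val records = Seq(("k1", "v1"), ("k2", "v2")).toDF("key", "value")

    // Batch-write to Kafka: the sink expects string or binary key/value
    // columns. Broker and topic below are hypothetical.
    records
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("topic", "events")
      .save()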