Re: GPU Acceleration for spark-3.0.0

2020-06-17 Thread charles_cai
Bobby Thanks for your answer, it seems that I have misunderstood this paragraph in the website : *"GPU-accelerate your Apache Spark 3.0 data science pipelines—without code changes—and speed up data processing and model training while substantially lowering infrastructure costs."* . So if I am

java.lang.ClassNotFoundException: com.hortonworks.spark.cloud.commit.PathOutputCommitProtoco

2020-06-17 Thread murat migdisoglu
Hello all, we have a Hadoop cluster (using YARN) using S3 as the filesystem, with S3Guard enabled. We are using Hadoop 3.2.1 with Spark 2.4.5. When I try to save a dataframe in Parquet format, I get the following exception: java.lang.ClassNotFoundException:

Re: Is Spark Structured Streaming TOTALLY BROKEN (Spark Metadata Issues)

2020-06-17 Thread Rachana Srivastava
Structured Streaming vs. Spark Streaming (DStream)? Which is recommended for system stability? Exactly-once is NOT the first priority. The first priority is a STABLE system. I need to make a decision soon. I need help. Here is the question again. Should I go backward and use Spark Streaming DStream

Re: Is Spark Structured Streaming TOTALLY BROKEN (Spark Metadata Issues)

2020-06-17 Thread Rachana Srivastava
Frankly speaking, I do not care about EXACTLY ONCE... I am OK with AT LEAST ONCE as long as the system does not fail every 5 to 7 days with no recovery option. On Wednesday, June 17, 2020, 02:31:50 PM PDT, Rachana Srivastava wrote: Thanks so much TD.  Thanks for forwarding your datalake

Re: Is Spark Structured Streaming TOTALLY BROKEN (Spark Metadata Issues)

2020-06-17 Thread Rachana Srivastava
Thanks so much TD. Thanks for forwarding your datalake project, but at this time we have budget constraints and can only use open source projects. I just want the Structured Streaming application or Spark Streaming DStream application to run without any issue for a long time. I do not want

Re: Is Spark Structured Streaming TOTALLY BROKEN (Spark Metadata Issues)

2020-06-17 Thread Breno Arosa
Kafka-connect (https://docs.confluent.io/current/connect/index.html) may be an easier solution for this use case of just dumping kafka topics. On 17/06/2020 18:02, Jungtaek Lim wrote: Just in case if anyone prefers ASF projects then there are other alternative projects in ASF as well,

Re: Is Spark Structured Streaming TOTALLY BROKEN (Spark Metadata Issues)

2020-06-17 Thread Jungtaek Lim
Just in case if anyone prefers ASF projects then there are other alternative projects in ASF as well, alphabetically, Apache Hudi [1] and Apache Iceberg [2]. Both recently graduated as top-level projects. (DISCLAIMER: I'm not involved in either.) BTW it would be nice if we make the metadata

Re: Is Spark Structured Streaming TOTALLY BROKEN (Spark Metadata Issues)

2020-06-17 Thread Tathagata Das
Hello Rachana, Getting exactly-once semantics on files and making it scale to a very large number of files are very hard problems to solve. While Structured Streaming + built-in file sink solves the exactly-once guarantee that DStreams could not, it is definitely limited in other ways (scaling in

How to manage offsets in Spark Structured Streaming?

2020-06-17 Thread Rachana Srivastava
Background: I have written a simple Spark Structured Streaming app to move data from Kafka to S3. I found that, in order to support the exactly-once guarantee, Spark creates a _spark_metadata folder, which ends up growing too large. When the streaming app runs for a long time the metadata folder grows

Re: unsubscribe

2020-06-17 Thread Jeff Evans
That is not how you unsubscribe. See here: https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e On Wed, Jun 17, 2020 at 8:56 AM DIALLO Ibrahima (BPCE-IT - Consultime) wrote: > > > > > *Ibrahima DIALLO* > > *Consultant Big Data – Architecte - Analyste* > > *Consultime * - *Pour

unsubscribe

2020-06-17 Thread DIALLO Ibrahima (BPCE-IT - Consultime)
Ibrahima DIALLO Consultant Big Data - Architecte - Analyste Consultime - Pour BPCE-IT - Groupe BPCE D2I_FDT_DMA_BD2 BPCE Infogérance & Technologies 110 Avenue de France - 75013 PARIS -Tél. : +33185342104

Is Spark Structured Streaming TOTALLY BROKEN (Spark Metadata Issues)

2020-06-17 Thread Rachana Srivastava
I have written a simple Spark Structured Streaming app to move data from Kafka to S3. I found that, in order to support the exactly-once guarantee, Spark creates a _spark_metadata folder, which ends up growing too large as the streaming app is SUPPOSED TO run FOREVER. But when the streaming app runs for a

Re: unsubscribe

2020-06-17 Thread Jeff Evans
That is not how you unsubscribe. See here: https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e On Wed, Jun 17, 2020 at 5:39 AM Ferguson, Jon wrote: > > > This message is confidential and subject to terms at: > https://www.jpmorgan.com/emaildisclaimer including on confidential, >

unsubscribe

2020-06-17 Thread Ferguson, Jon
This message is confidential and subject to terms at: https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged or legal entity information, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the