[SparkStreaming] How To Stop The SparkStreamingContext Gracefully Without Extra Time Cost?

2020-10-11 Thread Lyx
Hi, I've been using StreamingContext.stop(true, true) to try to stop my application gracefully, meaning that all received data is guaranteed to be processed before the whole application terminates. It does work, but I also noticed that it leads to extra time just waiting for empty
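
For reference, the call described in the message above can be sketched in PySpark as follows. This is a minimal illustration, not the poster's code: the helper name `stop_gracefully` is hypothetical, and `ssc` is assumed to be an already running `pyspark.streaming.StreamingContext`. It only shows the graceful-stop call itself and does not address the extra waiting time the poster asks about.

```python
def stop_gracefully(ssc, stop_spark_context=True):
    """Stop a streaming context after already-received data has been processed.

    `ssc` is assumed to be a pyspark.streaming.StreamingContext; this helper
    is a hypothetical wrapper introduced only for illustration.
    """
    # Equivalent of the Scala call StreamingContext.stop(true, true).
    # Note: PySpark spells the keyword `stopGraceFully` (capital F),
    # whereas the Scala parameter is `stopGracefully`.
    ssc.stop(stopSparkContext=stop_spark_context, stopGraceFully=True)
```

Passing `stop_spark_context=False` would leave the underlying SparkContext running, which may be useful if the same driver goes on to run batch jobs.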

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
Thanks Ayan. I am not qualified to answer your first point. However, my experience with Spark with Scala or Spark with Python agrees with your assertion that use cases do not come into it. Most DEV/OPS work dealing with ETL is provided by service companies that have a workforce very familiar with

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread ayan guha
But when you have a fairly large volume of data, that is where Spark comes to the party. And I assume the requirement to use Spark is already established in the original question, and the discussion is about using Python vs Scala/Java. On Sun, 11 Oct 2020 at 10:51 pm, Sasha Kacanski wrote: > If org has

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
Hi, With regard to your statement below, "technology choices are agnostic to use cases according to you": if I may say, I do not think that was the message implied. What was said was that in addition to "best technology fit" there are other factors "equally important" that need to be

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Gourav Sengupta
So Mich and rest, technology choices are agnostic to use cases according to you? This is interesting, really interesting. Perhaps I stand corrected. Regards, Gourav On Sun, Oct 11, 2020 at 5:00 PM Mich Talebzadeh wrote: > if we take Spark and its massive parallel processing and in-memory >

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
If we take Spark and its massive parallel processing and in-memory cache away, then one can argue anything can do the "ETL" job: just write some Java/Scala/SQL/Perl/Python to read data from one DB and write it to another, often using JDBC connections. However, we all concur that may not be good