Re: A scene with unstable Spark performance
This is a case where resources are fixed within the same SparkContext, but the SQL queries have different priorities. Some queries are only allowed to run when there are spare resources; once a high-priority query comes in, the task sets of those queries are either killed or stalled. If we set the high-priority pool's minShare to a relatively high value, e.g. 50% or 60% of the total cores, does that make sense?

On Wed, May 18, 2022 at 1:28 PM, Sungwoo Park wrote:
> The problem you describe is the motivation for developing Spark on MR3.
> From the blog article (
> https://www.datamonad.com/post/2021-08-18-spark-mr3/):
>
> *The main motivation for developing Spark on MR3 is to allow multiple
> Spark applications to share compute resources such as Yarn containers or
> Kubernetes Pods.*
>
> The problem is due to an architectural limitation of Spark, and I guess
> fixing the problem would require a heavy rewrite of Spark core. When we
> developed Spark on MR3, we were not aware of any attempt being made
> elsewhere (in academia and industry) to address this limitation.
>
> A potential workaround might be to implement a custom Spark application
> that manages the submission of two groups of Spark jobs and controls their
> execution (similarly to Spark Thrift Server). Not sure if this approach
> would fix your problem, though.
>
> If you are interested, see the webpage of Spark on MR3:
> https://mr3docs.datamonad.com/docs/spark/
>
> We have released Spark 3.0.1 on MR3, and Spark 3.2.1 on MR3 is under
> development. For Spark 3.0.1 on MR3, no change is made to Spark and MR3 is
> used as an add-on. The main application of MR3 is Hive on MR3, but Spark on
> MR3 is equally ready for production.
>
> Thank you,
>
> --- Sungwoo
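On the minShare question: in FAIR mode, Spark orders pools when a task slot frees up. A pool running fewer tasks than its minShare (given in the fair scheduler allocation file, counted in cores) is "needy" and goes first; among needy pools, the lowest runningTasks/minShare ratio wins; otherwise the lowest runningTasks/weight ratio wins, with ties broken by pool name. The sketch below restates that ordering in plain Python, modeled on Spark core's FairSchedulingAlgorithm. The `Pool` class and the function names are hypothetical and are not Spark API. The sketch shows that minShare only decides who receives slots as they are released. As far as we know, the fair scheduler does not preempt tasks that are already running in another pool.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass
class Pool:
    # Hypothetical stand-in for a Spark scheduling pool (not Spark API).
    name: str
    running_tasks: int
    min_share: int = 0
    weight: int = 1


def fair_compare(p1: Pool, p2: Pool) -> int:
    """Negative if p1 should be offered the next free slot before p2."""
    needy1 = p1.running_tasks < p1.min_share
    needy2 = p2.running_tasks < p2.min_share
    if needy1 and not needy2:
        return -1
    if needy2 and not needy1:
        return 1
    if needy1 and needy2:
        # Both below minShare: the pool furthest below its minShare goes first.
        key1 = p1.running_tasks / max(p1.min_share, 1.0)
        key2 = p2.running_tasks / max(p2.min_share, 1.0)
    else:
        # Neither below minShare: split slots according to weight.
        key1 = p1.running_tasks / p1.weight
        key2 = p2.running_tasks / p2.weight
    if key1 != key2:
        return -1 if key1 < key2 else 1
    # Ties are broken by pool name.
    if p1.name == p2.name:
        return 0
    return -1 if p1.name < p2.name else 1


def next_pool(pools):
    """Return the pool that would be offered the next free slot."""
    return sorted(pools, key=cmp_to_key(fair_compare))[0]


# 100 cores in total; the low-priority pool already holds 90 of them.
high = Pool("high", running_tasks=10, min_share=60)
low = Pool("low", running_tasks=90, min_share=0)
print(next_pool([high, low]).name)  # high
```

In this toy state, "high" wins every slot that frees up until it runs 60 tasks, provided it has pending tasks. The 90 tasks already running in "low" still keep their cores until they finish. A large minShare therefore makes the high-priority pool first in line, but it cannot by itself free cores immediately.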
Re: Is RDD thread safe?
I need to cache the DataFrame to accelerate queries. In that case, two queries may run the DAG simultaneously before the cached data is actually materialized.

On Tue, Nov 19, 2019 at 9:46 PM, Sonal Goyal wrote:
> The RDD or the DataFrame is distributed and partitioned by Spark so as to
> leverage all your workers (CPUs) effectively. So all the DataFrame
> operations are actually happening simultaneously on a section of the data.
> Why do you want to use threading here?
>
> Thanks,
> Sonal
> Nube Technologies <http://www.nubetech.co>
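Caching is lazy: `cache()` only marks the data, and the cached blocks appear when the first action runs. The pure-Python toy below models the worst case described above. Two threads issue their first action at the same time, and each one runs the upstream computation. `LazyCache`, `run_in_threads` and `slow_source` are hypothetical names, not Spark API. Spark's block manager does some per-block locking of its own, so how much work is really duplicated depends on where the tasks run. The second half shows a common workaround: materialize the cache once with an action (on a real DataFrame, e.g. `count()`) before fanning out to threads.

```python
import threading


class LazyCache:
    """Toy model of a cached dataset: nothing is stored until the first action.

    Hypothetical class, not Spark API. The check-then-compute below is
    deliberately not atomic, to model concurrent first actions that each
    recompute the upstream DAG.
    """

    def __init__(self, compute):
        self._compute = compute
        self._data = None
        self._count_lock = threading.Lock()
        self.compute_calls = 0

    def action(self):
        if self._data is None:  # not materialized yet
            with self._count_lock:
                self.compute_calls += 1
            self._data = self._compute()
        return self._data


def run_in_threads(ds, n=2):
    """Run one action per thread, like several queries over one cached source."""
    threads = [threading.Thread(target=ds.action) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Worst case: both first actions are inside the computation at the same time.
barrier = threading.Barrier(2)


def slow_source():
    barrier.wait()  # forces the two first actions to overlap
    return list(range(1000))


racy = LazyCache(slow_source)
run_in_threads(racy)
print(racy.compute_calls)  # 2

# Workaround: materialize once on the driver thread, then fan out.
eager = LazyCache(lambda: list(range(1000)))
eager.action()
run_in_threads(eager)
print(eager.compute_calls)  # 1
```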
Is RDD thread safe?
Hi all,

I have a case where I need to cache a source RDD and then create different DataFrames from it in different threads to accelerate queries.

I know that SparkSession is thread-safe (https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure whether RDD is thread-safe or not.

Thanks
Chang
Re: The Future Of DStream
Things like Kafka and user-defined sources are not supported yet simply because Structured Streaming is in the alpha stage. Things like sort are not supported because of implementation difficulty, and I don't think DStream can support them either.

What I want to know is the difference in the API (or abstraction). For example, it is quite easy to use the same code for processing batch data because of the unbounded table abstraction (which comes from Google's Dataflow paper); that's why the internal engine is based on the logical plan, the Spark plan and RDDs. In contrast, DStream can't do the same thing easily.

Actually, Dataset supports map, flatMap and reduce, so in theory I can do any user-defined work. That's why I'm asking what kind of low-level control DStream offers that Structured Streaming does not.

Thanks
Chang

On Wed, Jul 27, 2016 at 6:03 PM, Ofir Manor wrote:
> For the 2.0 release, look for "Unsupported Operations" here:
> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
> Also, there are bigger gaps - like no Kafka support, no way to plug
> user-defined sources or sinks etc
>
> Ofir Manor
> Co-Founder & CTO | Equalum
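The unbounded-table abstraction mentioned above treats a stream as an input table that keeps growing. The result after each trigger is the same query evaluated over all input seen so far. The toy below illustrates this with a word count in plain Python. `word_count` and `streaming_word_count` are hypothetical names, not Spark API. A single query function serves both the batch path and the incremental path, and after every micro-batch the running result equals the batch result over the accumulated prefix.

```python
from collections import Counter


def word_count(lines):
    """The 'query': the same logic is used for batch and for streaming."""
    return Counter(word for line in lines for word in line.split())


def streaming_word_count(micro_batches):
    """Incrementally maintain the result table, one micro-batch at a time."""
    state = Counter()
    for batch in micro_batches:
        state.update(word_count(batch))  # only the new rows are processed
        yield dict(state)


batches = [["a b", "b c"], ["c c"], ["a"]]
snapshots = list(streaming_word_count(batches))

# After trigger k, the streaming result equals the batch query over the
# input "table" accumulated so far.
for k, snapshot in enumerate(snapshots, start=1):
    prefix = [line for batch in batches[:k] for line in batch]
    assert snapshot == dict(word_count(prefix))

print(snapshots[-1])  # {'a': 2, 'b': 2, 'c': 3}
```

The incremental path works here only because counts from separate micro-batches can be merged. A streaming engine has to find such an incremental form for every operator it supports.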
Re: The Future Of DStream
I don't understand what kind of low-level control DStream offers that Structured Streaming does not.

Thanks
Chang

On Wednesday, July 27, 2016, Matei Zaharia wrote:
> Yup, they will definitely coexist. Structured Streaming is currently alpha
> and will probably be complete in the next few releases, but Spark Streaming
> will continue to exist, because it gives the user more low-level control.
> It's similar to DataFrames vs RDDs (RDDs are the lower-level API for when
> you want control, while DataFrames do more optimizations automatically by
> restricting the computation model).
>
> Matei
>
> On Jul 27, 2016, at 12:03 AM, Ofir Manor wrote:
>
> Structured Streaming in 2.0 is declared as alpha - plenty of bits still
> missing:
> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
> I assume that it will be declared stable / GA in a future 2.x release, and
> then it will co-exist with DStream for quite a while before someone will
> suggest to start a deprecation process that will eventually lead to its
> removal...
> As a user, I guess we will need to apply judgement about when to switch to
> Structured Streaming - each of us have a different risk/value tradeoff,
> based on our specific situation...
>
> Ofir Manor
> Co-Founder & CTO | Equalum
The Future Of DStream
Hi guys,

Structured Streaming is coming with Spark 2.0, but I noticed that DStream is still here.

What's the future of DStream? Will it be deprecated and eventually removed, or will it coexist with Structured Streaming forever?

Thanks
Chang