Because I can't (and shouldn't) know ahead of time which jobs will be executed; that decision belongs to the orchestration layer (and can be dynamic). I know I can specify multiple packages. I'm also not worried about memory.
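To make it concrete, here is roughly the pattern, as a sketch only: the package coordinates below are placeholders, and the hard reset is the workaround from SPARK-38438, which relies on private pyspark attributes and may break between versions:

```
from pyspark import SparkContext
from pyspark.sql import SparkSession


def hard_reset(s: SparkSession) -> None:
    # Workaround from SPARK-38438: stop the session, then tear down the
    # Py4J gateway so that the next getOrCreate() launches a fresh JVM
    # that can pick up a new spark.jars.packages value. Note this uses
    # private (underscore) attributes.
    s.stop()
    s._sc._gateway.shutdown()
    s._sc._gateway.proc.stdin.close()
    SparkContext._gateway = None
    SparkContext._jvm = None


def run_job(packages: str) -> None:
    # Each task may need a different set of packages, and the
    # orchestrator only learns which ones at runtime.
    s = (
        SparkSession.builder
        .config("spark.jars.packages", packages)
        .getOrCreate()
    )
    try:
        ...  # the actual Spark job goes here
    finally:
        hard_reset(s)


# e.g. two tasks with different dependency sets (placeholder coordinates):
run_job("org.apache.spark:spark-avro_2.12:3.2.1")
run_job("com.example:some-connector:1.0.0")
```

This reliance on private internals is exactly why I'd prefer first-class support in (py)spark, or at least a warning/error when a configuration change can't take effect.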
On Thu, 10 Mar 2022 at 13:54, Artemis User <arte...@dtechspace.com> wrote:

> If changing packages or jars isn't your concern, why not just specify
> ALL packages that you would need for the Spark environment? You know
> you can define multiple packages under the packages option. This
> shouldn't cause memory issues since the JVM uses dynamic class
> loading...
>
> On 3/9/22 10:03 PM, Rafał Wojdyła wrote:
>
> > Hi Artemis,
> > Thanks for your input, to answer your questions:
> >
> > > You may want to ask yourself why it is necessary to change the
> > > jar packages during runtime.
> >
> > I have a long running orchestrator process, which executes multiple
> > spark jobs, currently on a single VM/driver; some of those jobs
> > might require extra packages/jars (please see the example in the
> > issue).
> >
> > > Changing package doesn't mean to reload the classes.
> >
> > AFAIU this is unrelated.
> >
> > > There is no way to reload the same class unless you customize the
> > > classloader of Spark.
> >
> > AFAIU this is an implementation detail.
> >
> > > I also don't think it is necessary to implement a warning or
> > > error message when changing the configuration since it doesn't do
> > > any harm
> >
> > To reiterate: right now the API allows changing the configuration
> > of the context without that configuration taking effect. See
> > examples of confused users here:
> > * https://stackoverflow.com/questions/41886346/spark-2-1-0-session-config-settings-pyspark
> > * https://stackoverflow.com/questions/53606756/how-to-set-spark-driver-memory-in-client-mode-pyspark-version-2-3-1
> >
> > I'm curious if you have any opinion about the "hard-reset"
> > workaround, copy-pasting from the issue:
> >
> > ```
> > s: SparkSession = ...
> >
> > # Hard reset:
> > s.stop()
> > s._sc._gateway.shutdown()
> > s._sc._gateway.proc.stdin.close()
> > SparkContext._gateway = None
> > SparkContext._jvm = None
> > ```
> >
> > Cheers - Rafal
> >
> > On 2022/03/09 15:39:58 Artemis User wrote:
> >
> > > This is indeed a JVM issue, not a Spark issue. You may want to
> > > ask yourself why it is necessary to change the jar packages
> > > during runtime. Changing package doesn't mean to reload the
> > > classes. There is no way to reload the same class unless you
> > > customize the classloader of Spark. I also don't think it is
> > > necessary to implement a warning or error message when changing
> > > the configuration since it doesn't do any harm. Spark uses lazy
> > > binding so you can do a lot of such "unharmful" things.
> > > Developers will have to understand the behaviors of each API
> > > before using them.
> > >
> > > On 3/9/22 9:31 AM, Rafał Wojdyła wrote:
> > >
> > > > Sean,
> > > > I understand you might be sceptical about adding this
> > > > functionality into (py)spark, I'm curious:
> > > > * would an error/warning on updating configuration that is
> > > >   currently effectively impossible to change (requires a
> > > >   restart of the JVM) be reasonable?
> > > > * what do you think about the workaround in the issue?
> > > > Cheers - Rafal
> > > >
> > > > On Wed, 9 Mar 2022 at 14:24, Sean Owen <sr...@gmail.com> wrote:
> > > >
> > > > > Unfortunately this opens a lot more questions and problems
> > > > > than it solves. What if you take something off the classpath,
> > > > > for example? Change a class?
> > > > >
> > > > > On Wed, Mar 9, 2022 at 8:22 AM Rafał Wojdyła
> > > > > <ra...@gmail.com> wrote:
> > > > >
> > > > > > Thanks Sean,
> > > > > > To be clear, if you prefer to change the label on this
> > > > > > issue from bug to something else, feel free to do so, no
> > > > > > strong opinions on my end. What happens to the classpath,
> > > > > > whether Spark uses some classloader magic, is probably an
> > > > > > implementation detail.
> > > > > > That said, it's definitely not intuitive that you can
> > > > > > change the configuration and get the context (with the
> > > > > > updated config) without any warnings/errors. Also, what
> > > > > > would you recommend as a workaround or solution to this
> > > > > > problem? Any comments about the workaround in the issue?
> > > > > > Keep in mind that I can't restart the long running
> > > > > > orchestration process (a Python process, if that matters).
> > > > > > Cheers - Rafal
> > > > > >
> > > > > > On Wed, 9 Mar 2022 at 13:15, Sean Owen <sr...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > That isn't a bug - you can't change the classpath once
> > > > > > > the JVM is executing.
> > > > > > >
> > > > > > > On Wed, Mar 9, 2022 at 7:11 AM Rafał Wojdyła
> > > > > > > <ra...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > > My use case is that I have a long running process
> > > > > > > > (orchestrator) with multiple tasks; some tasks might
> > > > > > > > require extra spark dependencies. It seems once the
> > > > > > > > spark context is started it's not possible to update
> > > > > > > > `spark.jars.packages`? I have reported an issue at
> > > > > > > > https://issues.apache.org/jira/browse/SPARK-38438,
> > > > > > > > together with a workaround ("hard reset of the
> > > > > > > > cluster"). I wonder if anyone has a solution for this?
> > > > > > > > Cheers - Rafal