Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour
Hi Kostas,

Thank you for the references to the two tickets. They help me understand some of the odd behaviour I have run into lately.

Best Regards,

Jerry

On Wed, Dec 23, 2015 at 2:32 AM, kostas papageorgopoylos wrote:

> Hi
>
> FYI: the following 2 tickets currently block (for releases up to 1.5.2)
> the pattern of starting and stopping a SparkContext inside the same
> driver program:
>
> https://issues.apache.org/jira/browse/SPARK-11700 -> memory leak in
> SQLContext
> https://issues.apache.org/jira/browse/SPARK-11739
>
> In an application we have built, we initially wanted to use this
> pattern (start-stop-start, etc.) in order to make better use of the
> Spark cluster resources.
>
> I believe that the fixes in the above tickets will make it safe to
> stop and restart the SparkContext in the driver program in release
> 1.6.0.
>
> Kind Regards
Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour
I think the original idea is that the life of the driver is the life of the SparkContext: the context is stopped when the driver finishes. Or: if for some reason the context dies or there's an unrecoverable error, that's it for the driver.

(There's nothing wrong with stop(), right? You have to call it when the driver ends to shut down Spark cleanly. It's the re-starting of another context that's at issue.)

This makes the most sense in the context of a resource manager, which can conceivably restart a driver if you like, but can't reach into your program.

That's probably still the best way to think of it. Still, it would be nice if SparkContext were friendlier to a restart, just as a matter of design. AFAIK it is; I'm not sure about SQLContext, though. If it's not a priority, that's just because this isn't the usual usage pattern, which doesn't mean it's crazy, just that it's not the primary pattern.

On Tue, Dec 22, 2015 at 5:57 PM, Jerry Lam wrote:

> Hi Sean,
>
> What if the Spark context stops for involuntary reasons (misbehaviour
> of some connections)? Then we need to handle the failures
> programmatically by recreating the Spark context. Is there something I
> don't understand or know about the assumptions on how a Spark context
> should be used? I tend to think of it as a resource manager/scheduler
> for Spark jobs. Are you planning to deprecate the stop method in
> Spark?
>
> Best Regards,
>
> Jerry
Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour
Hi Sean,

What if the Spark context stops for involuntary reasons (misbehaviour of some connections)? Then we need to handle the failures programmatically by recreating the Spark context. Is there something I don't understand or know about the assumptions on how a Spark context should be used? I tend to think of it as a resource manager/scheduler for Spark jobs. Are you planning to deprecate the stop method in Spark?

Best Regards,

Jerry

Sent from my iPhone

> On 22 Dec, 2015, at 3:57 am, Sean Owen wrote:
>
> Although in many cases it does work to stop and then start a second
> context, it wasn't how Spark was originally designed, and I still see
> gotchas. I'd avoid it. I don't think you should have to release the
> resources at all; just keep the same context alive.
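A crude sketch, in the thread's Scala, of the programmatic recovery Jerry mentions: retry the job once on a fresh SparkContext. Treating any non-fatal exception as "the context died" is an illustrative simplification, not how real recovery code should decide; `runWithRecovery` is a hypothetical helper, not Spark API.

```
import scala.util.control.NonFatal
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: each attempt runs the job on its own context and always
// stops it; one retry with a fresh context if the first attempt fails.
// Real code would inspect the failure before assuming the context died.
def runWithRecovery[T](conf: SparkConf)(job: SparkContext => T): T = {
  def attempt(): T = {
    val sc = new SparkContext(conf)
    try job(sc) finally sc.stop()
  }
  try attempt()
  catch { case NonFatal(_) => attempt() }
}
```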
Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour
Hi

FYI: the following 2 tickets currently block (for releases up to 1.5.2) the pattern of starting and stopping a SparkContext inside the same driver program:

https://issues.apache.org/jira/browse/SPARK-11700 -> memory leak in SQLContext
https://issues.apache.org/jira/browse/SPARK-11739

In an application we have built, we initially wanted to use this pattern (start-stop-start, etc.) in order to make better use of the Spark cluster resources.

I believe that the fixes in the above tickets will make it safe to stop and restart the SparkContext in the driver program in release 1.6.0.

Kind Regards

2015-12-22 21:00 GMT+02:00 Sean Owen:

> I think the original idea is that the life of the driver is the life
> of the SparkContext: the context is stopped when the driver finishes.
> Or: if for some reason the context dies or there's an unrecoverable
> error, that's it for the driver.
>
> (There's nothing wrong with stop(), right? You have to call it when
> the driver ends to shut down Spark cleanly. It's the re-starting of
> another context that's at issue.)
>
> This makes the most sense in the context of a resource manager, which
> can conceivably restart a driver if you like, but can't reach into
> your program.
>
> That's probably still the best way to think of it. Still, it would be
> nice if SparkContext were friendlier to a restart, just as a matter of
> design. AFAIK it is; I'm not sure about SQLContext, though. If it's
> not a priority, that's just because this isn't the usual usage
> pattern, which doesn't mean it's crazy, just that it's not the primary
> pattern.
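A minimal sketch of the start-stop-start pattern Kostas describes, assuming only core Spark 1.x API. Each cycle constructs its own SQLContext via the public constructor instead of calling SQLContext.getOrCreate, since on releases up to 1.5.2 the cached instance can outlive its stopped SparkContext (the SPARK-11700/SPARK-11739 behaviour above); `runCycle` is an illustrative name, not Spark API.

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// One start-stop cycle: build a context, run the job, always release
// cluster resources afterwards.
def runCycle[T](conf: SparkConf)(job: SQLContext => T): T = {
  val sc = new SparkContext(conf)
  try {
    job(new SQLContext(sc)) // fresh SQLContext, bypassing the stale cache
  } finally {
    sc.stop() // free cluster resources between cycles
  }
}

// Usage: repeated cycles within one driver program, e.g.
//   val conf   = new SparkConf().setAppName("start-stop-start")
//   val first  = runCycle(conf)(_.sql("SELECT 1").count())
//   val second = runCycle(conf)(_.sql("SELECT 2").count())
```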
Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour
This looks to me like a very unusual use case: you stop the SparkContext and start another one. I don't think that is well supported. Once the SparkContext is stopped, all of its resources are supposed to be released.

Is there any mandatory reason you have to stop and then restart another SparkContext?

Thanks.

Zhan Zhang

Note that when sc is stopped, all resources are released (for example, in YARN).

On Dec 20, 2015, at 2:59 PM, Jerry Lam wrote:

> Hi Spark developers,
>
> I found that SQLContext.getOrCreate(sc: SparkContext) does not behave
> correctly when a different Spark context is provided:
>
> ```
> val sc = new SparkContext
> val sqlContext = SQLContext.getOrCreate(sc)
> sc.stop()
> ...
>
> val sc2 = new SparkContext
> val sqlContext2 = SQLContext.getOrCreate(sc2)
> sc2.stop()
> ```
>
> The sqlContext2 will reference sc instead of sc2, and therefore the
> program will not work because sc has been stopped.
>
> Best Regards,
>
> Jerry
Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour
Hi Zhan,

I'm illustrating the issue via a simple example. However, it is not difficult to imagine use cases that need this behaviour. For example, you might want to release all of Spark's resources when it has not been used for longer than an hour, in a job server exposed like a web service. Unless you can prevent people from stopping the Spark context, it is reasonable to assume that people can stop it and start it again at a later time.

Best Regards,

Jerry

On Mon, Dec 21, 2015 at 7:20 PM, Zhan Zhang wrote:

> This looks to me like a very unusual use case: you stop the
> SparkContext and start another one. I don't think that is well
> supported. Once the SparkContext is stopped, all of its resources are
> supposed to be released.
>
> Is there any mandatory reason you have to stop and then restart
> another SparkContext?
>
> Thanks.
>
> Zhan Zhang
>
> Note that when sc is stopped, all resources are released (for example,
> in YARN).
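A minimal sketch of the job-server scenario described above, assuming nothing beyond core Spark 1.x API: stop the context once it has been idle for too long and lazily recreate it on the next request. The class and method names are illustrative, not Spark API.

```
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative sketch, not Spark API: lazily (re)create a SparkContext
// and stop it after a configurable idle period.
class IdleContextManager(conf: SparkConf, idleMillis: Long) {
  private var sc: Option[SparkContext] = None
  private var lastUsed = System.currentTimeMillis()

  // Called by each incoming job request.
  def get(): SparkContext = synchronized {
    lastUsed = System.currentTimeMillis()
    sc.getOrElse {
      val ctx = new SparkContext(conf)
      sc = Some(ctx)
      ctx
    }
  }

  // Called periodically, e.g. from a scheduled background task.
  def stopIfIdle(): Unit = synchronized {
    if (sc.isDefined && System.currentTimeMillis() - lastUsed > idleMillis) {
      sc.foreach(_.stop()) // release cluster resources
      sc = None            // next get() builds a fresh context
    }
  }
}
```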
Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour
In Jerry's example, the first SparkContext, sc, has been stopped, so there would be only one SparkContext running at any given moment.

Cheers

On Mon, Dec 21, 2015 at 8:23 AM, Chester @work wrote:

> Jerry
> I thought you should not create more than one SparkContext within one
> JVM, ...
>
> Chester
>
> Sent from my iPhone
Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour
Jerry

I thought you should not create more than one SparkContext within one JVM, ...

Chester

Sent from my iPhone

> On Dec 20, 2015, at 2:59 PM, Jerry Lam wrote:
>
> Hi Spark developers,
>
> I found that SQLContext.getOrCreate(sc: SparkContext) does not behave
> correctly when a different Spark context is provided:
>
> ```
> val sc = new SparkContext
> val sqlContext = SQLContext.getOrCreate(sc)
> sc.stop()
> ...
>
> val sc2 = new SparkContext
> val sqlContext2 = SQLContext.getOrCreate(sc2)
> sc2.stop()
> ```
>
> The sqlContext2 will reference sc instead of sc2, and therefore the
> program will not work because sc has been stopped.
>
> Best Regards,
>
> Jerry
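On Chester's point: Spark 1.x assumes a single live SparkContext per JVM, and by default refuses to construct a second one while the first is alive. As a sketch, the 1.x line exposed an escape hatch via configuration (intended for tests rather than production; note it is not needed in Jerry's example, where the first context is already stopped):

```
import org.apache.spark.SparkConf

// Relax the one-live-SparkContext-per-JVM check on Spark 1.x.
// Meant for tests; production drivers should keep a single context.
val conf = new SparkConf()
  .setAppName("demo")
  .set("spark.driver.allowMultipleContexts", "true")
```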
Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour
Hi Jerry,

It looks like https://issues.apache.org/jira/browse/SPARK-11739 covers the issue you described. It has been fixed in 1.6. With this change, when you call SQLContext.getOrCreate(sc2), we first check whether sc has been stopped. If so, we create a new SQLContext using sc2.

Thanks,

Yin

On Sun, Dec 20, 2015 at 2:59 PM, Jerry Lam wrote:

> Hi Spark developers,
>
> I found that SQLContext.getOrCreate(sc: SparkContext) does not behave
> correctly when a different Spark context is provided:
>
> ```
> val sc = new SparkContext
> val sqlContext = SQLContext.getOrCreate(sc)
> sc.stop()
> ...
>
> val sc2 = new SparkContext
> val sqlContext2 = SQLContext.getOrCreate(sc2)
> sc2.stop()
> ```
>
> The sqlContext2 will reference sc instead of sc2, and therefore the
> program will not work because sc has been stopped.
>
> Best Regards,
>
> Jerry
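Not the actual Spark patch, but a rough sketch of the check Yin describes, under two stated assumptions: that the cached instance is held in an atomic reference, and that a stopped SparkContext can be detected (written here as `isStopped`, which may not match the real implementation).

```
import java.util.concurrent.atomic.AtomicReference
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Rough sketch of the 1.6 getOrCreate behaviour described above;
// the real Spark implementation may differ in detail.
object PatchedGetOrCreate {
  private val instantiated = new AtomicReference[SQLContext]()

  def getOrCreate(sc: SparkContext): SQLContext = {
    val cached = instantiated.get()
    // Discard the cached instance if its SparkContext has been stopped,
    // so callers never receive a SQLContext bound to a dead context.
    if (cached == null || cached.sparkContext.isStopped) {
      val fresh = new SQLContext(sc)
      instantiated.set(fresh)
      fresh
    } else {
      cached
    }
  }
}
```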
[Spark SQL] SQLContext getOrCreate incorrect behaviour
Hi Spark developers,

I found that SQLContext.getOrCreate(sc: SparkContext) does not behave correctly when a different Spark context is provided:

```
val sc = new SparkContext
val sqlContext = SQLContext.getOrCreate(sc)
sc.stop()
...

val sc2 = new SparkContext
val sqlContext2 = SQLContext.getOrCreate(sc2)
sc2.stop()
```

The sqlContext2 will reference sc instead of sc2, and therefore the program will not work because sc has been stopped.

Best Regards,

Jerry
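A workaround sketch for the behaviour reported here, assuming a release that predates the SPARK-11739 fix discussed upthread: construct the second SQLContext directly through its public constructor instead of going through getOrCreate, so it cannot be served from the stale cache.

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("getOrCreate-workaround")

val sc = new SparkContext(conf)
val sqlContext = SQLContext.getOrCreate(sc)
sc.stop()

val sc2 = new SparkContext(conf)
// Bypass the cached instance: on affected releases getOrCreate would
// hand back the old SQLContext, which still references the stopped sc.
val sqlContext2 = new SQLContext(sc2)
sc2.stop()
```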