Re: All PySpark jobs are canceled when one user cancel his PySpark paragraph (job)

2018-06-12 Thread Jhon Anderson Cardenas Diaz
Hi, we already have Spark (and Python) configured in per-user scoped mode, and
even in that case it does not work. But I will try your second option!
Thank you.

Re: All PySpark jobs are canceled when one user cancel his PySpark paragraph (job)

2018-06-12 Thread Jeff Zhang
This is a limitation of the native PySparkInterpreter.

Two solutions for you:
1. Use per-user scoped mode, so that each user owns his own Python process.
2. Use the IPySparkInterpreter of Zeppelin 0.8, which integrates Python with
Zeppelin better (see the sketch below).
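
As a rough illustration only (the paragraph magic and binding names below should be
checked against your Zeppelin 0.8 installation), a note paragraph can target the
IPython-backed interpreter directly, with the spark interpreter bound as
"Per User: scoped" in the interpreter settings:

    %spark.ipyspark
    # Runs in the IPython-based PySpark interpreter instead of the native one.
    # With per-user scoped binding, each user gets his own Python process,
    # so a SIGINT triggered by one user's cancel cannot reach the others.
    df = spark.range(1000)
    print(df.count())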

Re: All PySpark jobs are canceled when one user cancel his PySpark paragraph (job)

2018-06-12 Thread Jhon Anderson Cardenas Diaz
Hi!

We found the reason why this error is happening. It seems to be related to the fix
https://github.com/apache/zeppelin/commit/9f22db91c279b7daf6a13b2d805a874074b070fd
for the task ZEPPELIN-2075.

Because of this change, when one particular user cancels his PySpark job, the
PySpark jobs of *all users* are canceled!

When a PySpark job is cancelled, the method PySparkInterpreter.interrupt() is
invoked, which raises SIGINT in the Python process, causing all the jobs in the
same SparkContext to be cancelled:

context.py:

    # create a signal handler which would be invoked on receiving SIGINT
    def signal_handler(signal, frame):
        self.cancelAllJobs()
        raise KeyboardInterrupt()
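
For comparison, a handler that cancels only the current job group would leave the
other users' jobs alone. This is just a sketch, not Zeppelin's actual code; the
wiring and the commented-out call are illustrative:

    import signal

    def install_scoped_handler(sc, job_group):
        # On SIGINT, cancel only this paragraph's job group instead of
        # every job in the shared SparkContext.
        def handler(signum, frame):
            sc.cancelJobGroup(job_group)
            raise KeyboardInterrupt()
        signal.signal(signal.SIGINT, handler)

    # install_scoped_handler(sc, "zeppelin-[notebook-id]-[paragraph-id]")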

Is this a Zeppelin bug?

Thank you.


Re: All PySpark jobs are canceled when one user cancel his PySpark paragraph (job)

2018-06-12 Thread Jhon Anderson Cardenas Diaz
Hi!
I have the 0.8.0 version, from September 2017.

Re: All PySpark jobs are canceled when one user cancel his PySpark paragraph (job)

2018-06-12 Thread Jianfeng (Jeff) Zhang

Which version do you use?


Best Regards,
Jeff Zhang


From: Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>
Reply-To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>
Date: Friday, June 8, 2018 at 11:08 PM
To: "us...@zeppelin.apache.org" <us...@zeppelin.apache.org>, "dev@zeppelin.apache.org" <dev@zeppelin.apache.org>
Subject: All PySpark jobs are canceled when one user cancel his PySpark paragraph (job)

Dear community,

Currently we are having problems with multiple users running paragraphs
associated with PySpark jobs.

The problem is that if a user aborts/cancels his PySpark paragraph (job), the
active PySpark jobs of the other users are canceled too.

Going into detail, I've seen that when you cancel a user's job, this method is
invoked (which is fine):

sc.cancelJobGroup("zeppelin-[notebook-id]-[paragraph-id]")

But somehow, unknown to me, this method is also invoked:

sc.cancelAllJobs()
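
The difference in scope between these two calls can be reproduced in a minimal
standalone PySpark sketch (the note and paragraph ids below are made up):

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "cancel-demo")

    # Zeppelin tags each paragraph's jobs with a job group like this:
    sc.setJobGroup("zeppelin-note1-para1", "user A's paragraph")
    print(sc.parallelize(range(100)).map(lambda x: x * 2).count())

    # Scoped cancel: affects only jobs tagged with this group id.
    sc.cancelJobGroup("zeppelin-note1-para1")

    # Global cancel: kills every active job in the shared SparkContext,
    # which is what the other users are experiencing.
    # sc.cancelAllJobs()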

We deduced the above from the log trace that appears in the other users' jobs:

Py4JJavaError: An error occurred while calling o885.count.
: org.apache.spark.SparkException: Job 461 cancelled as part of cancellation of all jobs
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1375)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:721)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:721)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1628)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2420)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2419)
at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
at org.apache.spark.sql.Dataset.count(Dataset.scala:2419)
at sun.reflect.GeneratedMethodAccessor120.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)

(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred while calling o885.count.\n', JavaObject id=o886), <traceback object>)

Any idea why this could be happening?

(I have the 0.8.0 version, from September 2017.)

Thank you!