[jira] [Resolved] (LIVY-409) Improve User Experience in livy-shell

2017-10-19 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-409.
--
   Resolution: Fixed
Fix Version/s: 0.5.0

Issue resolved by pull request 55
[https://github.com/apache/incubator-livy/pull/55]

> Improve User Experience in livy-shell
> -
>
> Key: LIVY-409
> URL: https://issues.apache.org/jira/browse/LIVY-409
> Project: Livy
>  Issue Type: Improvement
>Reporter: Eric Perry
>Priority: Minor
> Fix For: 0.5.0
>
>
> The livy-shell is useful for testing and evaluating Livy, but it has a few 
> minor UX issues that could be fixed without many changes (a rough sketch of 
> all three follows below):
> # The use of httplib and a single connection can cause problems in 
> environments where network reliability is low.
> # The shell prompt does not include any contextual information, which would be 
> helpful when a user has multiple shells running.
> # There isn't an easy way to cancel the current command other than deleting 
> all the text on the prompt, since SIGINT breaks out of the REPL and causes the 
> session to be deleted.
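A rough sketch of the kind of behaviour the three items above ask for, assuming a simple prompt loop and a hypothetical submit() helper that posts a statement to the Livy REST API; it swaps httplib for a retrying requests session and is not the actual livy-shell code:

{code}
# Hypothetical sketch only -- not the actual livy-shell implementation.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_http(livy_url):
    # Item 1: a pooled session with retries instead of a single httplib connection.
    s = requests.Session()
    s.mount(livy_url, HTTPAdapter(max_retries=Retry(total=5, backoff_factor=0.5)))
    return s

def repl(livy_url, session_id, submit):
    http = make_http(livy_url)
    while True:
        try:
            # Item 2: contextual information (server URL, session id) in the prompt.
            line = input("livy [%s #%d]> " % (livy_url, session_id))
        except KeyboardInterrupt:
            print("^C")   # Item 3: Ctrl-C abandons the current line ...
            continue      # ... instead of killing the REPL and deleting the session.
        except EOFError:
            break         # Ctrl-D exits; the caller decides whether to delete the session.
        if line.strip():
            submit(http, session_id, line)
{code}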





[jira] [Resolved] (SPARK-22290) Starting second context in same JVM fails to get new Hive delegation token

2017-10-19 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-22290.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19509
[https://github.com/apache/spark/pull/19509]

> Starting second context in same JVM fails to get new Hive delegation token
> --
>
> Key: SPARK-22290
> URL: https://issues.apache.org/jira/browse/SPARK-22290
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
> Fix For: 2.3.0
>
>
> Consider the following pyspark script:
> {code}
> from pyspark import SparkContext
>
> sc = SparkContext()
> # do stuff
> sc.stop()
> # do some other stuff
> sc = SparkContext()
> {code}
> That code didn't work at all in 2.2 (it failed to create the second 
> context), but it makes more progress in 2.3. However, it fails to create new 
> Hive delegation tokens; you see this error in the output:
> {noformat}
> 17/10/16 16:26:50 INFO security.HadoopFSDelegationTokenProvider: getting 
> token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1714191595_19, 
> ugi=blah(auth:KERBEROS)]]
> 17/10/16 16:26:50 INFO hive.metastore: Trying to connect to metastore with 
> URI blah
> 17/10/16 16:26:50 INFO hive.metastore: Connected to metastore.
> 17/10/16 16:26:50 ERROR metadata.Hive: MetaException(message:Delegation Token 
> can be issued only with kerberos authentication. Current 
> AuthenticationMethod: TOKEN)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result.read(ThriftHiveMetastore
> {noformat}
> The error is printed in the logs but it doesn't cause the app to fail (which 
> might be considered wrong).
> The effect is that when that old delegation token expires the new app will 
> fail.
> But the real issue here is that Spark shouldn't be mixing delegation tokens 
> from different apps. It should try harder to isolate a set of delegation 
> tokens to a single app submission.
> And, in the case of Hive, there are many situations where a delegation token 
> isn't needed at all.






[jira] [Assigned] (SPARK-22290) Starting second context in same JVM fails to get new Hive delegation token

2017-10-19 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-22290:
---

Assignee: Marcelo Vanzin

> Starting second context in same JVM fails to get new Hive delegation token
> --
>
> Key: SPARK-22290
> URL: https://issues.apache.org/jira/browse/SPARK-22290
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 2.3.0
>
>
> Consider the following pyspark script:
> {code}
> from pyspark import SparkContext
>
> sc = SparkContext()
> # do stuff
> sc.stop()
> # do some other stuff
> sc = SparkContext()
> {code}
> That code didn't work at all in 2.2 (it failed to create the second 
> context), but it makes more progress in 2.3. However, it fails to create new 
> Hive delegation tokens; you see this error in the output:
> {noformat}
> 17/10/16 16:26:50 INFO security.HadoopFSDelegationTokenProvider: getting 
> token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1714191595_19, 
> ugi=blah(auth:KERBEROS)]]
> 17/10/16 16:26:50 INFO hive.metastore: Trying to connect to metastore with 
> URI blah
> 17/10/16 16:26:50 INFO hive.metastore: Connected to metastore.
> 17/10/16 16:26:50 ERROR metadata.Hive: MetaException(message:Delegation Token 
> can be issued only with kerberos authentication. Current 
> AuthenticationMethod: TOKEN)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result.read(ThriftHiveMetastore
> {noformat}
> The error is printed in the logs but it doesn't cause the app to fail (which 
> might be considered wrong).
> The effect is that when that old delegation token expires the new app will 
> fail.
> But the real issue here is that Spark shouldn't be mixing delegation tokens 
> from different apps. It should try harder to isolate a set of delegation 
> tokens to a single app submission.
> And, in the case of Hive, there are many situations where a delegation token 
> isn't needed at all.






[jira] [Commented] (AMBARI-22260) Update Spark2 log4j default settings to latest

2017-10-18 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/AMBARI-22260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210628#comment-16210628
 ] 

Saisai Shao commented on AMBARI-22260:
--

Please help to review [~sumitmohanty] [~jluniya], thanks!

> Update Spark2 log4j default settings to latest
> --
>
> Key: AMBARI-22260
> URL: https://issues.apache.org/jira/browse/AMBARI-22260
> Project: Ambari
>  Issue Type: Bug
>  Components: stacks
>Affects Versions: 2.6.0
>    Reporter: Saisai Shao
>Assignee: Saisai Shao
>
> The Apache Spark log4j definition has changed in some places, so we should 
> update the default settings in the Ambari Spark2 stack definition to reflect 
> those changes.





Review Request 63138: Update Spark2 log4j default settings to latest

2017-10-18 Thread Saisai Shao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63138/
---

Review request for Ambari and Sumit Mohanty.


Bugs: AMBARI-22260
https://issues.apache.org/jira/browse/AMBARI-22260


Repository: ambari


Description
---

Update the Ambari Spark2 log4j-related configurations.


Diffs
-

  ambari-server/src/main/resources/stacks/HDP/2.5/upgrades/config-upgrade.xml 
d138d6017c 
  
ambari-server/src/main/resources/stacks/HDP/2.5/upgrades/nonrolling-upgrade-2.6.xml
 8012c90b0b 
  ambari-server/src/main/resources/stacks/HDP/2.5/upgrades/upgrade-2.6.xml 
7c43948ba3 
  
ambari-server/src/main/resources/stacks/HDP/2.6/services/SPARK2/configuration/spark2-log4j-properties.xml
 PRE-CREATION 


Diff: https://reviews.apache.org/r/63138/diff/1/


Testing
---

Manual verification on a fresh install of HDP 2.6 and on an upgrade from HDP 2.5.


Thanks,

Saisai Shao



[jira] [Created] (AMBARI-22260) Update Spark2 log4j default settings to latest

2017-10-18 Thread Saisai Shao (JIRA)
Saisai Shao created AMBARI-22260:


 Summary: Update Spark2 log4j default settings to latest
 Key: AMBARI-22260
 URL: https://issues.apache.org/jira/browse/AMBARI-22260
 Project: Ambari
  Issue Type: Bug
  Components: stacks
Affects Versions: 2.6.0
Reporter: Saisai Shao
Assignee: Saisai Shao


The Apache Spark log4j definition has changed in some places, so we should update 
the default settings in the Ambari Spark2 stack definition to reflect those changes.





[jira] [Updated] (LIVY-411) Session cannot be started when Python or R package is missing

2017-10-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-411:
-
Reporter: Yesha Vora  (was: Saisai Shao)

> Session cannot be started when Python or R package is missing
> -
>
> Key: LIVY-411
> URL: https://issues.apache.org/jira/browse/LIVY-411
> Project: Livy
>  Issue Type: Bug
>  Components: REPL
>Affects Versions: 0.5.0
>Reporter: Yesha Vora
>Assignee: Saisai Shao
>
> In Livy 0.5.0 we added support for multiple languages in one session, but it 
> requires that the relevant packages, such as the R and Python packages, be 
> available; otherwise the session fails to be created. However, in some cases 
> the Python or R package may be missing from the Spark distribution, which 
> makes Livy fail to create an interactive session.
> To fix this issue, we should not enforce this restriction at session creation, 
> but delay the check until the related interpreter is used. If a package is 
> missing, we should fail only the related execution and return the cause to the 
> user, without affecting other correctly started interpreters.





[jira] [Updated] (LIVY-411) Session cannot be started when Python or R package is missing

2017-10-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-411:
-
Description: 
In Livy 0.5.0 we added support for multiple languages in one session, but it 
requires that the relevant packages, such as the R and Python packages, be 
available; otherwise the session fails to be created. However, in some cases the 
Python or R package may be missing from the Spark distribution, which makes Livy 
fail to create an interactive session.

To fix this issue, we should not enforce this restriction at session creation, 
but delay the check until the related interpreter is used. If a package is 
missing, we should fail only the related execution and return the cause to the 
user, without affecting other correctly started interpreters.

  was:
In Livy 0.5.0 we added support for multiple languages in one session, but it 
requires that the packages, such as the R and Python packages, be available; 
otherwise the session fails to be created. But in some cases the Python or R 
package may be missing from the Spark distribution, which makes us unable to use 
Livy's interactive session.

To fix this issue, we should not enforce this restriction at session creation, 
but delay the check until the related interpreter is used. If a package is 
missing, we should fail only the related execution and return the cause to the 
user, without affecting other correctly started interpreters.


> Session cannot be started when Python or R package is missing
> -
>
> Key: LIVY-411
> URL: https://issues.apache.org/jira/browse/LIVY-411
> Project: Livy
>  Issue Type: Bug
>  Components: REPL
>Affects Versions: 0.5.0
>    Reporter: Saisai Shao
>Assignee: Saisai Shao
>
> In Livy 0.5.0 we added support for multiple languages in one session, but it 
> requires that the relevant packages, such as the R and Python packages, be 
> available; otherwise the session fails to be created. However, in some cases 
> the Python or R package may be missing from the Spark distribution, which 
> makes Livy fail to create an interactive session.
> To fix this issue, we should not enforce this restriction at session creation, 
> but delay the check until the related interpreter is used. If a package is 
> missing, we should fail only the related execution and return the cause to the 
> user, without affecting other correctly started interpreters.





[jira] [Created] (LIVY-411) Session cannot be started when Python or R package is missing

2017-10-18 Thread Saisai Shao (JIRA)
Saisai Shao created LIVY-411:


 Summary: Session cannot be started when Python or R package is 
missing
 Key: LIVY-411
 URL: https://issues.apache.org/jira/browse/LIVY-411
 Project: Livy
  Issue Type: Bug
  Components: REPL
Affects Versions: 0.5.0
Reporter: Saisai Shao
Assignee: Saisai Shao


In Livy 0.5.0 we added support for multiple languages in one session, but it 
requires that the packages, such as the R and Python packages, be available; 
otherwise the session fails to be created. But in some cases the Python or R 
package may be missing from the Spark distribution, which makes us unable to use 
Livy's interactive session.

To fix this issue, we should not enforce this restriction at session creation, 
but delay the check until the related interpreter is used. If a package is 
missing, we should fail only the related execution and return the cause to the 
user, without affecting other correctly started interpreters.
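A minimal sketch of the intended behaviour (Livy's real implementation is in Scala; the names below are made up purely for illustration):

{code}
# Illustration only -- not Livy's actual code.
class LazyInterpreters(object):
    def __init__(self, factories):
        # factories: kind -> zero-arg callable that builds a callable interpreter
        # and raises if the language package (PySpark, SparkR, ...) is missing.
        self._factories = factories
        self._started = {}

    def execute(self, kind, code):
        interp = self._started.get(kind)
        if interp is None:
            try:
                # The availability check happens here, on first use,
                # not at session creation time.
                interp = self._factories[kind]()
            except Exception as e:
                # Fail only this statement and report the cause; interpreters
                # that started correctly are unaffected.
                return {"status": "error", "evalue": str(e)}
            self._started[kind] = interp
        return {"status": "ok", "data": interp(code)}
{code}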





Re: NoSuchFileException: spark-internal - when creating interactive session on remote spark

2017-10-16 Thread Saisai Shao
Would you please provide more information about how you create a Livy
session? For now, Livy only officially supports Spark on YARN and local mode;
we don't test standalone cluster mode, so there may be some issues with it.

On Mon, Oct 16, 2017 at 4:29 AM, Junaid Nasir  wrote:

> Hi everyone,
> I am using Livy (livy-0.4.0-incubating-bin) and a remote Spark
> (spark-2.2.0-bin-hadoop2.7) standalone cluster. Livy's settings are the
> defaults except for livy.spark.master = spark://10.128.1.1:6066 and
> livy.spark.deploy-mode = cluster (I have tried the default deploy-mode, but
> then it throws a different error).
> When I try to create an interactive Spark session on the remote cluster, it
> creates the driver and the driver immediately goes into the error state.
>
> livy console error
>
>   17/10/15 20:18:08 INFO LineBufferedStream: stdout: 17/10/15 
> 20:18:08 INFO RestSubmissionClient: Submitting a request to launch an 
> application in spark://10.128.1.1:6066.
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout: 17/10/15 20:18:09 INFO 
> RestSubmissionClient: Submission successfully created as 
> driver-20171015201808-0002. Polling submission state...
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout: 17/10/15 20:18:09 INFO 
> RestSubmissionClient: Submitting a request for the status of submission 
> driver-20171015201808-0002 in spark://10.128.1.1:6066.
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout: 17/10/15 20:18:09 INFO 
> RestSubmissionClient: State of driver driver-20171015201808-0002 is now ERROR.
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout: 17/10/15 20:18:09 INFO 
> RestSubmissionClient: Driver is running on worker 
> worker-20171015195836-10.128.1.1-38097 at 10.128.1.1:38097.
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout: 17/10/15 20:18:09 ERROR 
> RestSubmissionClient: Exception from the cluster:
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout: 
> java.nio.file.NoSuchFileException: spark-internal
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> java.nio.file.Files.copy(Files.java:1274)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:625)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> org.apache.spark.util.Utils$.copyFile(Utils.scala:596)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> org.apache.spark.util.Utils$.doFetchFile(Utils.scala:681)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> org.apache.spark.util.Utils$.fetchFile(Utils.scala:480)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:  
> org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92)
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout: 17/10/15 20:18:09 INFO 
> RestSubmissionClient: Server responded with CreateSubmissionResponse:
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout: {
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:   "action" : 
> "CreateSubmissionResponse",
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:   "message" : "Driver 
> successfully submitted as driver-20171015201808-0002",
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:   "serverSparkVersion" : 
> "2.2.0",
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:   "submissionId" : 
> "driver-20171015201808-0002",
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout:   "success" : true
> 17/10/15 20:18:09 INFO LineBufferedStream: stdout: }
>
> Can anyone please guide me on how to resolve this?
>
> I have tried setting up Spark locally and it works fine, so I guess the
> problem is with uploading jar files to the remote cluster. I am using GCE for
> both the Spark cluster and the Livy server, but they have all ports open (on
> the internal network).
>
> The Spark worker log shows this too:
>
>   17/10/15 20:18:08 INFO Worker: Asked to launch driver 
> 

[jira] [Commented] (SPARK-22242) streaming job failed to restart from checkpoint

2017-10-14 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205018#comment-16205018
 ] 

Saisai Shao commented on SPARK-22242:
-

It is not resolved, but it is the same type of problem as SPARK-19688. That's 
why I left a comment in the PR to figure out a general solution to this kind of 
problem.

> streaming job failed to restart from checkpoint
> ---
>
> Key: SPARK-22242
> URL: https://issues.apache.org/jira/browse/SPARK-22242
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.0, 2.2.0
>Reporter: StephenZou
>
> My spark-defaults.conf has an item related to the issue; I upload all jars in 
> Spark's jars folder to an HDFS path:
> spark.yarn.jars  hdfs:///spark/cache/spark2.2/*
> The streaming job fails to restart from the checkpoint; the ApplicationMaster 
> throws "Error: Could not find or load main class 
> org.apache.spark.deploy.yarn.ExecutorLauncher". The problem is always 
> reproducible.
> I examined the SparkConf object recovered from the checkpoint and found that 
> spark.yarn.jars is set to empty, which means none of the jars exist on the AM 
> side. The solution is that spark.yarn.jars should be reloaded from the 
> properties file when recovering from the checkpoint.
> Attached is a demo to reproduce the issue.






[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-12 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202900#comment-16202900
 ] 

Saisai Shao commented on SPARK-22229:
-

{quote}
I don't think that limited familiarity with a new promising feature is a good 
enough reason to avoid it. If every new feature will be treated this way, then 
new technologies will never get introduced to Spark.
{quote}

[~yuvaldeg] I think you might have misunderstood my point. I'm not saying that 
Spark will never introduce new technologies; my point is that if a technology is 
not only promising but also has a large audience, of course we should bring it 
in, as with the k8s support. AFAIK RDMA adoption is not so common in the big 
data area.

Just my two cents.

> SPIP: RDMA Accelerated Shuffle Engine
> -
>
> Key: SPARK-22229
> URL: https://issues.apache.org/jira/browse/SPARK-22229
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Yuval Degani
> Attachments: 
> SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf
>
>
> An RDMA-accelerated shuffle engine can provide enormous performance benefits 
> to shuffle-intensive Spark jobs, as demonstrated in the “SparkRDMA” plugin 
> open-source project ([https://github.com/Mellanox/SparkRDMA]).
> Using RDMA for shuffle improves CPU utilization significantly and reduces I/O 
> processing overhead by bypassing the kernel and networking stack as well as 
> avoiding memory copies entirely. Those valuable CPU cycles are then consumed 
> directly by the actual Spark workloads, and help reduce the job runtime 
> significantly. 
> This performance gain is demonstrated with both the industry-standard HiBench 
> TeraSort benchmark (a 1.5x speedup in sorting) and shuffle-intensive 
> customer applications. 
> SparkRDMA will be presented at Spark Summit 2017 in Dublin 
> ([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]).
> Please see attached proposal document for more information.






[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-12 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201950#comment-16201950
 ] 

Saisai Shao commented on SPARK-22229:
-

My concern is about how to maintain this code in the community. RDMA is not as 
well known as Netty/socket programming; I'm not sure there are many devs who 
understand it and can fully leverage it. Also, how do we test it on a commodity 
machine without RDMA support? I'm afraid that if the code is seldom used and 
maintained, it will gradually become obsolete and buggy.

> SPIP: RDMA Accelerated Shuffle Engine
> -
>
> Key: SPARK-22229
> URL: https://issues.apache.org/jira/browse/SPARK-22229
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Yuval Degani
> Attachments: 
> SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf
>
>
> An RDMA-accelerated shuffle engine can provide enormous performance benefits 
> to shuffle-intensive Spark jobs, as demonstrated in the “SparkRDMA” plugin 
> open-source project ([https://github.com/Mellanox/SparkRDMA]).
> Using RDMA for shuffle improves CPU utilization significantly and reduces I/O 
> processing overhead by bypassing the kernel and networking stack as well as 
> avoiding memory copies entirely. Those valuable CPU cycles are then consumed 
> directly by the actual Spark workloads, and help reduce the job runtime 
> significantly. 
> This performance gain is demonstrated with both the industry-standard HiBench 
> TeraSort benchmark (a 1.5x speedup in sorting) and shuffle-intensive 
> customer applications. 
> SparkRDMA will be presented at Spark Summit 2017 in Dublin 
> ([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]).
> Please see attached proposal document for more information.






[jira] [Commented] (SPARK-22062) BlockManager does not account for memory consumed by remote fetches

2017-10-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201305#comment-16201305
 ] 

Saisai Shao commented on SPARK-22062:
-

Yes, there is potentially an OOM problem, but I think it is hard to define 
whether this kind of temporarily allocated {{ByteBuffer}} should be accounted 
against storage memory or execution memory. Furthermore, how do we deal with 
remote fetching if memory is not enough: shall we fail the task, or can we 
stream the remote fetches?

What I can think of is to leverage the current shuffle implementation to spill 
large blocks to local disk during fetching, so that tasks can read the data 
from local temporary files; this could avoid the OOM.

> BlockManager does not account for memory consumed by remote fetches
> ---
>
> Key: SPARK-22062
> URL: https://issues.apache.org/jira/browse/SPARK-22062
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 2.2.0
>Reporter: Sergei Lebedev
>Priority: Minor
>
> We use Spark exclusively with {{StorageLevel.DiskOnly}} as our workloads are 
> very sensitive to memory usage. Recently, we've spotted that the jobs 
> sometimes OOM leaving lots of byte[] arrays on the heap. Upon further 
> investigation, we've found that the arrays come from 
> {{BlockManager.getRemoteBytes}}, which 
> [calls|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L638]
>  {{BlockTransferService.fetchBlockSync}}, which in its turn would 
> [allocate|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala#L99]
>  an on-heap {{ByteBuffer}} of the same size as the block (e.g. full 
> partition), if the block was successfully retrieved over the network.
> This memory is not accounted towards Spark storage/execution memory and could 
> potentially lead to OOM if {{BlockManager}} fetches too many partitions in 
> parallel. I wonder if this is intentional behaviour, or in fact a bug?






[jira] [Commented] (SPARK-22243) streaming job failed to restart from checkpoint

2017-10-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200372#comment-16200372
 ] 

Saisai Shao commented on SPARK-22243:
-

Yes, it is a related issue regarding Spark Streaming checkpoint recovery: the 
configurations should be updated rather than keeping the old reference.

> streaming job failed to restart from checkpoint
> ---
>
> Key: SPARK-22243
> URL: https://issues.apache.org/jira/browse/SPARK-22243
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.0, 2.2.0
>Reporter: StephenZou
> Attachments: CheckpointTest.scala
>
>
> My spark-defaults.conf has an item related to the issue; I upload all jars in 
> Spark's jars folder to an HDFS path:
> spark.yarn.jars  hdfs:///spark/cache/spark2.2/*
> The streaming job fails to restart from the checkpoint; the ApplicationMaster 
> throws "Error: Could not find or load main class 
> org.apache.spark.deploy.yarn.ExecutorLauncher". The problem is always 
> reproducible.
> I examined the SparkConf object recovered from the checkpoint and found that 
> spark.yarn.jars is set to empty, which means none of the jars exist on the AM 
> side. The solution is that spark.yarn.jars should be reloaded from the 
> properties file when recovering from the checkpoint.
> Attached is a demo to reproduce the issue.






[jira] [Commented] (SPARK-21737) Create communication channel between arbitrary clients and the Spark AM in YARN mode

2017-10-10 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198674#comment-16198674
 ] 

Saisai Shao commented on SPARK-21737:
-

I was trying to understand how Spark communicates with Mesos, but my knowledge 
of Mesos is quite poor; so far I cannot figure out a way to address this 
problem on Mesos.

If we restrict this to a Spark-on-YARN problem, then it is not an issue any 
more. But my thinking is that focusing only on YARN makes this channel not very 
useful.

Another option is to build an interface for getting the endpoint address from 
the cluster manager and only have an on-YARN implementation initially. Later on 
we can support Standalone and Mesos if required. What do you think?

> Create communication channel between arbitrary clients and the Spark AM in 
> YARN mode
> 
>
> Key: SPARK-21737
> URL: https://issues.apache.org/jira/browse/SPARK-21737
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: Jong Yoon Lee
>Priority: Minor
>
> In this JIRA, I develop code to create a communication channel between 
> arbitrary clients and a Spark AM on YARN. This code can be used to send 
> commands such as getting status, getting history info from the CLI, 
> killing the application, and pushing new tokens.
> Design Doc:
> https://docs.google.com/document/d/1QMbWhg13ocIoADywZQBRRVj-b9Zf8CnBrruP5JhcOOY/edit?usp=sharing






[jira] [Assigned] (LIVY-7) Add autocompletion API

2017-10-10 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-7:
--

Assignee: Pascal Pellmont

> Add autocompletion API
> --
>
> Key: LIVY-7
> URL: https://issues.apache.org/jira/browse/LIVY-7
> Project: Livy
>  Issue Type: New Feature
>  Components: Interpreter
>Affects Versions: 0.1
>Reporter: Erick Tryzelaar
>Assignee: Pascal Pellmont
> Fix For: 0.5.0
>
>
> Add an ipython-esque autocomplete api to livy.
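For reference, a sketch of how a client might call such an API; the endpoint path and field names below (POST /sessions/{id}/completion with code, kind and cursor) are an assumption, so check the final Livy REST docs:

{code}
# Assumed endpoint and field names -- verify against the Livy REST API docs.
import json, requests

def complete(livy_url, session_id, code, cursor, kind="pyspark"):
    payload = {"code": code, "cursor": cursor, "kind": kind}
    r = requests.post("%s/sessions/%d/completion" % (livy_url, session_id),
                      data=json.dumps(payload),
                      headers={"Content-Type": "application/json"})
    return r.json().get("candidates", [])

# e.g. complete("http://livy-host:8998", 0, "sc.paral", 8)
{code}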





[jira] [Commented] (SPARK-22199) Spark Job on YARN fails with executors "Slave registration failed"

2017-10-10 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198258#comment-16198258
 ] 

Saisai Shao commented on SPARK-22199:
-

Can you please list the steps to reproduce this issue? Also, please try with 
the latest master branch to see whether the issue still exists.

> Spark Job on YARN fails with executors "Slave registration failed"
> --
>
> Key: SPARK-22199
> URL: https://issues.apache.org/jira/browse/SPARK-22199
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.3
>Reporter: Prabhu Joseph
>Priority: Minor
>
> A Spark job on YARN failed after the maximum number of executor failures was reached.
> ApplicationMaster logs:
> {code}
> 17/09/28 04:18:27 INFO ApplicationMaster: Unregistering ApplicationMaster 
> with FAILED (diag message: Max number of executor failures (3) reached)
> {code}
> Checking the failed container logs shows "Slave registration failed: 
> Duplicate executor ID", whereas the driver logs show that it removed those 
> executors because they were idle for spark.dynamicAllocation.executorIdleTimeout.
> Executor Logs:
> {code}
> 17/09/28 04:18:26 ERROR CoarseGrainedExecutorBackend: Slave registration 
> failed: Duplicate executor ID: 122
> {code}
> Driver logs:
> {code}
> 17/09/28 04:18:21 INFO ExecutorAllocationManager: Removing executor 122 
> because it has been idle for 60 seconds (new desired total will be 133)
> {code}
> There are two issues here:
> 1. The error message in the executor, "Slave registration failed: 
> Duplicate executor ID", is misleading, as the actual reason is that the 
> executor was idle.
> 2. The job failed because there were executors idle for 
> spark.dynamicAllocation.executorIdleTimeout.
>  






[jira] [Commented] (SPARK-21737) Create communication channel between arbitrary clients and the Spark AM in YARN mode

2017-10-09 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198204#comment-16198204
 ] 

Saisai Shao commented on SPARK-21737:
-

Hi [~tgraves], I'm trying to understand the design of this. As discussed in the 
PR, we planned to create a generic client-to-driver communication channel 
instead of a client-to-AM one. But this raises the question of how to find the 
RPC endpoint.

In this PR, because it only targets YARN, it leverages the YARN API to report 
the AM RPC host/port to the RM, and a client can get the AM RPC endpoint 
address by asking the RM. But if we're going to build a generic 
client-to-driver channel, how do we figure out the driver RPC endpoint address?

If we restrict the topic to Spark on YARN, then the same solution can be used 
to figure out the driver RPC address. But how do we address this when running 
in Standalone/Mesos mode? There seems to be no equivalent solution. I thought 
of different ways, such as having the driver save its RPC address to a file on 
HDFS and letting the client read it, but none of them are good enough.

So I'd like to hear your suggestions; do you have a better solution? Thanks in 
advance!



 

> Create communication channel between arbitrary clients and the Spark AM in 
> YARN mode
> 
>
> Key: SPARK-21737
> URL: https://issues.apache.org/jira/browse/SPARK-21737
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: Jong Yoon Lee
>Priority: Minor
>
> In this JIRA, I develop code to create a communication channel between 
> arbitrary clients and a Spark AM on YARN. This code can be used to send 
> commands such as getting status, getting history info from the CLI, 
> killing the application, and pushing new tokens.
> Design Doc:
> https://docs.google.com/document/d/1QMbWhg13ocIoADywZQBRRVj-b9Zf8CnBrruP5JhcOOY/edit?usp=sharing






Re: Spark cassandra connector with livy

2017-10-09 Thread Saisai Shao
Please set "spark.jars.packages" to the package you wanted in batch POST
protocol "conf" field.

Thanks
Jerry

On Tue, Oct 10, 2017 at 3:05 AM, Junaid Nasir  wrote:

> More info regarding the problem:
> when I pass {"kind": "pyspark","jars":["datastax:
> spark-cassandra-connector:2.0.1-s_2.11"]} via POST to /sessions,
> the session status changes to dead. The Livy logs show this:
>
>   17/10/09 18:45:15 WARN ContextLauncher: Child process 
> exited with code 1.
> 17/10/09 18:45:15 ERROR RSCClient: Failed to connect to context.
> java.io.IOException: Child process exited with code 1.
> at 
> com.cloudera.livy.rsc.ContextLauncher$ChildProcess$1.run(ContextLauncher.java:383)
> at 
> com.cloudera.livy.rsc.ContextLauncher$ChildProcess$2.run(ContextLauncher.java:432)
> at java.lang.Thread.run(Thread.java:748)
> 17/10/09 18:45:15 INFO RSCClient: Failing pending job 
> 115078db-7ec3-4a5c-b74e-e24d6d811413 due to shutdown.
> 17/10/09 18:45:15 WARN InteractiveSession: (Fail to get rsc 
> uri,java.util.concurrent.ExecutionException: java.io.IOException: Child 
> process exited with code 1.)
> 17/10/09 18:45:15 INFO InteractiveSession: Stopping InteractiveSession 1...
> 17/10/09 18:45:15 INFO InteractiveSession: Stopped InteractiveSession 1.
>
>
>
>
> On Mon, Oct 9, 2017 5:11 PM, Junaid Nasir jna...@an10.io wrote:
>
>> Hi,
>>
>> How can I include the Cassandra connector package in a Livy Spark
>> session?
>>
>> I use the following spark-submit command
>>
>> /spark-submit --master spark://10.128.1.1:7077 --packages
>> datastax:spark-cassandra-connector:2.0.1-s_2.11 --conf
>> spark.cassandra.connection.host="10.128.1.1,10.128.1.2,10.128.1.3"
>>
>>
>> How can I translate this into Livy's session request, or add it to the Livy
>> conf file?
>>
>> Also, how do I provide a list of strings and a Map of key=val as mentioned in
>> Livy's documentation?
>> 
>>
>> Regards,
>> Junaid Nasir
>>
>


[jira] [Assigned] (SPARK-22074) Task killed by other attempt task should not be resubmitted

2017-10-09 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-22074:
---

Assignee: Li Yuanjian

> Task killed by other attempt task should not be resubmitted
> ---
>
> Key: SPARK-22074
> URL: https://issues.apache.org/jira/browse/SPARK-22074
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Li Yuanjian
>Assignee: Li Yuanjian
>  Labels: speculation
> Fix For: 2.3.0
>
>
> When a task is killed by another task attempt, the task is still resubmitted 
> when its executor is lost. There is a certain probability that this 
> unnecessary resubmit causes the stage to hang forever (see the scenario 
> description below). Although the patch 
> https://issues.apache.org/jira/browse/SPARK-13931 can resolve the hanging 
> problem (thx [~GavinGavinNo1] :) ), the unnecessary resubmit should be 
> abandoned.
> Detailed scenario description:
> 1. A ShuffleMapStage has many tasks, and some of them finish successfully.
> 2. An executor loss happens; this triggers a new TaskSet to be resubmitted, 
> which includes all missing partitions.
> 3. Before the resubmitted TaskSet completes, another executor, which only 
> contains the task killed by the other attempt, is lost; this triggers a 
> Resubmitted event, and the current stage's pendingPartitions is not empty.
> 4. The resubmitted TaskSet ends with shuffleMapStage.isAvailable == true, but 
> pendingPartitions is not empty, so we never step into submitWaitingChildStages.
> The key logs of this scenario are below:
> {noformat}
> 393332:17/09/11 13:45:24 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting 120 missing tasks from ShuffleMapStage 1046 
> (MapPartitionsRDD[5321] at rdd at AFDEntry.scala:116)
> 39:17/09/11 13:45:24 [dag-scheduler-event-loop] INFO 
> YarnClusterScheduler: Adding task set 1046.0 with 120 tasks
> 408766:17/09/11 13:46:25 [dispatcher-event-loop-5] INFO TaskSetManager: 
> Starting task 66.0 in stage 1046.0 (TID 110761, hidden-baidu-host.baidu.com, 
> executor 15, partition 66, PROCESS_LOCAL, 6237 bytes)
> [1] Executor 15 lost, task 66.0 and 90.0 on it
> 410532:17/09/11 13:46:32 [dispatcher-event-loop-47] INFO 
> YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 15.
> 410900:17/09/11 13:46:33 [dispatcher-event-loop-34] INFO TaskSetManager: 
> Starting task 66.1 in stage 1046.0 (TID 111400, hidden-baidu-host.baidu.com, 
> executor 70, partition 66, PROCESS_LOCAL, 6237 bytes)
> [2] Task 66.0 killed by 66.1
> 411315:17/09/11 13:46:37 [task-result-getter-2] INFO TaskSetManager: Killing 
> attempt 0 for task 66.0 in stage 1046.0 (TID 110761) on 
> hidden-baidu-host.baidu.com as the attempt 1 succeeded on 
> hidden-baidu-host.baidu.com
> 411316:17/09/11 13:46:37 [task-result-getter-2] INFO TaskSetManager: Finished 
> task 66.1 in stage 1046.0 (TID 111400) in 3545 ms on 
> hidden-baidu-host.baidu.com (executor 70) (115/120)
> [3] Executor 7 lost, task 0.0 72.0 7.0 on it
> 411390:17/09/11 13:46:37 [dispatcher-event-loop-24] INFO 
> YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 7.
> 416014:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> ShuffleMapStage 1046 (rdd at AFDEntry.scala:116) finished in 94.577 s
> [4] ShuffleMapStage 1046.0 finished, missing partition trigger resubmitted 
> 1046.1
> 416019:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Resubmitting ShuffleMapStage 1046 (rdd at AFDEntry.scala:116) because some of 
> its tasks had failed: 0, 72, 79
> 416020:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting ShuffleMapStage 1046 (MapPartitionsRDD[5321] at rdd at 
> AFDEntry.scala:116), which has no missing parents
> 416030:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting 3 missing tasks from ShuffleMapStage 1046 (MapPartitionsRDD[5321] 
> at rdd at AFDEntry.scala:116)
> 416032:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO 
> YarnClusterScheduler: Adding task set 1046.1 with 3 tasks
> 416034:17/09/11 13:46:59 [dispatcher-event-loop-21] INFO TaskSetManager: 
> Starting task 0.0 in stage 1046.1 (TID 112788, hidden-baidu-host.baidu.com, 
> executor 37, partition 0, PROCESS_LOCAL, 6237 bytes)
> 416037:17/09/11 13:46:59 [dispatcher-event-loop-23] INFO TaskSetManager: 
> Starting task 1.0 in stage 1046.1 (TID 112789, 
> yq01-inf-nmg01-spark03-20160817113538.yq01.baidu.com, executor 69, partition 
> 72, PROCESS_LOCAL, 6237 bytes)
> 416039:17/09/11 13:46:59 [dispatcher-event-loop-23] INFO T

[jira] [Resolved] (SPARK-22074) Task killed by other attempt task should not be resubmitted

2017-10-09 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-22074.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19287
[https://github.com/apache/spark/pull/19287]

> Task killed by other attempt task should not be resubmitted
> ---
>
> Key: SPARK-22074
> URL: https://issues.apache.org/jira/browse/SPARK-22074
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Li Yuanjian
>  Labels: speculation
> Fix For: 2.3.0
>
>
> When a task is killed by another task attempt, the task is still resubmitted 
> when its executor is lost. There is a certain probability that this 
> unnecessary resubmit causes the stage to hang forever (see the scenario 
> description below). Although the patch 
> https://issues.apache.org/jira/browse/SPARK-13931 can resolve the hanging 
> problem (thx [~GavinGavinNo1] :) ), the unnecessary resubmit should be 
> abandoned.
> Detailed scenario description:
> 1. A ShuffleMapStage has many tasks, and some of them finish successfully.
> 2. An executor loss happens; this triggers a new TaskSet to be resubmitted, 
> which includes all missing partitions.
> 3. Before the resubmitted TaskSet completes, another executor, which only 
> contains the task killed by the other attempt, is lost; this triggers a 
> Resubmitted event, and the current stage's pendingPartitions is not empty.
> 4. The resubmitted TaskSet ends with shuffleMapStage.isAvailable == true, but 
> pendingPartitions is not empty, so we never step into submitWaitingChildStages.
> The key logs of this scenario are below:
> {noformat}
> 393332:17/09/11 13:45:24 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting 120 missing tasks from ShuffleMapStage 1046 
> (MapPartitionsRDD[5321] at rdd at AFDEntry.scala:116)
> 39:17/09/11 13:45:24 [dag-scheduler-event-loop] INFO 
> YarnClusterScheduler: Adding task set 1046.0 with 120 tasks
> 408766:17/09/11 13:46:25 [dispatcher-event-loop-5] INFO TaskSetManager: 
> Starting task 66.0 in stage 1046.0 (TID 110761, hidden-baidu-host.baidu.com, 
> executor 15, partition 66, PROCESS_LOCAL, 6237 bytes)
> [1] Executor 15 lost, task 66.0 and 90.0 on it
> 410532:17/09/11 13:46:32 [dispatcher-event-loop-47] INFO 
> YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 15.
> 410900:17/09/11 13:46:33 [dispatcher-event-loop-34] INFO TaskSetManager: 
> Starting task 66.1 in stage 1046.0 (TID 111400, hidden-baidu-host.baidu.com, 
> executor 70, partition 66, PROCESS_LOCAL, 6237 bytes)
> [2] Task 66.0 killed by 66.1
> 411315:17/09/11 13:46:37 [task-result-getter-2] INFO TaskSetManager: Killing 
> attempt 0 for task 66.0 in stage 1046.0 (TID 110761) on 
> hidden-baidu-host.baidu.com as the attempt 1 succeeded on 
> hidden-baidu-host.baidu.com
> 411316:17/09/11 13:46:37 [task-result-getter-2] INFO TaskSetManager: Finished 
> task 66.1 in stage 1046.0 (TID 111400) in 3545 ms on 
> hidden-baidu-host.baidu.com (executor 70) (115/120)
> [3] Executor 7 lost, task 0.0 72.0 7.0 on it
> 411390:17/09/11 13:46:37 [dispatcher-event-loop-24] INFO 
> YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 7.
> 416014:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> ShuffleMapStage 1046 (rdd at AFDEntry.scala:116) finished in 94.577 s
> [4] ShuffleMapStage 1046.0 finished, missing partition trigger resubmitted 
> 1046.1
> 416019:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Resubmitting ShuffleMapStage 1046 (rdd at AFDEntry.scala:116) because some of 
> its tasks had failed: 0, 72, 79
> 416020:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting ShuffleMapStage 1046 (MapPartitionsRDD[5321] at rdd at 
> AFDEntry.scala:116), which has no missing parents
> 416030:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting 3 missing tasks from ShuffleMapStage 1046 (MapPartitionsRDD[5321] 
> at rdd at AFDEntry.scala:116)
> 416032:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO 
> YarnClusterScheduler: Adding task set 1046.1 with 3 tasks
> 416034:17/09/11 13:46:59 [dispatcher-event-loop-21] INFO TaskSetManager: 
> Starting task 0.0 in stage 1046.1 (TID 112788, hidden-baidu-host.baidu.com, 
> executor 37, partition 0, PROCESS_LOCAL, 6237 bytes)
> 416037:17/09/11 13:46:59 [dispatcher-event-loop-23] INFO TaskSetManager: 
> Starting task 1.0 in stage 1046.1 (TID 112789, 
> yq01-inf-nmg01-spark03-20160817113538.yq01.baidu.com, executor 69, partition 
> 72, PROCESS_LOCAL, 6237 bytes

[jira] [Assigned] (SPARK-22135) metrics in spark-dispatcher not being registered properly

2017-09-28 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-22135:
---

Assignee: paul mackles

> metrics in spark-dispatcher not being registered properly
> -
>
> Key: SPARK-22135
> URL: https://issues.apache.org/jira/browse/SPARK-22135
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Affects Versions: 2.1.0, 2.2.0
>Reporter: paul mackles
>Assignee: paul mackles
>Priority: Minor
> Fix For: 2.2.1, 2.3.0
>
>
> There is a bug in the way that the metrics in 
> org.apache.spark.scheduler.cluster.mesos.MesosClusterSchedulerSource are 
> initialized such that they are never registered with the underlying registry. 
> Basically, each call to the overridden "metricRegistry" function results in 
> the creation of a new registry. PR is forthcoming.
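The bug pattern is generic; a Python analogue of what the description says (the real code is Scala in MesosClusterSchedulerSource, so this is purely illustrative):

{code}
# Illustration only -- the actual code is Scala.
class BadSource(object):
    @property
    def metric_registry(self):
        return {}              # a brand-new, empty registry on every access

class GoodSource(object):
    def __init__(self):
        self._registry = {}    # created once

    @property
    def metric_registry(self):
        return self._registry  # the same registry every time

bad = BadSource()
bad.metric_registry["someGauge"] = 0
assert "someGauge" not in bad.metric_registry   # the gauge was silently lost

good = GoodSource()
good.metric_registry["someGauge"] = 0
assert "someGauge" in good.metric_registry
{code}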






[jira] [Resolved] (SPARK-22135) metrics in spark-dispatcher not being registered properly

2017-09-28 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-22135.
-
   Resolution: Fixed
Fix Version/s: 2.3.0
   2.2.1

Issue resolved by pull request 19358
[https://github.com/apache/spark/pull/19358]

> metrics in spark-dispatcher not being registered properly
> -
>
> Key: SPARK-22135
> URL: https://issues.apache.org/jira/browse/SPARK-22135
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Affects Versions: 2.1.0, 2.2.0
>Reporter: paul mackles
>Priority: Minor
> Fix For: 2.2.1, 2.3.0
>
>
> There is a bug in the way that the metrics in 
> org.apache.spark.scheduler.cluster.mesos.MesosClusterSchedulerSource are 
> initialized such that they are never registered with the underlying registry. 
> Basically, each call to the overridden "metricRegistry" function results in 
> the creation of a new registry. PR is forthcoming.






[jira] [Updated] (LIVY-408) Upgrade Netty version to avoid some security issues

2017-09-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-408:
-
Fix Version/s: (was: 0.5.0)

> Upgrade Netty version to avoid some security issues
> ---
>
> Key: LIVY-408
> URL: https://issues.apache.org/jira/browse/LIVY-408
> Project: Livy
>  Issue Type: Improvement
>  Components: RSC
>Affects Versions: 0.4.1, 0.5.0
>    Reporter: Saisai Shao
>Assignee: Saisai Shao
>
> Netty versions below 4.0.37.Final have a potential security issue 
> (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4970) that is fixed 
> in that version, so we should upgrade Livy's Netty dependency to avoid it.





[jira] [Updated] (LIVY-408) Upgrade Netty version to avoid some security issues

2017-09-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-408:
-
Priority: Minor  (was: Major)

> Upgrade Netty version to avoid some security issues
> ---
>
> Key: LIVY-408
> URL: https://issues.apache.org/jira/browse/LIVY-408
> Project: Livy
>  Issue Type: Improvement
>  Components: RSC
>Affects Versions: 0.4.1, 0.5.0
>    Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
>
> Netty versions below 4.0.37.Final have a potential security issue 
> (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4970) that is fixed 
> in that version, so we should upgrade Livy's Netty dependency to avoid it.





[jira] [Created] (LIVY-408) Upgrade Netty version to avoid some security issues

2017-09-27 Thread Saisai Shao (JIRA)
Saisai Shao created LIVY-408:


 Summary: Upgrade Netty version to avoid some security issues
 Key: LIVY-408
 URL: https://issues.apache.org/jira/browse/LIVY-408
 Project: Livy
  Issue Type: Improvement
  Components: RSC
Reporter: Saisai Shao
Assignee: Saisai Shao
 Fix For: 0.5.0


Netty versions below 4.0.37.Final have a potential security issue 
(https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-4970) that is fixed in 
that version, so we should upgrade Livy's Netty dependency to avoid it.





[jira] [Commented] (SPARK-22151) PYTHONPATH not picked up from the spark.yarn.appMasterEnv properly

2017-09-27 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183581#comment-16183581
 ] 

Saisai Shao commented on SPARK-22151:
-

Checking the YARN client code, it looks like there's specific code to handle 
{{PYTHONPATH}}, but not {{spark.yarn.appMasterEnv.PYTHONPATH}}.

> PYTHONPATH not picked up from the spark.yarn.appMasterEnv properly
> --
>
> Key: SPARK-22151
> URL: https://issues.apache.org/jira/browse/SPARK-22151
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.1
>Reporter: Thomas Graves
>
> The code looks at the env variables:
> val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
> But when you set spark.yarn.appMasterEnv, it puts it into the local env, 
> so the Python path set in spark.yarn.appMasterEnv isn't properly set.
> You can work around this, if you are running in cluster mode, by setting it on 
> the client like:
> PYTHONPATH=./addon/python/ spark-submit






[jira] [Commented] (SPARK-21737) Create communication channel between arbitrary clients and the Spark AM in YARN mode

2017-09-27 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183563#comment-16183563
 ] 

Saisai Shao commented on SPARK-21737:
-

[~yoonlee95], are you still working on this thing? I think this JIRA should 
still be valid.

> Create communication channel between arbitrary clients and the Spark AM in 
> YARN mode
> 
>
> Key: SPARK-21737
> URL: https://issues.apache.org/jira/browse/SPARK-21737
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: Jong Yoon Lee
>Priority: Minor
>
> In this JIRA, I develop code to create a communication channel between 
> arbitrary clients and a Spark AM on YARN. This code can be used to send 
> commands such as getting status, getting history info from the CLI, 
> killing the application, and pushing new tokens.
> Design Doc:
> https://docs.google.com/document/d/1QMbWhg13ocIoADywZQBRRVj-b9Zf8CnBrruP5JhcOOY/edit?usp=sharing






[jira] [Updated] (SPARK-22123) Add latest failure reason for task set blacklist

2017-09-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-22123:

Priority: Minor  (was: Major)

> Add latest failure reason for task set blacklist
> 
>
> Key: SPARK-22123
> URL: https://issues.apache.org/jira/browse/SPARK-22123
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Minor
> Fix For: 2.3.0
>
>
> Until now, every job aborted by a completed blacklist just showed a log like 
> the one below, which has no more information:
> {code:java}
> Aborting $taskSet because task $indexInTaskSet (partition $partition) cannot 
> run anywhere due to node and executor blacklist. Blacklisting behavior cannot 
> run anywhere due to node and executor blacklist.Blacklisting behavior can be 
> configured via spark.blacklist.*."
> {code}
> We could add the most recent failure reason for the task set blacklist, which 
> can be shown on the Spark UI to let the user know the failure reason directly.
> An example after the modification:
> {code:java}
> Aborting TaskSet 0.0 because task 0 (partition 0) cannot run anywhere due to 
> node and executor blacklist.
>  Most recent failure:
>  Some(Lost task 0.1 in stage 0.0 (TID 3,xxx, executor 1): 
> java.lang.Exception: Fake error!
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:73)
>  at org.apache.spark.scheduler.Task.run(Task.scala:99)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:305)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:745)
>  ). 
> Blacklisting behavior can be configured via spark.blacklist.*.
> {code}






[jira] [Assigned] (SPARK-22123) Add latest failure reason for task set blacklist

2017-09-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-22123:
---

Assignee: zhoukang

> Add latest failure reason for task set blacklist
> 
>
> Key: SPARK-22123
> URL: https://issues.apache.org/jira/browse/SPARK-22123
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: zhoukang
>Assignee: zhoukang
> Fix For: 2.3.0
>
>
> Till now, every job that is aborted because of the blacklist just shows a log 
> like the one below, which carries no further information:
> {code:java}
> Aborting $taskSet because task $indexInTaskSet (partition $partition) cannot 
> run anywhere due to node and executor blacklist. Blacklisting behavior can be 
> configured via spark.blacklist.*.
> {code}
> We could add the most recent failure reason for the task set blacklist and 
> show it on the Spark UI, so that users can see the failure reason directly.
> An example after the change:
> {code:java}
> Aborting TaskSet 0.0 because task 0 (partition 0) cannot run anywhere due to 
> node and executor blacklist.
>  Most recent failure:
>  Some(Lost task 0.1 in stage 0.0 (TID 3,xxx, executor 1): 
> java.lang.Exception: Fake error!
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:73)
>  at org.apache.spark.scheduler.Task.run(Task.scala:99)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:305)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:745)
>  ). 
> Blacklisting behavior can be configured via spark.blacklist.*.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22123) Add latest failure reason for task set blacklist

2017-09-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-22123.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19338
[https://github.com/apache/spark/pull/19338]

> Add latest failure reason for task set blacklist
> 
>
> Key: SPARK-22123
> URL: https://issues.apache.org/jira/browse/SPARK-22123
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: zhoukang
> Fix For: 2.3.0
>
>
> Till now, every job that is aborted because of the blacklist just shows a log 
> like the one below, which carries no further information:
> {code:java}
> Aborting $taskSet because task $indexInTaskSet (partition $partition) cannot 
> run anywhere due to node and executor blacklist. Blacklisting behavior can be 
> configured via spark.blacklist.*.
> {code}
> We could add the most recent failure reason for the task set blacklist and 
> show it on the Spark UI, so that users can see the failure reason directly.
> An example after the change:
> {code:java}
> Aborting TaskSet 0.0 because task 0 (partition 0) cannot run anywhere due to 
> node and executor blacklist.
>  Most recent failure:
>  Some(Lost task 0.1 in stage 0.0 (TID 3,xxx, executor 1): 
> java.lang.Exception: Fake error!
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:73)
>  at org.apache.spark.scheduler.Task.run(Task.scala:99)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:305)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:745)
>  ). 
> Blacklisting behavior can be configured via spark.blacklist.*.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20785) Spark should provide jump links and add (count) in the SQL web ui.

2017-09-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-20785.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19346
[https://github.com/apache/spark/pull/19346]

> Spark should  provide jump links and add (count) in the SQL web ui.
> ---
>
> Key: SPARK-20785
> URL: https://issues.apache.org/jira/browse/SPARK-20785
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.3.0
>Reporter: guoxiaolongzte
>Priority: Minor
> Fix For: 2.3.0
>
>
> It provides links that jump to Running Queries, Completed Queries, and Failed 
> Queries.
> It adds a (count) for Running Queries, Completed Queries, and Failed Queries.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20785) Spark should provide jump links and add (count) in the SQL web ui.

2017-09-27 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-20785:
---

Assignee: guoxiaolongzte

> Spark should  provide jump links and add (count) in the SQL web ui.
> ---
>
> Key: SPARK-20785
> URL: https://issues.apache.org/jira/browse/SPARK-20785
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.3.0
>Reporter: guoxiaolongzte
>Assignee: guoxiaolongzte
>Priority: Minor
> Fix For: 2.3.0
>
>
> It provides links that jump to Running Queries, Completed Queries, and Failed 
> Queries.
> It adds a (count) for Running Queries, Completed Queries, and Failed Queries.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22074) Task killed by other attempt task should not be resubmitted

2017-09-27 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182062#comment-16182062
 ] 

Saisai Shao commented on SPARK-22074:
-

So if I understand correctly, this happens when speculation kicks in: once one 
task attempt (66.1) finishes, it tries to kill all other attempts (66.0), but 
before attempt 66.0 is fully killed, the executor running it is lost, so the 
scheduler resubmits that attempt because of the executor loss and ignores the 
other, successful attempt. Am I right?
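As a rough illustration of the behaviour being discussed, the scheduler would need to remember which attempts were killed only because another attempt had already succeeded, and skip resubmitting those partitions on executor loss. The sketch below uses made-up structures and is not the actual TaskSetManager code:

{code:scala}
import scala.collection.mutable

// Simplified model of the decision: a partition whose remaining attempt was
// killed because another attempt already succeeded should not be resubmitted
// when its executor is later lost.
class ResubmitTracker {
  private val successful = mutable.Set[Int]()            // partition ids
  private val killedByOtherAttempt = mutable.Set[Int]()  // partition ids

  def markSucceeded(partition: Int): Unit = successful += partition

  def markKilledByOtherAttempt(partition: Int): Unit =
    killedByOtherAttempt += partition

  /** Called on executor loss: should this partition be resubmitted? */
  def shouldResubmit(partition: Int): Boolean =
    !successful.contains(partition) && !killedByOtherAttempt.contains(partition)
}
{code}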



> Task killed by other attempt task should not be resubmitted
> ---
>
> Key: SPARK-22074
> URL: https://issues.apache.org/jira/browse/SPARK-22074
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Li Yuanjian
>
> When a task is killed by another task attempt, the task is still resubmitted 
> when its executor is lost. With a certain probability this causes the stage 
> to hang forever because of the unnecessary resubmit (see the scenario 
> description below). Although the patch 
> https://issues.apache.org/jira/browse/SPARK-13931 can resolve the hanging 
> problem (thx [~GavinGavinNo1] :) ), the unnecessary resubmit should be 
> abandoned.
> Detailed scenario description:
> 1. A ShuffleMapStage has many tasks, some of which finished successfully.
> 2. An executor is lost, which triggers a new TaskSet to be resubmitted that 
> includes all missing partitions.
> 3. Before the resubmitted TaskSet completes, another executor, which only 
> holds the task killed by the other attempt, is lost. This triggers the 
> Resubmitted event, and the current stage's pendingPartitions is not empty.
> 4. The resubmitted TaskSet ends with shuffleMapStage.isAvailable == true, but 
> pendingPartitions is not empty, so execution never steps into 
> submitWaitingChildStages.
> The key logs of this scenario are below:
> {noformat}
> 393332:17/09/11 13:45:24 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting 120 missing tasks from ShuffleMapStage 1046 
> (MapPartitionsRDD[5321] at rdd at AFDEntry.scala:116)
> 39:17/09/11 13:45:24 [dag-scheduler-event-loop] INFO 
> YarnClusterScheduler: Adding task set 1046.0 with 120 tasks
> 408766:17/09/11 13:46:25 [dispatcher-event-loop-5] INFO TaskSetManager: 
> Starting task 66.0 in stage 1046.0 (TID 110761, hidden-baidu-host.baidu.com, 
> executor 15, partition 66, PROCESS_LOCAL, 6237 bytes)
> [1] Executor 15 lost, task 66.0 and 90.0 on it
> 410532:17/09/11 13:46:32 [dispatcher-event-loop-47] INFO 
> YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 15.
> 410900:17/09/11 13:46:33 [dispatcher-event-loop-34] INFO TaskSetManager: 
> Starting task 66.1 in stage 1046.0 (TID 111400, hidden-baidu-host.baidu.com, 
> executor 70, partition 66, PROCESS_LOCAL, 6237 bytes)
> [2] Task 66.0 killed by 66.1
> 411315:17/09/11 13:46:37 [task-result-getter-2] INFO TaskSetManager: Killing 
> attempt 0 for task 66.0 in stage 1046.0 (TID 110761) on 
> hidden-baidu-host.baidu.com as the attempt 1 succeeded on 
> hidden-baidu-host.baidu.com
> 411316:17/09/11 13:46:37 [task-result-getter-2] INFO TaskSetManager: Finished 
> task 66.1 in stage 1046.0 (TID 111400) in 3545 ms on 
> hidden-baidu-host.baidu.com (executor 70) (115/120)
> [3] Executor 7 lost, task 0.0 72.0 7.0 on it
> 411390:17/09/11 13:46:37 [dispatcher-event-loop-24] INFO 
> YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 7.
> 416014:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> ShuffleMapStage 1046 (rdd at AFDEntry.scala:116) finished in 94.577 s
> [4] ShuffleMapStage 1046.0 finished, missing partition trigger resubmitted 
> 1046.1
> 416019:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Resubmitting ShuffleMapStage 1046 (rdd at AFDEntry.scala:116) because some of 
> its tasks had failed: 0, 72, 79
> 416020:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting ShuffleMapStage 1046 (MapPartitionsRDD[5321] at rdd at 
> AFDEntry.scala:116), which has no missing parents
> 416030:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting 3 missing tasks from ShuffleMapStage 1046 (MapPartitionsRDD[5321] 
> at rdd at AFDEntry.scala:116)
> 416032:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO 
> YarnClusterScheduler: Adding task set 1046.1 with 3 tasks
> 416034:17/09/11 13:46:59 [dispatcher-event-loop-21] INFO TaskSetManager: 
> Starting task 0.0 in stage 1046.1 (TID 112788, hidden-baidu-host.baidu.com, 
> executor 37, partition 0, PROCESS_LOCAL, 6237 bytes)
> 416037:17/09/11 13:46:59 [dispatcher-event-loop

[jira] [Commented] (SPARK-22074) Task killed by other attempt task should not be resubmitted

2017-09-26 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181924#comment-16181924
 ] 

Saisai Shao commented on SPARK-22074:
-

Hey [~XuanYuan], I'm a little confused about why there would be a resubmit 
event after 66.0 is killed, since that kill is expected and Spark should not 
launch another attempt.

> Task killed by other attempt task should not be resubmitted
> ---
>
> Key: SPARK-22074
> URL: https://issues.apache.org/jira/browse/SPARK-22074
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Li Yuanjian
>
> When a task is killed by another task attempt, the task is still resubmitted 
> when its executor is lost. With a certain probability this causes the stage 
> to hang forever because of the unnecessary resubmit (see the scenario 
> description below). Although the patch 
> https://issues.apache.org/jira/browse/SPARK-13931 can resolve the hanging 
> problem (thx [~GavinGavinNo1] :) ), the unnecessary resubmit should be 
> abandoned.
> Detailed scenario description:
> 1. A ShuffleMapStage has many tasks, some of which finished successfully.
> 2. An executor is lost, which triggers a new TaskSet to be resubmitted that 
> includes all missing partitions.
> 3. Before the resubmitted TaskSet completes, another executor, which only 
> holds the task killed by the other attempt, is lost. This triggers the 
> Resubmitted event, and the current stage's pendingPartitions is not empty.
> 4. The resubmitted TaskSet ends with shuffleMapStage.isAvailable == true, but 
> pendingPartitions is not empty, so execution never steps into 
> submitWaitingChildStages.
> The key logs of this scenario are below:
> {noformat}
> 393332:17/09/11 13:45:24 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting 120 missing tasks from ShuffleMapStage 1046 
> (MapPartitionsRDD[5321] at rdd at AFDEntry.scala:116)
> 39:17/09/11 13:45:24 [dag-scheduler-event-loop] INFO 
> YarnClusterScheduler: Adding task set 1046.0 with 120 tasks
> 408766:17/09/11 13:46:25 [dispatcher-event-loop-5] INFO TaskSetManager: 
> Starting task 66.0 in stage 1046.0 (TID 110761, hidden-baidu-host.baidu.com, 
> executor 15, partition 66, PROCESS_LOCAL, 6237 bytes)
> [1] Executor 15 lost, task 66.0 and 90.0 on it
> 410532:17/09/11 13:46:32 [dispatcher-event-loop-47] INFO 
> YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 15.
> 410900:17/09/11 13:46:33 [dispatcher-event-loop-34] INFO TaskSetManager: 
> Starting task 66.1 in stage 1046.0 (TID 111400, hidden-baidu-host.baidu.com, 
> executor 70, partition 66, PROCESS_LOCAL, 6237 bytes)
> [2] Task 66.0 killed by 66.1
> 411315:17/09/11 13:46:37 [task-result-getter-2] INFO TaskSetManager: Killing 
> attempt 0 for task 66.0 in stage 1046.0 (TID 110761) on 
> hidden-baidu-host.baidu.com as the attempt 1 succeeded on 
> hidden-baidu-host.baidu.com
> 411316:17/09/11 13:46:37 [task-result-getter-2] INFO TaskSetManager: Finished 
> task 66.1 in stage 1046.0 (TID 111400) in 3545 ms on 
> hidden-baidu-host.baidu.com (executor 70) (115/120)
> [3] Executor 7 lost, task 0.0 72.0 7.0 on it
> 411390:17/09/11 13:46:37 [dispatcher-event-loop-24] INFO 
> YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 7.
> 416014:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> ShuffleMapStage 1046 (rdd at AFDEntry.scala:116) finished in 94.577 s
> [4] ShuffleMapStage 1046.0 finished, missing partition trigger resubmitted 
> 1046.1
> 416019:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Resubmitting ShuffleMapStage 1046 (rdd at AFDEntry.scala:116) because some of 
> its tasks had failed: 0, 72, 79
> 416020:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting ShuffleMapStage 1046 (MapPartitionsRDD[5321] at rdd at 
> AFDEntry.scala:116), which has no missing parents
> 416030:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting 3 missing tasks from ShuffleMapStage 1046 (MapPartitionsRDD[5321] 
> at rdd at AFDEntry.scala:116)
> 416032:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO 
> YarnClusterScheduler: Adding task set 1046.1 with 3 tasks
> 416034:17/09/11 13:46:59 [dispatcher-event-loop-21] INFO TaskSetManager: 
> Starting task 0.0 in stage 1046.1 (TID 112788, hidden-baidu-host.baidu.com, 
> executor 37, partition 0, PROCESS_LOCAL, 6237 bytes)
> 416037:17/09/11 13:46:59 [dispatcher-event-loop-23] INFO TaskSetManager: 
> Starting task 1.0 in stage 1046.1 (TID 112789, 
> yq01-inf-nmg01-spark03-20160817113538.yq01.baidu.com, executor 69, partition 
> 72, PROCESS_LOCAL, 62

[jira] [Commented] (SPARK-9103) Tracking spark's memory usage

2017-09-26 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181905#comment-16181905
 ] 

Saisai Shao commented on SPARK-9103:


Hi [~irashid], thanks a lot for your response.

I agree that your concern is very valid, especially about how to correlate the 
overall memory usage with task execution. But that is hard to do at the task 
level with Spark's current design, in which some memory usage is shared 
between tasks, such as Netty memory and the storage and execution memory. As 
for user memory, I think it is a missing piece in current Spark, but tracking 
that part of memory seems quite expensive, since we cannot predict what the 
user will do inside a task, e.g. memory used by third-party libraries.

So let me think a bit about how to further extend this feature (though it 
looks a little difficult to do) :).

> Tracking spark's memory usage
> -
>
> Key: SPARK-9103
> URL: https://issues.apache.org/jira/browse/SPARK-9103
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, Web UI
>Reporter: Zhang, Liye
> Attachments: Tracking Spark Memory Usage - Phase 1.pdf
>
>
> Currently Spark provides only a little memory usage information (RDD cache on 
> the web UI) for the executors. Users have no idea of the memory consumption 
> when they are running Spark applications that use a lot of memory in the 
> executors. Especially when they encounter an OOM, it's really hard to know 
> the cause of the problem. So it would be helpful to give detailed memory 
> consumption information for each part of Spark, so that users can clearly see 
> where the memory is actually used.
> The memory usage info to expose should include, but not be limited to, 
> shuffle, cache, network, serializer, etc.
> Users can optionally choose to enable this functionality, since it is mainly 
> for debugging and tuning.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22074) Task killed by other attempt task should not be resubmitted

2017-09-26 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180506#comment-16180506
 ] 

Saisai Shao commented on SPARK-22074:
-

Hi [~XuanYuan], can you please help me understand your scenario? Does it 
happen only when a task attempt (66.0) is lost (which adds it to the pending 
list), and at that moment another attempt (66.1) finishes and tries to kill 
66.0, but because 66.0 is pending for resubmission it is not truly killed, so 
attempt 66.0 lingers in stage 1046.0, which keeps 1046 from finishing? Do I 
understand it right?

Can you please explain more if my assumption is wrong.

> Task killed by other attempt task should not be resubmitted
> ---
>
> Key: SPARK-22074
> URL: https://issues.apache.org/jira/browse/SPARK-22074
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Li Yuanjian
>
> When a task is killed by another task attempt, the task is still resubmitted 
> when its executor is lost. With a certain probability this causes the stage 
> to hang forever because of the unnecessary resubmit (see the scenario 
> description below). Although the patch 
> https://issues.apache.org/jira/browse/SPARK-13931 can resolve the hanging 
> problem (thx [~GavinGavinNo1] :) ), the unnecessary resubmit should be 
> abandoned.
> Detailed scenario description:
> 1. A ShuffleMapStage has many tasks, some of which finished successfully.
> 2. An executor is lost, which triggers a new TaskSet to be resubmitted that 
> includes all missing partitions.
> 3. Before the resubmitted TaskSet completes, another executor, which only 
> holds the task killed by the other attempt, is lost. This triggers the 
> Resubmitted event, and the current stage's pendingPartitions is not empty.
> 4. The resubmitted TaskSet ends with shuffleMapStage.isAvailable == true, but 
> pendingPartitions is not empty, so execution never steps into 
> submitWaitingChildStages.
> The key logs of this scenario are below:
> {noformat}
> 393332:17/09/11 13:45:24 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting 120 missing tasks from ShuffleMapStage 1046 
> (MapPartitionsRDD[5321] at rdd at AFDEntry.scala:116)
> 39:17/09/11 13:45:24 [dag-scheduler-event-loop] INFO 
> YarnClusterScheduler: Adding task set 1046.0 with 120 tasks
> 408766:17/09/11 13:46:25 [dispatcher-event-loop-5] INFO TaskSetManager: 
> Starting task 66.0 in stage 1046.0 (TID 110761, hidden-baidu-host.baidu.com, 
> executor 15, partition 66, PROCESS_LOCAL, 6237 bytes)
> [1] Executor 15 lost, task 66.0 and 90.0 on it
> 410532:17/09/11 13:46:32 [dispatcher-event-loop-47] INFO 
> YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 15.
> 410900:17/09/11 13:46:33 [dispatcher-event-loop-34] INFO TaskSetManager: 
> Starting task 66.1 in stage 1046.0 (TID 111400, hidden-baidu-host.baidu.com, 
> executor 70, partition 66, PROCESS_LOCAL, 6237 bytes)
> [2] Task 66.0 killed by 66.1
> 411315:17/09/11 13:46:37 [task-result-getter-2] INFO TaskSetManager: Killing 
> attempt 0 for task 66.0 in stage 1046.0 (TID 110761) on 
> hidden-baidu-host.baidu.com as the attempt 1 succeeded on 
> hidden-baidu-host.baidu.com
> 411316:17/09/11 13:46:37 [task-result-getter-2] INFO TaskSetManager: Finished 
> task 66.1 in stage 1046.0 (TID 111400) in 3545 ms on 
> hidden-baidu-host.baidu.com (executor 70) (115/120)
> [3] Executor 7 lost, task 0.0 72.0 7.0 on it
> 411390:17/09/11 13:46:37 [dispatcher-event-loop-24] INFO 
> YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 7.
> 416014:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> ShuffleMapStage 1046 (rdd at AFDEntry.scala:116) finished in 94.577 s
> [4] ShuffleMapStage 1046.0 finished, missing partition trigger resubmitted 
> 1046.1
> 416019:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Resubmitting ShuffleMapStage 1046 (rdd at AFDEntry.scala:116) because some of 
> its tasks had failed: 0, 72, 79
> 416020:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting ShuffleMapStage 1046 (MapPartitionsRDD[5321] at rdd at 
> AFDEntry.scala:116), which has no missing parents
> 416030:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO DAGScheduler: 
> Submitting 3 missing tasks from ShuffleMapStage 1046 (MapPartitionsRDD[5321] 
> at rdd at AFDEntry.scala:116)
> 416032:17/09/11 13:46:59 [dag-scheduler-event-loop] INFO 
> YarnClusterScheduler: Adding task set 1046.1 with 3 tasks
> 416034:17/09/11 13:46:59 [dispatcher-event-loop-21] INFO TaskSetManager: 
> Starting task 0.0 in stage 1046.1 (TID 112788, hidden-baidu-host.baidu.com, 
> executor 3

[jira] [Closed] (LIVY-403) Add getCompletion for Interpreter

2017-09-24 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao closed LIVY-403.

Resolution: Duplicate

> Add getCompletion for Interpreter
> -
>
> Key: LIVY-403
> URL: https://issues.apache.org/jira/browse/LIVY-403
> Project: Livy
>  Issue Type: New Feature
>  Components: REPL
>Affects Versions: 0.5.0
>Reporter: Jeff Zhang
>
> This is for the code completion feature of the interpreter. A REST API also 
> needs to be added.
> \cc [~jerryshao]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SPARK-9103) Tracking spark's memory usage

2017-09-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176073#comment-16176073
 ] 

Saisai Shao commented on SPARK-9103:


[~irashid] I would like to hear your suggestions on displaying Netty memory 
usage on the web UI and REST API.

Currently, with SPARK-21934, we already support exposing Netty memory usage 
via the MetricsSystem; users can connect a metrics sink to Graphite or StatsD 
to get each executor's metrics. But we don't have a centralized place to 
display the Netty memory usage of all the executors. In a previous PR we tried 
to collect such metrics back to the driver through the heartbeat and display 
them on the web UI. That somehow seems useful, but it actually duplicates the 
functionality of the MetricsSystem. So I'm not entirely sure whether such 
functionality (displaying Netty memory usage on the web UI and REST API) is 
worthwhile. Can you please advise on it? Thanks a lot.
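For reference, a minimal sketch of how Netty pooled-allocator memory can be surfaced as Dropwizard gauges, in the spirit of what SPARK-21934 exposes through the MetricsSystem. This uses the public Netty 4.1 and Dropwizard metrics APIs directly and is not the actual Spark code; the object and namespace names are made up:

{code:scala}
import com.codahale.metrics.{Gauge, MetricRegistry}
import io.netty.buffer.PooledByteBufAllocator

// Register the shared pooled allocator's heap/direct usage as gauges. Any
// sink attached to the registry (Graphite, StatsD, ...) can then report
// these values per executor.
object NettyMemoryGauges {
  def register(registry: MetricRegistry, namespace: String): Unit = {
    val metric = PooledByteBufAllocator.DEFAULT.metric()
    registry.register(
      MetricRegistry.name(namespace, "usedHeapMemory"),
      new Gauge[Long] { override def getValue: Long = metric.usedHeapMemory() })
    registry.register(
      MetricRegistry.name(namespace, "usedDirectMemory"),
      new Gauge[Long] { override def getValue: Long = metric.usedDirectMemory() })
  }
}
{code}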

> Tracking spark's memory usage
> -
>
> Key: SPARK-9103
> URL: https://issues.apache.org/jira/browse/SPARK-9103
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core, Web UI
>Reporter: Zhang, Liye
> Attachments: Tracking Spark Memory Usage - Phase 1.pdf
>
>
> Currently Spark provides only a little memory usage information (RDD cache on 
> the web UI) for the executors. Users have no idea of the memory consumption 
> when they are running Spark applications that use a lot of memory in the 
> executors. Especially when they encounter an OOM, it's really hard to know 
> the cause of the problem. So it would be helpful to give detailed memory 
> consumption information for each part of Spark, so that users can clearly see 
> where the memory is actually used.
> The memory usage info to expose should include, but not be limited to, 
> shuffle, cache, network, serializer, etc.
> Users can optionally choose to enable this functionality, since it is mainly 
> for debugging and tuning.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21934) Expose Netty memory usage via Metrics System

2017-09-20 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-21934:
---

Assignee: Saisai Shao

> Expose Netty memory usage via Metrics System
> 
>
> Key: SPARK-21934
> URL: https://issues.apache.org/jira/browse/SPARK-21934
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.3.0
>    Reporter: Saisai Shao
>Assignee: Saisai Shao
> Fix For: 2.3.0
>
>
> This is follow-up work for SPARK-9104 to expose Netty memory usage through 
> the MetricsSystem. My initial thought is to expose only shuffle memory usage, 
> since shuffle accounts for the major part of memory usage in network 
> communication compared to RPC, the file server, and block transfer.
> If users want to also expose Netty memory usage for other modules, we could 
> add more metrics later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21934) Expose Netty memory usage via Metrics System

2017-09-20 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-21934.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19160
[https://github.com/apache/spark/pull/19160]

> Expose Netty memory usage via Metrics System
> 
>
> Key: SPARK-21934
> URL: https://issues.apache.org/jira/browse/SPARK-21934
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.3.0
>    Reporter: Saisai Shao
> Fix For: 2.3.0
>
>
> This is follow-up work for SPARK-9104 to expose Netty memory usage through 
> the MetricsSystem. My initial thought is to expose only shuffle memory usage, 
> since shuffle accounts for the major part of memory usage in network 
> communication compared to RPC, the file server, and block transfer.
> If users want to also expose Netty memory usage for other modules, we could 
> add more metrics later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: How to get the test logs from travis

2017-09-20 Thread Saisai Shao
Thanks Luciano,

I also googled this approach; it looks like cat-ing the logs is the most
recommended way to handle it. Another way is to upload them to some public
storage service.
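
For illustration, an after_failure hook along the lines of the Zeppelin
approach could look roughly like the snippet below; the file paths are
placeholders, not Livy's actual log locations:

  # .travis.yml (sketch, paths are placeholders)
  after_failure:
    - echo "==== unit test reports ===="
    - find . -path "*surefire-reports*" -name "*.txt" -exec cat {} +
    - echo "==== application logs ===="
    - cat target/unit-tests.log || true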

Best regards,
Jerry


On Thu, Sep 21, 2017 at 11:30 AM, Luciano Resende <luckbr1...@gmail.com>
wrote:

> Take a look at what Zeppelin does; they kind of cat a lot of the logs and
> related files, which then get appended to the build results.
>
> https://github.com/apache/zeppelin/blob/master/.travis.yml
>
> On Wed, Sep 20, 2017 at 8:22 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
> > Hi Team,
> >
> > Currently it is quite painful to figure out the cause of a failure in a
> > Travis test. Do you know how to get test logs from Travis? Is there a way,
> > supported by Travis, to either upload the environment somewhere, or to log
> > on to Travis and dig through the files?
> >
> > Previously we uploaded logs to Azure when a test failed. I'm not sure that
> > still works; shall we figure out a stable way to address this issue?
> >
> > Thanks
> > Jerry
> >
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


How to get the test logs from travis

2017-09-20 Thread Saisai Shao
Hi Team,

Currently it is quite painful to figure out the cause of a failure in a
Travis test. Do you know how to get test logs from Travis? Is there a way,
supported by Travis, to either upload the environment somewhere, or to log
on to Travis and dig through the files?

Previously we uploaded logs to Azure when a test failed. I'm not sure that
still works; shall we figure out a stable way to address this issue?

Thanks
Jerry


[jira] [Updated] (LIVY-386) Refactor Livy core module with Java

2017-09-19 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-386:
-
Summary: Refactor Livy core module with Java  (was: Refactor Livy code 
module with Java)

> Refactor Livy core module with Java
> ---
>
> Key: LIVY-386
> URL: https://issues.apache.org/jira/browse/LIVY-386
> Project: Livy
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.4
>    Reporter: Saisai Shao
>
> The Livy core module is used by several modules like server and repl, so it 
> has to support different Scala versions. It's really not necessary for this 
> module to compile against different Scala versions, so I think it would be 
> better to refactor this module to use Java.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (SPARK-22030) GraphiteSink fails to re-connect to Graphite instances behind an ELB or any other auto-scaled LB

2017-09-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-22030:
---

Assignee: Alex Mikhailau

> GraphiteSink fails to re-connect to Graphite instances behind an ELB or any 
> other auto-scaled LB
> 
>
> Key: SPARK-22030
> URL: https://issues.apache.org/jira/browse/SPARK-22030
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Alex Mikhailau
>Assignee: Alex Mikhailau
>Priority: Critical
> Fix For: 2.3.0
>
>
> Upgrade the codahale metrics library so that the Graphite constructor can 
> re-resolve hosts behind a CNAME with retried DNS lookups. When Graphite is 
> deployed behind an ELB, the ELB may change IP addresses based on auto-scaling 
> needs. The current approach makes Graphite usage impossible in that setup; 
> this fixes that use case.
> Upgrade to codahale 3.1.5.
> Use the new Graphite(host, port) constructor instead of the new Graphite(new 
> InetSocketAddress(host, port)) constructor.
> These are the proposed changes for the codahale lib - 
> dropwizard/metrics@v3.1.2...v3.1.5#diff-6916c85d2dd08d89fe771c952e3b8512R120. 
> Specifically, 
> https://github.com/dropwizard/metrics/blob/b4d246d34e8a059b047567848b3522567cbe6108/metrics-graphite/src/main/java/com/codahale/metrics/graphite/Graphite.java#L120
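To make the constructor change concrete, a small sketch of the two call styles mentioned above, using the classes from the codahale metrics-graphite module (host and port values are placeholders):

{code:scala}
import java.net.InetSocketAddress
import com.codahale.metrics.graphite.Graphite

val host = "graphite.example.com"   // e.g. an ELB CNAME; placeholder value
val port = 2003

// Old style: the InetSocketAddress resolves the hostname once, up front, so a
// later IP change behind the CNAME is never picked up.
val resolvedOnce = new Graphite(new InetSocketAddress(host, port))

// New style (codahale 3.1.x): the host/port constructor keeps the hostname and
// can re-resolve it on reconnect, which is what an auto-scaled LB needs.
val reResolving = new Graphite(host, port)
{code}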



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22030) GraphiteSink fails to re-connect to Graphite instances behind an ELB or any other auto-scaled LB

2017-09-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-22030:

Priority: Minor  (was: Critical)

> GraphiteSink fails to re-connect to Graphite instances behind an ELB or any 
> other auto-scaled LB
> 
>
> Key: SPARK-22030
> URL: https://issues.apache.org/jira/browse/SPARK-22030
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Alex Mikhailau
>Assignee: Alex Mikhailau
>Priority: Minor
> Fix For: 2.3.0
>
>
> Upgrade the codahale metrics library so that the Graphite constructor can 
> re-resolve hosts behind a CNAME with retried DNS lookups. When Graphite is 
> deployed behind an ELB, the ELB may change IP addresses based on auto-scaling 
> needs. The current approach makes Graphite usage impossible in that setup; 
> this fixes that use case.
> Upgrade to codahale 3.1.5.
> Use the new Graphite(host, port) constructor instead of the new Graphite(new 
> InetSocketAddress(host, port)) constructor.
> These are the proposed changes for the codahale lib - 
> dropwizard/metrics@v3.1.2...v3.1.5#diff-6916c85d2dd08d89fe771c952e3b8512R120. 
> Specifically, 
> https://github.com/dropwizard/metrics/blob/b4d246d34e8a059b047567848b3522567cbe6108/metrics-graphite/src/main/java/com/codahale/metrics/graphite/Graphite.java#L120



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22030) GraphiteSink fails to re-connect to Graphite instances behind an ELB or any other auto-scaled LB

2017-09-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-22030.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19210
[https://github.com/apache/spark/pull/19210]

> GraphiteSink fails to re-connect to Graphite instances behind an ELB or any 
> other auto-scaled LB
> 
>
> Key: SPARK-22030
> URL: https://issues.apache.org/jira/browse/SPARK-22030
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Alex Mikhailau
>Priority: Critical
> Fix For: 2.3.0
>
>
> Upgrade the codahale metrics library so that the Graphite constructor can 
> re-resolve hosts behind a CNAME with retried DNS lookups. When Graphite is 
> deployed behind an ELB, the ELB may change IP addresses based on auto-scaling 
> needs. The current approach makes Graphite usage impossible in that setup; 
> this fixes that use case.
> Upgrade to codahale 3.1.5.
> Use the new Graphite(host, port) constructor instead of the new Graphite(new 
> InetSocketAddress(host, port)) constructor.
> These are the proposed changes for the codahale lib - 
> dropwizard/metrics@v3.1.2...v3.1.5#diff-6916c85d2dd08d89fe771c952e3b8512R120. 
> Specifically, 
> https://github.com/dropwizard/metrics/blob/b4d246d34e8a059b047567848b3522567cbe6108/metrics-graphite/src/main/java/com/codahale/metrics/graphite/Graphite.java#L120



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (LIVY-396) Livy does not map YARN app states correctly

2017-09-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-396:


Assignee: Meisam Fathi

> Livy does not map YARN app states correctly
> ---
>
> Key: LIVY-396
> URL: https://issues.apache.org/jira/browse/LIVY-396
> Project: Livy
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 0.4
> Environment: all environments
>Reporter: Meisam
>Assignee: Meisam Fathi
>Priority: Minor
> Fix For: 0.5.0
>
>   Original Estimate: 20m
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If the status of a YARN app is {{KILLED}}, the final status should be 
> {{KILLED}} too, but there is a test case in 
> [{{SparkYarnAppSpec.scala}}|https://github.com/meisam/incubator-livy/blob/master/server/src/test/scala/org/apache/livy/utils/SparkYarnAppSpec.scala#L217]
>  that expects the final status to be {{UNDEFINED}}.
> {code}
>  assert(app.mapYarnState(appId, KILLED, UNDEFINED) == State.KILLED)
> {code}
>  The case where the app state is {{KILLED}} and the final status is {{UNDEFINED}} 
> should never happen.
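As an illustration of the mapping the issue expects, here is a simplified sketch using the YARN enums; this is not Livy's actual SparkYarnApp.mapYarnState implementation, and the Livy-side state enum here is made up for the example:

{code:scala}
import org.apache.hadoop.yarn.api.records.{FinalApplicationStatus, YarnApplicationState}

// Made-up stand-in for Livy's application state.
object AppState extends Enumeration {
  val Starting, Running, Finished, Failed, Killed = Value
}

// A KILLED YARN state maps to Killed no matter what the final status says.
def mapYarnState(state: YarnApplicationState,
                 finalStatus: FinalApplicationStatus): AppState.Value =
  state match {
    case YarnApplicationState.NEW | YarnApplicationState.NEW_SAVING |
         YarnApplicationState.SUBMITTED | YarnApplicationState.ACCEPTED =>
      AppState.Starting
    case YarnApplicationState.RUNNING => AppState.Running
    case YarnApplicationState.FINISHED =>
      if (finalStatus == FinalApplicationStatus.SUCCEEDED) AppState.Finished
      else AppState.Failed
    case YarnApplicationState.FAILED => AppState.Failed
    case YarnApplicationState.KILLED => AppState.Killed
  }
{code}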



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (LIVY-396) Livy does not map YARN app states correctly

2017-09-18 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-396.
--
Resolution: Fixed

Issue resolved by pull request 39
[https://github.com/apache/incubator-livy/pull/39]

> Livy does not map YARN app states correctly
> ---
>
> Key: LIVY-396
> URL: https://issues.apache.org/jira/browse/LIVY-396
> Project: Livy
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 0.4
> Environment: all environments
>Reporter: Meisam
>Priority: Minor
> Fix For: 0.5.0
>
>   Original Estimate: 20m
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> If the status of a YARN app is {{KILLED}}, the final status should be 
> {{KILLED}} too, but there is a test case in 
> [{{SparkYarnAppSpec.scala}}|https://github.com/meisam/incubator-livy/blob/master/server/src/test/scala/org/apache/livy/utils/SparkYarnAppSpec.scala#L217]
>  that expects the final status to be {{UNDEFINED}}.
> {code}
>  assert(app.mapYarnState(appId, KILLED, UNDEFINED) == State.KILLED)
> {code}
>  The case where the app state is {{KILLED}} and the final status is {{UNDEFINED}} 
> should never happen.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (LIVY-403) Add getCompletion for Interpreter

2017-09-15 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/LIVY-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-403:
-
Component/s: REPL

> Add getCompletion for Interpreter
> -
>
> Key: LIVY-403
> URL: https://issues.apache.org/jira/browse/LIVY-403
> Project: Livy
>  Issue Type: New Feature
>  Components: REPL
>Affects Versions: 0.5.0
>Reporter: Jeff Zhang
>
> This is for code completion feature of interpreter. Also need to add rest api.
> \cc [~jerryshao]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SPARK-21902) BlockManager.doPut will hide actually exception when exception thrown in finally block

2017-09-15 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21902:

Priority: Trivial  (was: Major)

> BlockManager.doPut will hide actually exception when exception thrown in 
> finally block
> --
>
> Key: SPARK-21902
> URL: https://issues.apache.org/jira/browse/SPARK-21902
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Affects Versions: 2.1.0
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Trivial
> Fix For: 2.3.0
>
>
> As the log below shows, the actual exception will be hidden when 
> removeBlockInternal throws an exception.
> {code:java}
> 2017-08-31,10:26:57,733 WARN org.apache.spark.storage.BlockManager: Putting 
> block broadcast_110 failed due to an exception
> 2017-08-31,10:26:57,734 WARN org.apache.spark.broadcast.BroadcastManager: 
> Failed to create a new broadcast in 1 attempts
> java.io.IOException: Failed to create local dir in 
> /tmp/blockmgr-5bb5ac1e-c494-434a-ab89-bd1808c6b9ed/2e.
> at 
> org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:70)
> at org.apache.spark.storage.DiskStore.remove(DiskStore.scala:115)
> at 
> org.apache.spark.storage.BlockManager.removeBlockInternal(BlockManager.scala:1339)
> at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:910)
> at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
> at 
> org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:726)
> at 
> org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1233)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:122)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88)
> at 
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> at 
> org.apache.spark.broadcast.BroadcastManager$$anonfun$newBroadcast$1.apply$mcVI$sp(BroadcastManager.scala:60)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> at 
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:58)
> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1415)
> at 
> org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1002)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:924)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:771)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:770)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at 
> org.apache.spark.scheduler.DAGScheduler.submitWaitingChildStages(DAGScheduler.scala:770)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1235)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1662)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1620)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1609)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> I want to print the actual exception first for troubleshooting. Or maybe we 
> should not throw an exception when removing blocks.
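The underlying JVM behaviour here is that an exception thrown from a finally block replaces the exception that was already in flight. A small, generic sketch of a "safe finally" helper that preserves the original exception (similar in spirit to Spark's Utils.tryWithSafeFinally, but not that code):

{code:scala}
// If the cleanup fails while the body already threw, keep the body's exception
// and attach the cleanup failure as a suppressed exception instead of masking it.
def tryWithSafeFinally[T](body: => T)(finallyBlock: => Unit): T = {
  var originalThrowable: Throwable = null
  try {
    body
  } catch {
    case t: Throwable =>
      originalThrowable = t
      throw t
  } finally {
    try {
      finallyBlock
    } catch {
      case t: Throwable if originalThrowable != null =>
        originalThrowable.addSuppressed(t)  // keep the real cause visible
      case t: Throwable =>
        throw t
    }
  }
}
{code}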



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21902) BlockManager.doPut will hide actually exception when exception thrown in finally block

2017-09-15 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21902:

Issue Type: Improvement  (was: Wish)

> BlockManager.doPut will hide actually exception when exception thrown in 
> finally block
> --
>
> Key: SPARK-21902
> URL: https://issues.apache.org/jira/browse/SPARK-21902
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Affects Versions: 2.1.0
>Reporter: zhoukang
>Assignee: zhoukang
> Fix For: 2.3.0
>
>
> As the log below shows, the actual exception will be hidden when 
> removeBlockInternal throws an exception.
> {code:java}
> 2017-08-31,10:26:57,733 WARN org.apache.spark.storage.BlockManager: Putting 
> block broadcast_110 failed due to an exception
> 2017-08-31,10:26:57,734 WARN org.apache.spark.broadcast.BroadcastManager: 
> Failed to create a new broadcast in 1 attempts
> java.io.IOException: Failed to create local dir in 
> /tmp/blockmgr-5bb5ac1e-c494-434a-ab89-bd1808c6b9ed/2e.
> at 
> org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:70)
> at org.apache.spark.storage.DiskStore.remove(DiskStore.scala:115)
> at 
> org.apache.spark.storage.BlockManager.removeBlockInternal(BlockManager.scala:1339)
> at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:910)
> at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
> at 
> org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:726)
> at 
> org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1233)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:122)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88)
> at 
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> at 
> org.apache.spark.broadcast.BroadcastManager$$anonfun$newBroadcast$1.apply$mcVI$sp(BroadcastManager.scala:60)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> at 
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:58)
> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1415)
> at 
> org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1002)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:924)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:771)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:770)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at 
> org.apache.spark.scheduler.DAGScheduler.submitWaitingChildStages(DAGScheduler.scala:770)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1235)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1662)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1620)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1609)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> I want to print the actual exception first for troubleshooting. Or maybe we 
> should not throw an exception when removing blocks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21902) BlockManager.doPut will hide actually exception when exception thrown in finally block

2017-09-15 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-21902:
---

Assignee: zhoukang

> BlockManager.doPut will hide actually exception when exception thrown in 
> finally block
> --
>
> Key: SPARK-21902
> URL: https://issues.apache.org/jira/browse/SPARK-21902
> Project: Spark
>  Issue Type: Wish
>  Components: Block Manager
>Affects Versions: 2.1.0
>Reporter: zhoukang
>Assignee: zhoukang
> Fix For: 2.3.0
>
>
> As the log below shows, the actual exception will be hidden when 
> removeBlockInternal throws an exception.
> {code:java}
> 2017-08-31,10:26:57,733 WARN org.apache.spark.storage.BlockManager: Putting 
> block broadcast_110 failed due to an exception
> 2017-08-31,10:26:57,734 WARN org.apache.spark.broadcast.BroadcastManager: 
> Failed to create a new broadcast in 1 attempts
> java.io.IOException: Failed to create local dir in 
> /tmp/blockmgr-5bb5ac1e-c494-434a-ab89-bd1808c6b9ed/2e.
> at 
> org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:70)
> at org.apache.spark.storage.DiskStore.remove(DiskStore.scala:115)
> at 
> org.apache.spark.storage.BlockManager.removeBlockInternal(BlockManager.scala:1339)
> at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:910)
> at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
> at 
> org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:726)
> at 
> org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1233)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:122)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.(TorrentBroadcast.scala:88)
> at 
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> at 
> org.apache.spark.broadcast.BroadcastManager$$anonfun$newBroadcast$1.apply$mcVI$sp(BroadcastManager.scala:60)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> at 
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:58)
> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1415)
> at 
> org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1002)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:924)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:771)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:770)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at 
> org.apache.spark.scheduler.DAGScheduler.submitWaitingChildStages(DAGScheduler.scala:770)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1235)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1662)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1620)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1609)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> I want to print the actual exception first for troubleshooting. Or maybe we 
> should not throw an exception when removing blocks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21902) BlockManager.doPut will hide actually exception when exception thrown in finally block

2017-09-15 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-21902.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19171
[https://github.com/apache/spark/pull/19171]

> BlockManager.doPut will hide actually exception when exception thrown in 
> finally block
> --
>
> Key: SPARK-21902
> URL: https://issues.apache.org/jira/browse/SPARK-21902
> Project: Spark
>  Issue Type: Wish
>  Components: Block Manager
>Affects Versions: 2.1.0
>Reporter: zhoukang
> Fix For: 2.3.0
>
>
> As the log below shows, the actual exception will be hidden when 
> removeBlockInternal throws an exception.
> {code:java}
> 2017-08-31,10:26:57,733 WARN org.apache.spark.storage.BlockManager: Putting 
> block broadcast_110 failed due to an exception
> 2017-08-31,10:26:57,734 WARN org.apache.spark.broadcast.BroadcastManager: 
> Failed to create a new broadcast in 1 attempts
> java.io.IOException: Failed to create local dir in 
> /tmp/blockmgr-5bb5ac1e-c494-434a-ab89-bd1808c6b9ed/2e.
> at 
> org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:70)
> at org.apache.spark.storage.DiskStore.remove(DiskStore.scala:115)
> at 
> org.apache.spark.storage.BlockManager.removeBlockInternal(BlockManager.scala:1339)
> at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:910)
> at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
> at 
> org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:726)
> at 
> org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1233)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:122)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88)
> at 
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> at 
> org.apache.spark.broadcast.BroadcastManager$$anonfun$newBroadcast$1.apply$mcVI$sp(BroadcastManager.scala:60)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> at 
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:58)
> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1415)
> at 
> org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1002)
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:924)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:771)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:770)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at 
> org.apache.spark.scheduler.DAGScheduler.submitWaitingChildStages(DAGScheduler.scala:770)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1235)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1662)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1620)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1609)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> I want to print the actual exception first for troubleshooting. Or maybe we 
> should not throw an exception when removing blocks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21922) When executor failed and task metrics have not send to driver,the status will always be 'RUNNING' and the duration will be 'CurrentTime - launchTime'

2017-09-14 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-21922.
-
  Resolution: Fixed
Assignee: zhoukang
Target Version/s: 2.3.0

> When executor failed and task metrics have not send to driver,the status will 
> always be 'RUNNING' and the duration will be 'CurrentTime - launchTime'
> -
>
> Key: SPARK-21922
> URL: https://issues.apache.org/jira/browse/SPARK-21922
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0
>Reporter: zhoukang
>Assignee: zhoukang
> Attachments: fixed01.png, fixed02.png, notfixed01.png, notfixed02.png
>
>
> As the title describes, below is an example:
> !notfixed01.png|Before fixed!
> !notfixed02.png|Before fixed!
> We can fix the duration time using the modification time of the event log:
> !fixed01.png|After fixed!
> !fixed02.png|After fixed!
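A rough sketch of the fallback described above, with made-up names: when a task never reported a finish time because its executor died, fall back to the event log's last-modified time rather than the current time.

{code:scala}
// Hypothetical helper: prefer the recorded finish time, then the event log's
// last-modified time, and only then fall back to "now".
def taskDuration(launchTime: Long,
                 finishTime: Option[Long],
                 eventLogLastModified: Option[Long]): Long = {
  val end = finishTime
    .orElse(eventLogLastModified)
    .getOrElse(System.currentTimeMillis())
  math.max(0L, end - launchTime)
}
{code}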



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21922) When executor failed and task metrics have not send to driver,the status will always be 'RUNNING' and the duration will be 'CurrentTime - launchTime'

2017-09-14 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21922:

Fix Version/s: 2.3.0

> When executor failed and task metrics have not send to driver,the status will 
> always be 'RUNNING' and the duration will be 'CurrentTime - launchTime'
> -
>
> Key: SPARK-21922
> URL: https://issues.apache.org/jira/browse/SPARK-21922
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0
>Reporter: zhoukang
>Assignee: zhoukang
> Fix For: 2.3.0
>
> Attachments: fixed01.png, fixed02.png, notfixed01.png, notfixed02.png
>
>
> As the title describes, below is an example:
> !notfixed01.png|Before fixed!
> !notfixed02.png|Before fixed!
> We can fix the duration by using the modification time of the event log:
> !fixed01.png|After fixed!
> !fixed02.png|After fixed!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21922) When executor failed and task metrics have not send to driver,the status will always be 'RUNNING' and the duration will be 'CurrentTime - launchTime'

2017-09-14 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21922:

Target Version/s:   (was: 2.3.0)

> When executor failed and task metrics have not send to driver,the status will 
> always be 'RUNNING' and the duration will be 'CurrentTime - launchTime'
> -
>
> Key: SPARK-21922
> URL: https://issues.apache.org/jira/browse/SPARK-21922
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0
>Reporter: zhoukang
>Assignee: zhoukang
> Fix For: 2.3.0
>
> Attachments: fixed01.png, fixed02.png, notfixed01.png, notfixed02.png
>
>
> As the title describes, below is an example:
> !notfixed01.png|Before fixed!
> !notfixed02.png|Before fixed!
> We can fix the duration by using the modification time of the event log:
> !fixed01.png|After fixed!
> !fixed02.png|After fixed!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21513) SQL to_json should support all column types

2017-09-12 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164000#comment-16164000
 ] 

Saisai Shao commented on SPARK-21513:
-

[~goldmedal] is this your correct JIRA name?

[~hyukjin.kwon] I added you to the admin group, so I think you can handle it 
yourself now.

> SQL to_json should support all column types
> ---
>
> Key: SPARK-21513
> URL: https://issues.apache.org/jira/browse/SPARK-21513
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Aaron Davidson
>Assignee: Jia-Xuan Liu
>  Labels: Starter
> Fix For: 2.3.0
>
>
> The built-in SQL UDF "to_json" currently supports serializing StructType 
> columns, as well as Arrays of StructType columns. If you attempt to use it on 
> a different type, for example a map, you get an error like this:
> {code}
> AnalysisException: cannot resolve 'structstojson(`tags`)' due to data type 
> mismatch: Input type map<string,string> must be a struct or array of 
> structs.;;
> {code}
> This limitation seems arbitrary; if I were to go through the effort of 
> enclosing my map in a struct, it would be serializable. Same thing with any 
> other non-struct type.
> Therefore the desired improvement is to allow to_json to operate directly on 
> any column type. The associated code is 
> [here|https://github.com/apache/spark/blob/86174ea89b39a300caaba6baffac70f3dc702788/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L653].
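To make the limitation concrete, here is a small Spark/Scala sketch (the column and app names are invented for illustration): the struct-wrapping call works today, while the commented-out direct call is what this improvement would allow.

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, to_json}

val spark = SparkSession.builder()
  .appName("to_json-example")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// A DataFrame with a map<string,string> column, mirroring the `tags` column above.
val df = Seq(("row1", Map("env" -> "prod", "team" -> "data"))).toDF("id", "tags")

// Works today: wrap the map in a struct before serializing.
df.select(to_json(struct($"tags")).alias("json")).show(false)

// Fails with the AnalysisException quoted above until the improvement lands:
// df.select(to_json($"tags")).show(false)
{code}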



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21513) SQL to_json should support all column types

2017-09-12 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-21513:
---

Assignee: Jia-Xuan Liu

> SQL to_json should support all column types
> ---
>
> Key: SPARK-21513
> URL: https://issues.apache.org/jira/browse/SPARK-21513
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Aaron Davidson
>Assignee: Jia-Xuan Liu
>  Labels: Starter
> Fix For: 2.3.0
>
>
> The built-in SQL UDF "to_json" currently supports serializing StructType 
> columns, as well as Arrays of StructType columns. If you attempt to use it on 
> a different type, for example a map, you get an error like this:
> {code}
> AnalysisException: cannot resolve 'structstojson(`tags`)' due to data type 
> mismatch: Input type map<string,string> must be a struct or array of 
> structs.;;
> {code}
> This limitation seems arbitrary; if I were to go through the effort of 
> enclosing my map in a struct, it would be serializable. Same thing with any 
> other non-struct type.
> Therefore the desired improvement is to allow to_json to operate directly on 
> any column type. The associated code is 
> [here|https://github.com/apache/spark/blob/86174ea89b39a300caaba6baffac70f3dc702788/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L653].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: user defined sessionId / URI for Livy sessions

2017-09-11 Thread Saisai Shao
I see. So based on this, we should manage a data structure in the Livy Server
to keep all the live sessions' names. Also, regarding session recovery, we
should persist this structure to reliable storage and recover it after a
restart.

I'm not sure whether it is a good feature or not. First, because we usually
manage the session id programmatically, at the code level there is not much
difference between managing a session id and a session name; second, it is
usually hard for a user to pick a unique name when one Livy Server has many
live sessions, so names are likely to conflict, and people tend to prefer
short, simple names.

Since I'm not that familiar with how people really use it, this is just my two
cents.

Thanks
Jerry
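As a concrete illustration of the bookkeeping discussed above, here is a minimal, hypothetical sketch (the class, method names, and the `persist` hook are invented, not Livy's actual API):

{code}
import java.util.concurrent.ConcurrentHashMap

// putIfAbsent provides the uniqueness check; `persist` stands in for writing
// the mapping to whatever recovery store the server uses.
class SessionNameRegistry(persist: (String, Int) => Unit) {
  private val nameToId = new ConcurrentHashMap[String, Integer]()

  /** Returns true if the name was free and is now bound to sessionId. */
  def register(name: String, sessionId: Int): Boolean = {
    val previous = nameToId.putIfAbsent(name, sessionId)
    if (previous == null) {
      persist(name, sessionId) // so the mapping survives a server restart
      true
    } else {
      false // caller should return an error to the client
    }
  }

  def lookup(name: String): Option[Int] =
    Option(nameToId.get(name)).map(_.intValue)
}
{code}

With something like this, registering a duplicate name simply returns false, and the caller can translate that into the error response described below.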


On Tue, Sep 12, 2017 at 8:46 AM, Meisam Fathi 
wrote:

> > If we're using session name, how do we guarantee the uniqueness of this
> > name?
> >
>
> If the requested session name already exist, Livy returns an error and does
> not create the session.
>
> Thanks,
> Meisam
>


Re: user defined sessionId / URI for Livy sessions

2017-09-11 Thread Saisai Shao
If we're using session name, how do we guarantee the uniqueness of this
name?

Thanks
Jerry

On Tue, Sep 12, 2017 at 4:51 AM, Alex Bozarth  wrote:

> I would agree with Marcelo's comment on the JIRA that this isn't a good
> feature for Livy, but I'll take a look at your impl if you open a PR and
> see if it changes my mind.
>
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* 
> *GitHub: **github.com/ajbozarth* 
>
>
> 505 Howard Street
> 
> San Francisco, CA 94105
> 
> United States
> 
>
>
>
>
> From: Meisam Fathi 
> To: "u...@livy.incubator.apache.org" , "
> dev@livy.incubator.apache.org" 
> Date: 09/11/2017 10:23 AM
> Subject: Re: user defined sessionId / URI for Livy sessions
> --
>
>
>
> + dev
> Is there any interest in adding this feature to Livy? I can send a PR
>
> Ideally, it would be helpful if we could mint a session ID with a PUT
> > request, something like PUT /sessions/foobar, where "foobar" is the newly
> > created sessionId.
> >
> > I suggest we make session names unique and nonnumeric values (to
> guarantee
> a session name does not clash with another session name or session ID).
>
> Design doc:
> https://github.com/meisam/incubator-livy/wiki/Design-doc-for-Livy-41:-Accessing-sessions-by-name
> JIRA ticket: https://issues.apache.org/jira/browse/LIVY-41
>
>
> Thanks,
> Meisam
>
>
>
>


[jira] [Commented] (SPARK-21943) When I use rest Api (/ applications / [app-id] / jobs / [job-id]) to view some of the jobs that are running jobs, the returned json information is missing the “descrip

2017-09-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161008#comment-16161008
 ] 

Saisai Shao commented on SPARK-21943:
-

If you're trying to report a bug, I think you should provide the simplest way to 
reproduce the issue; otherwise it is hard for others to understand it. If 
you're trying to fix the issue yourself, then you need to get familiar with the 
code and debug it. It is hard for others (even the author) to understand your 
issue with such limited information.

> When I use rest Api (/ applications / [app-id] / jobs / [job-id]) to view 
> some of the jobs that are running jobs, the returned json information is 
> missing the “description” field.
> ---
>
> Key: SPARK-21943
> URL: https://issues.apache.org/jira/browse/SPARK-21943
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.1
>Reporter: xianquan
>Priority: Minor
>  Labels: rest_api
> Attachments: webUI.png
>
>
> When I use the REST API (/applications/[app-id]/jobs/[job-id]) to view 
> some of the jobs that are currently running, some of the returned JSON 
> information is missing the "description" field.
> The returned json results are as follows:
>  [{
>   "jobId" : 7,
>   "name" : "run at AccessController.java:0",
>   "submissionTime" : "2017-09-07T09:44:53.632GMT",
>   "stageIds" : [ 19, 17, 18 ],
>   "jobGroup" : "cee1fb91-56bd-4d53-aed4-d409d21809da",
>  * "status" : "RUNNING*",
>   "numTasks" : 202,
>   "numActiveTasks" : 1,
>   "numCompletedTasks" : 0,
>   "numSkippedTasks" : 0,
>   "numFailedTasks" : 0,
>   "numActiveStages" : 2,
>   "numCompletedStages" : 0,
>   "numSkippedStages" : 0,
>   "numFailedStages" : 0
> },
> {
>   "jobId" : 6,
>   "name" : "run at AccessController.java:0",
>   "description" : "select * from test",
>   "submissionTime" : "2017-09-07T09:54:09.532GMT",
>   "stageIds" : [ 24 ],
>   "jobGroup" : "de8071d7-cb09-47af-a343-3d84946c2aff",
>   "status" : "RUNNING",
>   "numTasks" : 1,
>   "numActiveTasks" : 0,
>   "numCompletedTasks" : 0,
>   "numSkippedTasks" : 0,
>   "numFailedTasks" : 0,
>   "numActiveStages" : 1,
>   "numCompletedStages" : 0,
>   "numSkippedStages" : 0,
>   "numFailedStages" : 0
> }]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21943) When I use rest Api (/ applications / [app-id] / jobs / [job-id]) to view some of the jobs that are running jobs, the returned json information is missing the “descrip

2017-09-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160814#comment-16160814
 ] 

Saisai Shao commented on SPARK-21943:
-

From the code, the job description is taken from the description of the last 
stage in the job. So it looks like there is no last-stage description for job 
id "7". I think you need to trace the code to find out why, but I guess this 
might be intended behavior.
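For API consumers, the practical upshot is that the field should be treated as optional. A hedged sketch using json4s, which ships with Spark (the JSON strings are hard-coded here purely for illustration):

{code}
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

object JobInfoParsing {
  implicit val formats: Formats = DefaultFormats

  // "description" may be absent, so read it as an Option rather than a String.
  def descriptionOf(jobJson: String): Option[String] =
    (parse(jobJson) \ "description").extractOpt[String]

  def main(args: Array[String]): Unit = {
    val withDesc = """{"jobId": 6, "description": "select * from test"}"""
    val withoutDesc = """{"jobId": 7}"""
    println(descriptionOf(withDesc))    // Some(select * from test)
    println(descriptionOf(withoutDesc)) // None
  }
}
{code}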

> When I use rest Api (/ applications / [app-id] / jobs / [job-id]) to view 
> some of the jobs that are running jobs, the returned json information is 
> missing the “description” field.
> ---
>
> Key: SPARK-21943
> URL: https://issues.apache.org/jira/browse/SPARK-21943
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.1.1
>Reporter: xianquan
>Priority: Minor
>  Labels: rest_api
>
> When I use the REST API (/applications/[app-id]/jobs/[job-id]) to view 
> some of the jobs that are currently running, some of the returned JSON 
> information is missing the "description" field.
> The returned json results are as follows:
>  [{
>   "jobId" : 7,
>   "name" : "run at AccessController.java:0",
>   "submissionTime" : "2017-09-07T09:44:53.632GMT",
>   "stageIds" : [ 19, 17, 18 ],
>   "jobGroup" : "cee1fb91-56bd-4d53-aed4-d409d21809da",
>  * "status" : "RUNNING*",
>   "numTasks" : 202,
>   "numActiveTasks" : 1,
>   "numCompletedTasks" : 0,
>   "numSkippedTasks" : 0,
>   "numFailedTasks" : 0,
>   "numActiveStages" : 2,
>   "numCompletedStages" : 0,
>   "numSkippedStages" : 0,
>   "numFailedStages" : 0
> },
> {
>   "jobId" : 6,
>   "name" : "run at AccessController.java:0",
>   "description" : "select * from test",
>   "submissionTime" : "2017-09-07T09:54:09.532GMT",
>   "stageIds" : [ 24 ],
>   "jobGroup" : "de8071d7-cb09-47af-a343-3d84946c2aff",
>   "status" : "RUNNING",
>   "numTasks" : 1,
>   "numActiveTasks" : 0,
>   "numCompletedTasks" : 0,
>   "numSkippedTasks" : 0,
>   "numFailedTasks" : 0,
>   "numActiveStages" : 1,
>   "numCompletedStages" : 0,
>   "numSkippedStages" : 0,
>   "numFailedStages" : 0
> }]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-21906) No need to runAsSparkUser to switch UserGroupInformation in YARN mode

2017-09-11 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao closed SPARK-21906.
---
Resolution: Not A Problem

> No need to runAsSparkUser to switch UserGroupInformation in YARN mode
> -
>
> Key: SPARK-21906
> URL: https://issues.apache.org/jira/browse/SPARK-21906
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 2.2.0
>Reporter: Kent Yao
>
> 1. The YARN application's UGI is determined by the UGI launching it.
> 2. runAsSparkUser is used to switch to a UGI that is the same as the current 
> one, because we have already set {code:java} env("SPARK_USER") = 
> UserGroupInformation.getCurrentUser().getShortUserName() {code} in the AM 
> container context.
> {code:java}
>  def runAsSparkUser(func: () => Unit) {
> val user = Utils.getCurrentUserName()  // get the user itself
> logDebug("running as user: " + user)
> val ugi = UserGroupInformation.createRemoteUser(user) // create a new ugi 
> use itself
> transferCredentials(UserGroupInformation.getCurrentUser(), ugi) // 
> transfer its own credentials 
> ugi.doAs(new PrivilegedExceptionAction[Unit] { // doAs as itself
>   def run: Unit = func()
> })
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Multiple vcores per container when running Spark applications in Yarn cluster mode

2017-09-10 Thread Saisai Shao
I guess you're using the Capacity Scheduler with DefaultResourceCalculator,
which doesn't take CPU cores into account in resource calculation, so the "1"
you saw is effectively meaningless. If you also want CPU resources to be
counted, you should choose DominantResourceCalculator.

Thanks
Jerry
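For reference, switching the calculator is a single property in capacity-scheduler.xml (standard Hadoop configuration; adjust to your cluster):

{code:xml}
<!-- capacity-scheduler.xml: count vcores as well as memory -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
{code}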

On Sat, Sep 9, 2017 at 6:54 AM, Xiaoye Sun  wrote:

> Hi,
>
> I am using Spark 1.6.1 and Yarn 2.7.4.
> I want to submit a Spark application to a Yarn cluster. However, I found
> that the number of vcores assigned to a container/executor is always 1,
> even if I set spark.executor.cores=2. I also found the number of tasks an
> executor runs concurrently is 2. So, it seems that Spark knows that an
> executor/container has two CPU cores but the request is not correctly sent
> to Yarn resource scheduler. I am using the org.apache.hadoop.yarn.
> server.resourcemanager.scheduler.capacity.CapacityScheduler on Yarn.
>
> I am wondering whether it is possible to assign multiple vcores to a
> container when a Spark job is submitted to a Yarn cluster in yarn-cluster
> mode.
>
> Thanks!
> Best,
> Xiaoye
>


[jira] [Assigned] (SPARK-20098) DataType's typeName method returns with 'StructF' in case of StructField

2017-09-10 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-20098:
---

Assignee: Peter Szalai

> DataType's typeName method returns with 'StructF' in case of StructField
> 
>
> Key: SPARK-20098
> URL: https://issues.apache.org/jira/browse/SPARK-20098
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Peter Szalai
>Assignee: Peter Szalai
> Fix For: 2.2.1, 2.3.0
>
>
> Currently, if you want to get the name of a DataType and the DataType is a 
> `StructField`, you get `StructF`. 
> http://spark.apache.org/docs/2.1.0/api/python/_modules/pyspark/sql/types.html 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21942) DiskBlockManager crashing when a root local folder has been externally deleted by OS

2017-09-08 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158320#comment-16158320
 ] 

Saisai Shao commented on SPARK-21942:
-

Personally I would like to fail fast when such things happen. Here it happened 
to be the root folder that was cleaned, and using {{mkdirs}} can handle that 
case, but if some persisted block or shuffle index file is removed (because it 
is closed), I think there's no way to handle it. So instead of trying to work 
around it, exposing an exception to the user might be more useful, and will let 
the user know about the issue earlier.
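For context, a hedged sketch of the {{mkdirs}}-style workaround being discussed (illustration only, not DiskBlockManager's actual code):

{code}
import java.io.{File, IOException}

object ScratchFiles {
  // Recreate the sub-directory on demand; mkdirs() is a no-op if it already
  // exists and rebuilds the hierarchy if the OS cleaned it up underneath us.
  def getFile(rootDir: File, subDirIndex: Int, fileName: String): File = {
    val subDir = new File(rootDir, "%02x".format(subDirIndex))
    if (!subDir.isDirectory && !subDir.mkdirs()) {
      throw new IOException(s"Failed to create local dir in $subDir")
    }
    new File(subDir, fileName)
  }
}
{code}

Recreating the directory papers over a cleaned /tmp, but, as noted above, it cannot bring back files that were already written there.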

> DiskBlockManager crashing when a root local folder has been externally 
> deleted by OS
> 
>
> Key: SPARK-21942
> URL: https://issues.apache.org/jira/browse/SPARK-21942
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1, 1.6.2, 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 
> 2.2.0, 2.2.1, 2.3.0, 3.0.0
>Reporter: Ruslan Shestopalyuk
>Priority: Minor
>  Labels: storage
> Fix For: 2.3.0
>
>
> _DiskBlockManager_ has a notion of a "scratch" local folder(s), which can be 
> configured via _spark.local.dir_ option, and which defaults to the system's 
> _/tmp_. The hierarchy is two-level, e.g. _/blockmgr-XXX.../YY_, where the 
> _YY_ part is a hash bit, to spread files evenly.
> Function _DiskBlockManager.getFile_ expects the top level directories 
> (_blockmgr-XXX..._) to always exist (they get created once, when the spark 
> context is first created), otherwise it would fail with a message like:
> {code}
> ... java.io.IOException: Failed to create local dir in /tmp/blockmgr-XXX.../YY
> {code}
> However, this may not always be the case.
> In particular, *if it's the default _/tmp_ folder*, there can be different 
> strategies of automatically removing files from it, depending on the OS:
> * on the boot time
> * on a regular basis (e.g. once per day via a system cron job)
> * based on the file age
> The symptom is that after the process (in our case, a service) using spark is 
> running for a while (a few days), it may not be able to load files anymore, 
> since the top-level scratch directories are not there and 
> _DiskBlockManager.getFile_ crashes.
> Please note that this is different from people arbitrarily removing files 
> manually.
> We have both the facts that _/tmp_ is the default in the spark config and 
> that the system has the right to tamper with its contents, and will do it 
> with a high probability, after some period of time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21942) DiskBlockManager crashing when a root local folder has been externally deleted by OS

2017-09-08 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158271#comment-16158271
 ] 

Saisai Shao commented on SPARK-21942:
-

{quote}
https://github.com/search?utf8=%E2%9C%93=filename%3Aspark-defaults.conf++NOT+spark.local.dir=Code

shows 2000+ repos that omit the `spark.local.dir` setting altogether, which 
means they are using `/tmp`, even though it's not a good default choice.
Which of course does not prove anything, since those are not necessarily 
"production environments".
{quote}

[~rshest] you can always find out reasons, but I don't think this is a valid 
issue.

> DiskBlockManager crashing when a root local folder has been externally 
> deleted by OS
> 
>
> Key: SPARK-21942
> URL: https://issues.apache.org/jira/browse/SPARK-21942
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1, 1.6.2, 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 
> 2.2.0, 2.2.1, 2.3.0, 3.0.0
>Reporter: Ruslan Shestopalyuk
>Priority: Minor
>  Labels: storage
> Fix For: 2.3.0
>
>
> _DiskBlockManager_ has a notion of a "scratch" local folder(s), which can be 
> configured via _spark.local.dir_ option, and which defaults to the system's 
> _/tmp_. The hierarchy is two-level, e.g. _/blockmgr-XXX.../YY_, where the 
> _YY_ part is a hash bit, to spread files evenly.
> Function _DiskBlockManager.getFile_ expects the top level directories 
> (_blockmgr-XXX..._) to always exist (they get created once, when the spark 
> context is first created), otherwise it would fail with a message like:
> {code}
> ... java.io.IOException: Failed to create local dir in /tmp/blockmgr-XXX.../YY
> {code}
> However, this may not always be the case.
> In particular, *if it's the default _/tmp_ folder*, there can be different 
> strategies of automatically removing files from it, depending on the OS:
> * on the boot time
> * on a regular basis (e.g. once per day via a system cron job)
> * based on the file age
> The symptom is that after the process (in our case, a service) using spark is 
> running for a while (a few days), it may not be able to load files anymore, 
> since the top-level scratch directories are not there and 
> _DiskBlockManager.getFile_ crashes.
> Please note that this is different from people arbitrarily removing files 
> manually.
> We have both the facts that _/tmp_ is the default in the spark config and 
> that the system has the right to tamper with its contents, and will do it 
> with a high probability, after some period of time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21939) Use TimeLimits instead of Timeouts

2017-09-07 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-21939:
---

Assignee: Dongjoon Hyun

> Use TimeLimits instead of Timeouts
> --
>
> Key: SPARK-21939
> URL: https://issues.apache.org/jira/browse/SPARK-21939
> Project: Spark
>  Issue Type: Task
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
> Fix For: 2.3.0
>
>
> Since ScalaTest 3.0.0, `org.scalatest.concurrent.Timeouts` is deprecated.
> This issue replaces the deprecated one with 
> `org.scalatest.concurrent.TimeLimits`.
> {code}
> -import org.scalatest.concurrent.Timeouts._
> +import org.scalatest.concurrent.TimeLimits._
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21939) Use TimeLimits instead of Timeouts

2017-09-07 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-21939.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19150
[https://github.com/apache/spark/pull/19150]

> Use TimeLimits instead of Timeouts
> --
>
> Key: SPARK-21939
> URL: https://issues.apache.org/jira/browse/SPARK-21939
> Project: Spark
>  Issue Type: Task
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Dongjoon Hyun
>Priority: Trivial
> Fix For: 2.3.0
>
>
> Since ScalaTest 3.0.0, `org.scalatest.concurrent.Timeouts` is deprecated.
> This issue replaces the deprecated one with 
> `org.scalatest.concurrent.TimeLimits`.
> {code}
> -import org.scalatest.concurrent.Timeouts._
> +import org.scalatest.concurrent.TimeLimits._
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21942) Fix DiskBlockManager crashing when a root local folder has been externally deleted by OS

2017-09-07 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156914#comment-16156914
 ] 

Saisai Shao commented on SPARK-21942:
-

I think in a production environment users should always configure the local 
dirs; using /tmp is not a good choice:

* If a node has multiple disks, users usually need to point the local dirs at 
all of the disks (like MR does) to improve performance (see the example 
configuration below).
* If deliberately using a RAM disk, users will usually mount a dedicated RAM 
FS rather than /tmp.

I don't think users in a production environment will choose /tmp as the local 
dirs. The same goes for Hadoop's local dir, which by default is 
/tmp/hadoop-. But I think no production environment will use this default 
configuration.
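As a hedged illustration of the first point (the paths below are examples only), a spark-defaults.conf entry spreading scratch space across several disks might look like:

{code}
# spark-defaults.conf -- example paths, one directory per physical disk
spark.local.dir  /data1/spark/tmp,/data2/spark/tmp,/data3/spark/tmp
{code}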



> Fix DiskBlockManager crashing when a root local folder has been externally 
> deleted by OS
> ---
>
> Key: SPARK-21942
> URL: https://issues.apache.org/jira/browse/SPARK-21942
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1, 1.6.2, 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 
> 2.2.0, 2.2.1, 2.3.0, 3.0.0
>Reporter: Ruslan Shestopalyuk
>Priority: Minor
>  Labels: storage
> Fix For: 2.3.0
>
>
> _DiskBlockManager_ has a notion of a "scratch" local folder(s), which can be 
> configured via _spark.local.dir_ option, and which defaults to the system's 
> _/tmp_. The hierarchy is two-level, e.g. _/blockmgr-XXX.../YY_, where the 
> _YY_ part is a hash bit, to spread files evenly.
> Function _DiskBlockManager.getFile_ expects the top level directories 
> (_blockmgr-XXX..._) to always exist (they get created once, when the spark 
> context is first created), otherwise it would fail with a message like:
> {code}
> ... java.io.IOException: Failed to create local dir in /tmp/blockmgr-XXX.../YY
> {code}
> However, this may not always be the case.
> In particular, *if it's the default _/tmp_ folder*, there can be different 
> strategies of automatically removing files from it, depending on the OS:
> * on the boot time
> * on a regular basis (e.g. once per day via a system cron job)
> * based on the file age
> The symptom is that after the process (in our case, a service) using spark is 
> running for a while (a few days), it may not be able to load files anymore, 
> since the top-level scratch directories are not there and 
> _DiskBlockManager.getFile_ crashes.
> Please note that this is different from people arbitrarily removing files 
> manually.
> We have both the facts that _/tmp_ is the default in the spark config and 
> that the system has the right to tamper with its contents, and will do it 
> with a high probability, after some period of time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9104) expose network layer memory usage

2017-09-06 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-9104:
---
Summary: expose network layer memory usage  (was: expose network layer 
memory usage in shuffle part)

> expose network layer memory usage
> -
>
> Key: SPARK-9104
> URL: https://issues.apache.org/jira/browse/SPARK-9104
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Zhang, Liye
>    Assignee: Saisai Shao
> Fix For: 2.3.0
>
>
> The default network transport is Netty, and when transferring blocks for 
> shuffle the network layer consumes a decent amount of memory. We shall 
> collect the memory usage of this part and expose it. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21934) Expose Netty memory usage via Metrics System

2017-09-06 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21934:

Issue Type: Sub-task  (was: New Feature)
Parent: SPARK-9103

> Expose Netty memory usage via Metrics System
> 
>
> Key: SPARK-21934
> URL: https://issues.apache.org/jira/browse/SPARK-21934
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.3.0
>    Reporter: Saisai Shao
>
> This is follow-up work to SPARK-9104 to expose the Netty memory usage to the 
> MetricsSystem. My initial thought is to only expose shuffle memory usage, 
> since shuffle is the major consumer of network-layer memory compared to RPC, 
> the file server, and block transfer. 
> If users want to also expose Netty memory usage for other modules, we can 
> add more metrics later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21934) Expose Netty memory usage via Metrics System

2017-09-06 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-21934:
---

 Summary: Expose Netty memory usage via Metrics System
 Key: SPARK-21934
 URL: https://issues.apache.org/jira/browse/SPARK-21934
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: Saisai Shao


This is follow-up work to SPARK-9104 to expose the Netty memory usage to the 
MetricsSystem. My initial thought is to only expose shuffle memory usage, since 
shuffle is the major consumer of network-layer memory compared to RPC, the file 
server, and block transfer. 

If users want to also expose Netty memory usage for other modules, we can add 
more metrics later.
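A hedged sketch of what such metrics could look like, assuming a Netty version whose PooledByteBufAllocatorMetric exposes usedDirectMemory/usedHeapMemory, and using the Dropwizard MetricRegistry that Spark's MetricsSystem consumes (the metric names and wiring are invented, not the actual implementation):

{code}
import com.codahale.metrics.{Gauge, MetricRegistry}
import io.netty.buffer.PooledByteBufAllocator

object NettyShuffleMemoryMetrics {
  // Register gauges that read the allocator's pooled memory counters each
  // time the metrics system polls them.
  def register(registry: MetricRegistry, allocator: PooledByteBufAllocator): Unit = {
    registry.register("shuffle.netty.usedDirectMemory", new Gauge[Long] {
      override def getValue: Long = allocator.metric().usedDirectMemory()
    })
    registry.register("shuffle.netty.usedHeapMemory", new Gauge[Long] {
      override def getValue: Long = allocator.metric().usedHeapMemory()
    })
  }
}
{code}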



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21917) Remote http(s) resources is not supported in YARN mode

2017-09-05 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154744#comment-16154744
 ] 

Saisai Shao commented on SPARK-21917:
-

Thanks [~vanzin], I think your suggestion is great; it will also handle the 
Hadoop 2.9+ compatibility issue. Let me improve the current code.

> Remote http(s) resources is not supported in YARN mode
> --
>
> Key: SPARK-21917
> URL: https://issues.apache.org/jira/browse/SPARK-21917
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit, YARN
>Affects Versions: 2.2.0
>    Reporter: Saisai Shao
>Priority: Minor
>
> In the current Spark, when submitting an application on YARN with remote 
> resources {{./bin/spark-shell --jars 
> http://central.maven.org/maven2/com/github/swagger-akka-http/swagger-akka-http_2.11/0.10.1/swagger-akka-http_2.11-0.10.1.jar
>  --master yarn-client -v}}, Spark fails with:
> {noformat}
> java.io.IOException: No FileSystem for scheme: http
>   at 
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>   at 
> org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:354)
>   at 
> org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:478)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:600)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:599)
>   at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:599)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:598)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:598)
>   at 
> org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:848)
>   at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:173)
> {noformat}
> This is because {{YARN#client}} assumes resources must be on the Hadoop 
> compatible FS, also in the NM 
> (https://github.com/apache/hadoop/blob/99e558b13ba4d5832aea97374e1d07b4e78e5e39/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java#L245)
>  it will only use Hadoop compatible FS to download resources. So this makes 
> Spark on YARN fail to support remote http(s) resources.
> To solve this problem, there might be several options:
> * Download remote http(s) resources to local and add the locally downloaded 
> resources to the dist cache. The downside of this option is that remote 
> resources will be uploaded again unnecessarily.
> * Filter remote http(s) resources and add them with spark.jars or 
> spark.files, to leverage Spark's internal fileserver to distribute remote 
> http(s) resources. The problem with this solution is that some resources 
> which need to be available before the application starts may not work.
> * Leverage Hadoop's http(s) file system support 
> (https://issues.apache.org/jira/browse/HADOOP-14383). This only works in 
> Hadoop 2.9+, and I think even if we implement a similar one in Spark it will 
> not work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18061) Spark Thriftserver needs to create SPNego principal

2017-09-05 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-18061:
---

Assignee: Saisai Shao

> Spark Thriftserver needs to create SPNego principal
> ---
>
> Key: SPARK-18061
> URL: https://issues.apache.org/jira/browse/SPARK-18061
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 2.0.1
>Reporter: Chandana Mirashi
>Assignee: Saisai Shao
> Fix For: 2.3.0
>
>
> Spark Thriftserver when running in HTTP mode with Kerberos enabled gives a 
> 401 authentication error when receiving beeline HTTP request (with end user 
> as kerberos principal). The similar command works with Hive Thriftserver.
> What we find is Hive thriftserver CLI service creates both hive service and 
> SPNego principal when kerberos is enabled whereas Spark Thriftserver
> only creates hive service principal.
> {code:title=CLIService.java|borderStyle=solid}
> if (UserGroupInformation.isSecurityEnabled()) {
>   try {
> HiveAuthFactory.loginFromKeytab(hiveConf);
> this.serviceUGI = Utils.getUGI();
>   } catch (IOException e) {
> throw new ServiceException("Unable to login to kerberos with given 
> principal/keytab", e);
>   } catch (LoginException e) {
> throw new ServiceException("Unable to login to kerberos with given 
> principal/keytab", e);
>   }
>   // Also try creating a UGI object for the SPNego principal
>   String principal = 
> hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_PRINCIPAL);
>   String keyTabFile = 
> hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_KEYTAB);
>   if (principal.isEmpty() || keyTabFile.isEmpty()) {
> LOG.info("SPNego httpUGI not created, spNegoPrincipal: " + principal +
> ", ketabFile: " + keyTabFile);
>   } else {
> try {
>   this.httpUGI = 
> HiveAuthFactory.loginFromSpnegoKeytabAndReturnUGI(hiveConf);
>   LOG.info("SPNego httpUGI successfully created.");
> } catch (IOException e) {
>   LOG.warn("SPNego httpUGI creation failed: ", e);
> }
>   }
> }
> {code}
> {code:title=SparkSQLCLIService.scala|borderStyle=solid}
> if (UserGroupInformation.isSecurityEnabled) {
>   try {
> HiveAuthFactory.loginFromKeytab(hiveConf)
> sparkServiceUGI = Utils.getUGI()
> setSuperField(this, "serviceUGI", sparkServiceUGI)
>   } catch {
> case e @ (_: IOException | _: LoginException) =>
>   throw new ServiceException("Unable to login to kerberos with given 
> principal/keytab", e)
>   }
> }
> {code}
> The patch will add missing SPNego principal to Spark Thriftserver.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18061) Spark Thriftserver needs to create SPNego principal

2017-09-05 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-18061.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 18628
[https://github.com/apache/spark/pull/18628]

> Spark Thriftserver needs to create SPNego principal
> ---
>
> Key: SPARK-18061
> URL: https://issues.apache.org/jira/browse/SPARK-18061
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 2.0.1
>Reporter: Chandana Mirashi
> Fix For: 2.3.0
>
>
> Spark Thriftserver when running in HTTP mode with Kerberos enabled gives a 
> 401 authentication error when receiving beeline HTTP request (with end user 
> as kerberos principal). The similar command works with Hive Thriftserver.
> What we find is Hive thriftserver CLI service creates both hive service and 
> SPNego principal when kerberos is enabled whereas Spark Thriftserver
> only creates hive service principal.
> {code:title=CLIService.java|borderStyle=solid}
> if (UserGroupInformation.isSecurityEnabled()) {
>   try {
> HiveAuthFactory.loginFromKeytab(hiveConf);
> this.serviceUGI = Utils.getUGI();
>   } catch (IOException e) {
> throw new ServiceException("Unable to login to kerberos with given 
> principal/keytab", e);
>   } catch (LoginException e) {
> throw new ServiceException("Unable to login to kerberos with given 
> principal/keytab", e);
>   }
>   // Also try creating a UGI object for the SPNego principal
>   String principal = 
> hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_PRINCIPAL);
>   String keyTabFile = 
> hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_KEYTAB);
>   if (principal.isEmpty() || keyTabFile.isEmpty()) {
> LOG.info("SPNego httpUGI not created, spNegoPrincipal: " + principal +
> ", ketabFile: " + keyTabFile);
>   } else {
> try {
>   this.httpUGI = 
> HiveAuthFactory.loginFromSpnegoKeytabAndReturnUGI(hiveConf);
>   LOG.info("SPNego httpUGI successfully created.");
> } catch (IOException e) {
>   LOG.warn("SPNego httpUGI creation failed: ", e);
> }
>   }
> }
> {code}
> {code:title=SparkSQLCLIService.scala|borderStyle=solid}
> if (UserGroupInformation.isSecurityEnabled) {
>   try {
> HiveAuthFactory.loginFromKeytab(hiveConf)
> sparkServiceUGI = Utils.getUGI()
> setSuperField(this, "serviceUGI", sparkServiceUGI)
>   } catch {
> case e @ (_: IOException | _: LoginException) =>
>   throw new ServiceException("Unable to login to kerberos with given 
> principal/keytab", e)
>   }
> }
> {code}
> The patch will add missing SPNego principal to Spark Thriftserver.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21917) Remote http(s) resources is not supported in YARN mode

2017-09-05 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153141#comment-16153141
 ] 

Saisai Shao edited comment on SPARK-21917 at 9/5/17 8:36 AM:
-

I'm inclining to choose option 1, the only overhead is resource re-uploading, 
the fix is restricted to SparkSubmit and other codes could be worked 
transparently.

What's your opinion [~tgraves] [~vanzin]?


was (Author: jerryshao):
I'm inclining to choose option 1, the only overhead is resource re-uploading, 
the fix is restricted to SparkSubmit and all other code could be worked 
transparently.

What's your opinion [~tgraves] [~vanzin]?

> Remote http(s) resources is not supported in YARN mode
> --
>
> Key: SPARK-21917
> URL: https://issues.apache.org/jira/browse/SPARK-21917
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit, YARN
>Affects Versions: 2.2.0
>    Reporter: Saisai Shao
>Priority: Minor
>
> In the current Spark, when submitting an application on YARN with remote 
> resources {{./bin/spark-shell --jars 
> http://central.maven.org/maven2/com/github/swagger-akka-http/swagger-akka-http_2.11/0.10.1/swagger-akka-http_2.11-0.10.1.jar
>  --master yarn-client -v}}, Spark fails with:
> {noformat}
> java.io.IOException: No FileSystem for scheme: http
>   at 
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>   at 
> org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:354)
>   at 
> org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:478)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:600)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:599)
>   at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:599)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:598)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:598)
>   at 
> org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:848)
>   at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:173)
> {noformat}
> This is because {{YARN#client}} assumes resources must be on the Hadoop 
> compatible FS, also in the NM 
> (https://github.com/apache/hadoop/blob/99e558b13ba4d5832aea97374e1d07b4e78e5e39/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java#L245)
>  it will only use Hadoop compatible FS to download resources. So this makes 
> Spark on YARN fail to support remote http(s) resources.
> To solve this problem, there might be several options:
> * Download remote http(s) resources to local and add the locally downloaded 
> resources to the dist cache. The downside of this option is that remote 
> resources will be uploaded again unnecessarily.
> * Filter remote http(s) resources and add them with spark.jars or 
> spark.files, to leverage Spark's internal fileserver to distribute remote 
> http(s) resources. The problem with this solution is that some resources 
> which need to be available before the application starts may not work.
> * Leverage Hadoop's http(s) file system support 
> (https://issues.apache.org/jira/browse/HADOOP-14383). This only works in 
> Hadoop 2.9+, and I think even if we implement a similar one in Spark it will 
> not work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21917) Remote http(s) resources is not supported in YARN mode

2017-09-05 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153141#comment-16153141
 ] 

Saisai Shao commented on SPARK-21917:
-

I'm inclining to choose option 1, the only overhead is resource re-uploading, 
the fix is restricted to SparkSubmit and all other code could be worked 
transparently.

What's your opinion [~tgraves] [~vanzin]?

> Remote http(s) resources is not supported in YARN mode
> --
>
> Key: SPARK-21917
> URL: https://issues.apache.org/jira/browse/SPARK-21917
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit, YARN
>Affects Versions: 2.2.0
>    Reporter: Saisai Shao
>Priority: Minor
>
> In the current Spark, when submitting an application on YARN with remote 
> resources {{./bin/spark-shell --jars 
> http://central.maven.org/maven2/com/github/swagger-akka-http/swagger-akka-http_2.11/0.10.1/swagger-akka-http_2.11-0.10.1.jar
>  --master yarn-client -v}}, Spark fails with:
> {noformat}
> java.io.IOException: No FileSystem for scheme: http
>   at 
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>   at 
> org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:354)
>   at 
> org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:478)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:600)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:599)
>   at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:599)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:598)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:598)
>   at 
> org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:848)
>   at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:173)
> {noformat}
> This is because {{YARN#client}} assumes resources must be on the Hadoop 
> compatible FS, also in the NM 
> (https://github.com/apache/hadoop/blob/99e558b13ba4d5832aea97374e1d07b4e78e5e39/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java#L245)
>  it will only use Hadoop compatible FS to download resources. So this makes 
> Spark on YARN fail to support remote http(s) resources.
> To solve this problem, there might be several options:
> * Download remote http(s) resources to local and add the locally downloaded 
> resources to the dist cache. The downside of this option is that remote 
> resources will be uploaded again unnecessarily.
> * Filter remote http(s) resources and add them with spark.jars or 
> spark.files, to leverage Spark's internal fileserver to distribute remote 
> http(s) resources. The problem with this solution is that some resources 
> which need to be available before the application starts may not work.
> * Leverage Hadoop's http(s) file system support 
> (https://issues.apache.org/jira/browse/HADOOP-14383). This only works in 
> Hadoop 2.9+, and I think even if we implement a similar one in Spark it will 
> not work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21917) Remote http(s) resources is not supported in YARN mode

2017-09-05 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-21917:
---

 Summary: Remote http(s) resources is not supported in YARN mode
 Key: SPARK-21917
 URL: https://issues.apache.org/jira/browse/SPARK-21917
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit, YARN
Affects Versions: 2.2.0
Reporter: Saisai Shao
Priority: Minor


In the current Spark, when submitting an application on YARN with remote resources 
{{./bin/spark-shell --jars 
http://central.maven.org/maven2/com/github/swagger-akka-http/swagger-akka-http_2.11/0.10.1/swagger-akka-http_2.11-0.10.1.jar
 --master yarn-client -v}}, Spark fails with:

{noformat}
java.io.IOException: No FileSystem for scheme: http
at 
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at 
org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:354)
at 
org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:478)
at 
org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:600)
at 
org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:599)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at 
org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:599)
at 
org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:598)
at scala.collection.immutable.List.foreach(List.scala:381)
at 
org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:598)
at 
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:848)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:173)
{noformat}

This is because {{YARN#client}} assumes resources must be on the Hadoop 
compatible FS, also in the NM 
(https://github.com/apache/hadoop/blob/99e558b13ba4d5832aea97374e1d07b4e78e5e39/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java#L245)
 it will only use Hadoop compatible FS to download resources. So this makes 
Spark on YARN fail to support remote http(s) resources.

To solve this problem, there might be several options:

* Download remote http(s) resources to local and add the locally downloaded 
resources to the dist cache. The downside of this option is that remote 
resources will be uploaded again unnecessarily. (A rough sketch of this option 
is shown after the list.)

* Filter remote http(s) resources and add them with spark.jars or spark.files, 
to leverage Spark's internal fileserver to distribute remote http(s) resources. 
The problem with this solution is that some resources which need to be 
available before the application starts may not work.

* Leverage Hadoop's http(s) file system support 
(https://issues.apache.org/jira/browse/HADOOP-14383). This only works in Hadoop 
2.9+, and I think even if we implement a similar one in Spark it will not work.
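A hedged sketch of the first option above: download the http(s) resource to a local file so it can then be handled like any other local resource (object and method names are invented; this is not the actual SparkSubmit change).

{code}
import java.net.URI
import java.nio.file.{Files, Paths, StandardCopyOption}

object RemoteResourceDownloader {
  // Download an http(s) resource into targetDir and return a local URI that
  // can then be distributed like any other local file.
  def downloadToLocal(resource: String, targetDir: String): String = {
    val source = new URI(resource)
    val fileName = Paths.get(source.getPath).getFileName.toString
    val target = Paths.get(targetDir, fileName)
    val in = source.toURL.openStream()
    try {
      Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
    } finally {
      in.close()
    }
    target.toUri.toString
  }
}
{code}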



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[ANNOUNCE] Apache Livy 0.4.0-incubating released

2017-09-04 Thread Saisai Shao
The Apache Livy team is proud to announce Apache Livy version
0.4.0-incubating.

This is the first Livy release after entering the Apache Incubator.

Livy is a web service that exposes a REST interface for managing
long running Apache Spark contexts in your cluster. With Livy, new
applications can be built on top of Apache Spark that require fine grained
interaction with many Spark contexts.

For Livy release details and downloads, visit:
http://livy.incubator.apache.org/download/


We would like to thank the contributors that made the release possible.


Regards,
The Livy Team


Re: The process of Livy 0.4.0-incubating release

2017-09-03 Thread Saisai Shao
Hi all,

All the work related to the 0.4.0-incubating release is done, please check.

If you don't mind, I'm going to announce the release of Livy
0.4.0-incubating on the incubator mailing list.

Thanks
Jerry



On Sat, Sep 2, 2017 at 2:11 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

> The website PRs are updated (with a 9/1/17 release date) and ready to
> merge.
>
> @jerry if you want to merge them and update the website when you send the
> announcement email.
>
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
>
> From: Saisai Shao <sai.sai.s...@gmail.com>
> To: dev@livy.incubator.apache.org
> Date: 09/01/2017 01:11 AM
> Subject: Re: The process of Livy 0.4.0-incubating release
> --
>
>
>
> Hi all, I just published maven artifacts to repository.apache.org, please
> check (https://repository.apache.org/#nexus-search;quick~org.apache.livy).
>
> I think once related doc is updated, all the release work should be done.
>
> Thanks
> Jerry
>
>
>
> On Thu, Aug 31, 2017 at 9:01 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:
>
> > I agree with Jeff on the announcement timing, the links on the website PR
> > are already updated and working, I'll just have to push an update to the
> > release date once we know when we'll announce the release. And you can
> > delete the gh-pages branch, it's been moved to the old-site branch on the
> > website repo.
> >
> >
> > *Alex Bozarth*
> > Software Engineer
> > Spark Technology Center
> > --
> > *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> > *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
> >
> >
> > 505 Howard Street
> > San Francisco, CA 94105
> > United States
> >
> >
> >
> >
> > From: Jeff Zhang <zjf...@gmail.com>
> > To: dev@livy.incubator.apache.org
> > Date: 08/30/2017 05:53 PM
> > Subject: Re: The process of Livy 0.4.0-incubating release
> > --
>
> >
> >
> >
> > I think we'd better to announce after the artifacts are published.
> >
> > Saisai Shao <sai.sai.s...@gmail.com>于2017年8月31日周四 上午8:35写道:
> >
> > > Hi Alex,
> > >
> > > I think you can update the website PR firstly to link to the correct
> > > download URL (since the package is already available) and doc.
> > >
> > > I'm working on publish jars to nexus repository, currently I'm waiting
> > for
> > > infra team to create a Livy nexus profile, so I push push jars to
> staging
> > > repo.
> > >
> > > Is it OK to announce now? Since we haven't yet pushed jars, technically
> > the
> > > release process is not finished.
> > >
> > > I will cleanup some unnecessary branches and tags, one thing is that is
> > it
> > > OK to remove gh-pages branch?
> > >
> > > Thanks
> > > Jerry
> > >
> > >
> > >
> > > On Thu, Aug 31, 2017 at 2:59 AM, Alex Bozarth <ajboz...@us.ibm.com>
> > wrote:
> > >
> > > > So is the release ready to announce then? The apache bin/src download
> > > > links are live so all I need for the website update is an o

Re: Port to open for submitting Spark on Yarn application

2017-09-03 Thread Saisai Shao
I think spark.yarn.am.port is not used any more, so you don't need to
consider it.

If you're running Spark on YARN, the YARN RM port used to submit
applications should also be reachable through the firewall, as well as the
HDFS ports used to upload resources.

Also, on the Spark side, executors connect back to the driver via
spark.driver.port, so you may want to set a fixed port number for it
and add it to the firewall whitelist.
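
A minimal sketch of what pinning those driver-side ports could look like from
PySpark (the port numbers 40000/40001 are placeholders to agree with your
network team, and spark.blockManager.port is just one more port you may want
to fix):

from pyspark import SparkConf, SparkContext

# Pin the ports the driver listens on so they can be whitelisted in the
# firewall; 40000/40001 are placeholder values, not recommendations.
conf = (SparkConf()
        .setAppName("fixed-port-example")
        .set("spark.driver.port", "40000")         # executors connect back here
        .set("spark.blockManager.port", "40001"))  # block manager traffic

sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())
sc.stop()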

Thanks
Jerry


On Mon, Sep 4, 2017 at 8:50 AM, Satoshi Yamada  wrote:

> Hi,
>
> If we run Spark on YARN in client mode, with a firewall around the Hadoop
> cluster and the client node outside the firewall, I think I have to open some
> ports that the Application Master uses.
>
>
> I think the port is specified by "spark.yarn.am.port", as the documentation says.
> https://spark.apache.org/docs/latest/running-on-yarn.html
>
> But, according to the source code, spark.yarn.am.port is deprecated since 2.0.
> https://github.com/apache/spark/commit/829cd7b8b70e65a91aa66e6d626bd45f18e0ad97
>
> Does this mean we do not need to open particular firewall ports for
> Spark on YARN?
>
>
> Thanks,
>
>


[jira] [Commented] (SPARK-21888) Cannot add stuff to Client Classpath for Yarn Cluster Mode

2017-09-01 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150258#comment-16150258
 ] 

Saisai Shao commented on SPARK-21888:
-

Jars added with "--jars" will be added to the client classpath in yarn-cluster mode. 

In your case the only problem is hbase-site.xml; normally we put this file in 
SPARK_CONF_DIR, just like hive-site.xml. Doesn't that work for you?

> Cannot add stuff to Client Classpath for Yarn Cluster Mode
> --
>
> Key: SPARK-21888
> URL: https://issues.apache.org/jira/browse/SPARK-21888
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Parth Gandhi
>Priority: Minor
>
> While running Spark on YARN in cluster mode, there is currently no way to add 
> config files, jars, etc. to the client classpath. For example, suppose you want 
> to run an application that uses HBase. Unless we copy the config files required 
> by HBase into the Spark conf folder, we cannot specify their exact locations on 
> the client classpath, which we could previously do by setting the environment 
> variable "SPARK_CLASSPATH".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21882) OutputMetrics doesn't count written bytes correctly in the saveAsHadoopDataset function

2017-09-01 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150260#comment-16150260
 ] 

Saisai Shao commented on SPARK-21882:
-

Please submit the patch to Github Apache Spark repo.

> OutputMetrics doesn't count written bytes correctly in the 
> saveAsHadoopDataset function
> ---
>
> Key: SPARK-21882
> URL: https://issues.apache.org/jira/browse/SPARK-21882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.2.0
>Reporter: linxiaojun
>Priority: Minor
> Attachments: SPARK-21882.patch
>
>
> The first job called from saveAsHadoopDataset, running in each executor, does 
> not calculate the writtenBytes of OutputMetrics correctly (writtenBytes is 
> 0). The reason is that the callback function used to find the bytes written is 
> not initialized in the right order. The statisticsTable, which records 
> statistics for a FileSystem, must be initialized at the beginning (this is 
> triggered when the SparkHadoopWriter is opened). The solution is to adjust the 
> order of callback function initialization. 
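
A toy illustration of the ordering problem described above (plain Python, not 
Spark internals): if the bytes-written callback is captured before the underlying 
statistics object exists, it keeps reporting 0.

{code}
class FakeFileSystemStats:
    """Stands in for Hadoop's FileSystem statistics table."""
    def __init__(self):
        self.bytes_written = 0

stats = None  # statistics table not created yet

def make_bytes_written_callback():
    # Captures whatever `stats` is right now; if that is None, the callback
    # can never observe real writes.
    captured = stats
    return lambda: captured.bytes_written if captured else 0

# Wrong order: build the callback before the writer initializes statistics.
callback = make_bytes_written_callback()
stats = FakeFileSystemStats()   # "opening the writer" creates the statistics
stats.bytes_written = 1024
print(callback())               # -> 0, the written bytes are lost

# Right order: initialize statistics first, then build the callback.
callback = make_bytes_written_callback()
print(callback())               # -> 1024
{code}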



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: The process of Livy 0.4.0-incubating release

2017-09-01 Thread Saisai Shao
Hi all, I just published maven artifacts to repository.apache.org, please
check (https://repository.apache.org/#nexus-search;quick~org.apache.livy).

I think once related doc is updated, all the release work should be done.

Thanks
Jerry



On Thu, Aug 31, 2017 at 9:01 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

> I agree with Jeff on the announcement timing, the links on the website PR
> are already updated and working, I'll just have to push an update to the
> release date once we know when we'll announce the release. And you can
> delete the gh-pages branch, it's been moved to the old-site branch on the
> website repo.
>
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
>
> From: Jeff Zhang <zjf...@gmail.com>
> To: dev@livy.incubator.apache.org
> Date: 08/30/2017 05:53 PM
> Subject: Re: The process of Livy 0.4.0-incubating release
> --
>
>
>
> I think we'd better to announce after the artifacts are published.
>
> Saisai Shao <sai.sai.s...@gmail.com>于2017年8月31日周四 上午8:35写道:
>
> > Hi Alex,
> >
> > I think you can update the website PR firstly to link to the correct
> > download URL (since the package is already available) and doc.
> >
> > I'm working on publish jars to nexus repository, currently I'm waiting
> for
> > infra team to create a Livy nexus profile, so I push push jars to staging
> > repo.
> >
> > Is it OK to announce now? Since we haven't yet pushed jars, technically
> the
> > release process is not finished.
> >
> > I will cleanup some unnecessary branches and tags, one thing is that is
> it
> > OK to remove gh-pages branch?
> >
> > Thanks
> > Jerry
> >
> >
> >
> > On Thu, Aug 31, 2017 at 2:59 AM, Alex Bozarth <ajboz...@us.ibm.com>
> wrote:
> >
> > > So is the release ready to announce then? The apache bin/src download
> > > links are live so all I need for the website update is an official
> > release
> > > date, would that be today or the day we announce to the user list? On a
> > > related note, should we clean up our branches and tags on the git repo?
> > It
> > > makes sense to remove the rc tags, but should we also get rid of the
> > extra
> > > branches other than "branch-0.x"
> > >
> > >
> > > *Alex Bozarth*
> > > Software Engineer
> > > Spark Technology Center
> > > --
> > > *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> > > *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>
> > >
> > >
> > > 505 Howard Street
> > > San Francisco, CA 94105
> > > United States
> > >
> > >
> > >
> > >
> > > From: Saisai Shao <sai.sai.s...@gmail.com>
> > > To: dev@livy.incubator.apache.org
> > > Date: 08/30/2017 12:05 AM
> > > Subject: Re: The process of Livy 0.4.0-incubating release
> > > --
> > >
> > >
> > >
> > > Looks like I wronged the releasing process, I should publish jars to
> > > staging repo according with RC vote, and after vote passed click
> release
> > to
> > > publish jars. But what I currently do is to publish jars to staging
> repo
> > > after vote passed. I'm really sorry about not familiar with the
> process.
> > I
> > > will write a doc as well as release script for the next release.
> > >
> > > Sorry

[jira] [Assigned] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2017-08-31 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-21658:
---

Assignee: Chin Han Yu

> Adds the default None for value in na.replace in PySpark to match
> -
>
> Key: SPARK-21658
> URL: https://issues.apache.org/jira/browse/SPARK-21658
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Chin Han Yu
>Priority: Minor
>  Labels: Starter
> Fix For: 2.3.0
>
>
> Looks like {{na.replace}} missed the default value {{None}}.
> Both docs say they are aliases 
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
> but the default values look different, which ends up with:
> {code}
> >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
> >>> df.replace({"Alice": "a"}).first()
> Row(_1=u'a', _2=10, _3=80.0)
> >>> df.na.replace({"Alice": "a"}).first()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: replace() takes at least 3 arguments (2 given)
> {code}
> To take advantage of SPARK-19454, it sounds like we should match them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2017-08-31 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150059#comment-16150059
 ] 

Saisai Shao commented on SPARK-21658:
-

Done :).

> Adds the default None for value in na.replace in PySpark to match
> -
>
> Key: SPARK-21658
> URL: https://issues.apache.org/jira/browse/SPARK-21658
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Chin Han Yu
>Priority: Minor
>  Labels: Starter
> Fix For: 2.3.0
>
>
> Looks like {{na.replace}} missed the default value {{None}}.
> Both docs say they are aliases 
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
> http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
> but the default values look different, which ends up with:
> {code}
> >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
> >>> df.replace({"Alice": "a"}).first()
> Row(_1=u'a', _2=10, _3=80.0)
> >>> df.na.replace({"Alice": "a"}).first()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: replace() takes at least 3 arguments (2 given)
> {code}
> To take advantage of SPARK-19454, it sounds like we should match them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Moving Scala 2.12 forward one step

2017-08-31 Thread Saisai Shao
Hi Sean,

Do we have a planned target version for Scala 2.12 support? Several other
projects, like Zeppelin and Livy, which rely on the Spark REPL, also require
changes to support Scala 2.12.

Thanks
Jerry

On Thu, Aug 31, 2017 at 5:55 PM, Sean Owen  wrote:

> No, this doesn't let Spark build and run on 2.12. It makes changes that
> will be required though, the ones that are really no loss to the current
> 2.11 build.
>
> On Thu, Aug 31, 2017, 10:48 Denis Bolshakov 
> wrote:
>
>> Hello,
>>
>> Sounds amazing. Is there any improvements in benchmarks?
>>
>>
>> On 31 August 2017 at 12:25, Sean Owen  wrote:
>>
>>> Calling attention to the question of Scala 2.12 again for moment. I'd
>>> like to make a modest step towards support. Have a look again, if you
>>> would, at SPARK-14280:
>>>
>>> https://github.com/apache/spark/pull/18645
>>>
>>> This is a lot of the change for 2.12 that doesn't break 2.11, and really
>>> doesn't add any complexity. It's mostly dependency updates and clarifying
>>> some code. Other items like dealing with Kafka 0.8 support, the 2.12 REPL,
>>> etc, are not  here.
>>>
>>> So, this still doesn't result in a working 2.12 build but it's most of
>>> the miscellany that will be required.
>>>
>>> I'd like to merge it but wanted to flag it for feedback as it's not
>>> trivial.
>>>
>>
>>
>>
>> --
>> //with Best Regards
>> --Denis Bolshakov
>> e-mail: bolshakov.de...@gmail.com
>>
>


[jira] [Commented] (SPARK-11574) Spark should support StatsD sink out of box

2017-08-31 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148713#comment-16148713
 ] 

Saisai Shao commented on SPARK-11574:
-

Thanks a lot [~srowen]!

> Spark should support StatsD sink out of box
> ---
>
> Key: SPARK-11574
> URL: https://issues.apache.org/jira/browse/SPARK-11574
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0, 1.6.1
>Reporter: Xiaofeng Lin
>Assignee: Xiaofeng Lin
> Fix For: 2.3.0
>
>
> In order to run spark in production, monitoring is essential. StatsD is such 
> a common metric reporting mechanism that it should be supported out of the 
> box.  This will enable publishing metrics to monitoring services like 
> datadog, etc. 
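
For reference, a hypothetical way such a sink could be enabled from PySpark
once it ships (the class name org.apache.spark.metrics.sink.StatsdSink and the
spark.metrics.conf.* passthrough are assumptions on my part, not confirmed in
this ticket; host and port are placeholders):

{code}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("statsd-metrics-example")
         .config("spark.metrics.conf.*.sink.statsd.class",
                 "org.apache.spark.metrics.sink.StatsdSink")
         .config("spark.metrics.conf.*.sink.statsd.host", "127.0.0.1")  # placeholder host
         .config("spark.metrics.conf.*.sink.statsd.port", "8125")       # common StatsD port
         .getOrCreate())

spark.range(1000).count()  # run something so metrics are emitted
spark.stop()
{code}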



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11574) Spark should support StatsD sink out of box

2017-08-31 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148701#comment-16148701
 ] 

Saisai Shao commented on SPARK-11574:
-

Ping [~srowen]. Hi Sean, I cannot assign this JIRA to Xiaofeng since I cannot 
find his name in the prompt list. Would you please help check it? If possible, 
could you assign the JIRA to him? I already merged the change on GitHub, 
thanks a lot!

> Spark should support StatsD sink out of box
> ---
>
> Key: SPARK-11574
> URL: https://issues.apache.org/jira/browse/SPARK-11574
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0, 1.6.1
>Reporter: Xiaofeng Lin
> Fix For: 2.3.0
>
>
> In order to run spark in production, monitoring is essential. StatsD is such 
> a common metric reporting mechanism that it should be supported out of the 
> box.  This will enable publishing metrics to monitoring services like 
> datadog, etc. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11574) Spark should support StatsD sink out of box

2017-08-30 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148333#comment-16148333
 ] 

Saisai Shao commented on SPARK-11574:
-

Maybe it is my permission issue, will ask PMC to handle it.

> Spark should support StatsD sink out of box
> ---
>
> Key: SPARK-11574
> URL: https://issues.apache.org/jira/browse/SPARK-11574
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0, 1.6.1
>Reporter: Xiaofeng Lin
> Fix For: 2.3.0
>
>
> In order to run spark in production, monitoring is essential. StatsD is such 
> a common metric reporting mechanism that it should be supported out of the 
> box.  This will enable publishing metrics to monitoring services like 
> datadog, etc. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs

2017-08-30 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-17321:
---

Assignee: Saisai Shao

> YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
> --
>
> Key: SPARK-17321
> URL: https://issues.apache.org/jira/browse/SPARK-17321
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.2, 2.0.0, 2.1.1
>Reporter: yunjiong zhao
>Assignee: Saisai Shao
> Fix For: 2.3.0
>
>
> We run Spark on YARN; after enabling Spark dynamic allocation, we noticed some 
> Spark applications failed randomly due to the YarnShuffleService.
> From the log I found:
> {quote}
> 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: 
> Error while initializing Netty pipeline
> java.lang.NullPointerException
> at 
> org.apache.spark.network.server.TransportRequestHandler.(TransportRequestHandler.java:77)
> at 
> org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159)
> at 
> org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116)
> at 
> io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> {quote} 
> This was caused by the first disk in yarn.nodemanager.local-dirs being broken.
> If we enabled spark.yarn.shuffle.stopOnFailure (SPARK-16505) we might lose 
> hundreds of nodes, which is unacceptable.
> We have 12 disks in yarn.nodemanager.local-dirs, so why not use other good 
> disks if the first one is broken?
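
A rough sketch of the general idea (plain Python, not the actual YarnShuffleService 
patch, and the paths are placeholders): instead of blindly taking the first entry of 
yarn.nodemanager.local-dirs, fall back to the next directory that is actually usable.

{code}
import os

def first_usable_local_dir(local_dirs):
    """Return the first directory from yarn.nodemanager.local-dirs that exists
    and is writable, instead of always using local_dirs[0]."""
    for d in local_dirs:
        if os.path.isdir(d) and os.access(d, os.W_OK):
            return d
    raise IOError("No usable directory found in yarn.nodemanager.local-dirs")

# Example with placeholder paths:
# first_usable_local_dir(["/data1/yarn/local", "/data2/yarn/local"])
{code}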



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs

2017-08-30 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-17321.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19032
[https://github.com/apache/spark/pull/19032]

> YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
> --
>
> Key: SPARK-17321
> URL: https://issues.apache.org/jira/browse/SPARK-17321
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.2, 2.0.0, 2.1.1
>Reporter: yunjiong zhao
> Fix For: 2.3.0
>
>
> We run Spark on YARN; after enabling Spark dynamic allocation, we noticed some 
> Spark applications failed randomly due to the YarnShuffleService.
> From the log I found:
> {quote}
> 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: 
> Error while initializing Netty pipeline
> java.lang.NullPointerException
> at 
> org.apache.spark.network.server.TransportRequestHandler.(TransportRequestHandler.java:77)
> at 
> org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159)
> at 
> org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116)
> at 
> io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> {quote} 
> This was caused by the first disk in yarn.nodemanager.local-dirs being broken.
> If we enabled spark.yarn.shuffle.stopOnFailure (SPARK-16505) we might lose 
> hundreds of nodes, which is unacceptable.
> We have 12 disks in yarn.nodemanager.local-dirs, so why not use other good 
> disks if the first one is broken?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11574) Spark should support StatsD sink out of box

2017-08-30 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148292#comment-16148292
 ] 

Saisai Shao commented on SPARK-11574:
-

Hi Xiaofeng, is your JIRA username still active? I cannot assign the JIRA to 
you since I cannot find your name.

[~srowen] [~hyukjin.kwon] do you know how to handle this situation?

> Spark should support StatsD sink out of box
> ---
>
> Key: SPARK-11574
> URL: https://issues.apache.org/jira/browse/SPARK-11574
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0, 1.6.1
>Reporter: Xiaofeng Lin
> Fix For: 2.3.0
>
>
> In order to run spark in production, monitoring is essential. StatsD is such 
> a common metric reporting mechanism that it should be supported out of the 
> box.  This will enable publishing metrics to monitoring services like 
> datadog, etc. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11574) Spark should support StatsD sink out of box

2017-08-30 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-11574.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 9518
[https://github.com/apache/spark/pull/9518]

> Spark should support StatsD sink out of box
> ---
>
> Key: SPARK-11574
> URL: https://issues.apache.org/jira/browse/SPARK-11574
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0, 1.6.1
>Reporter: Xiaofeng Lin
> Fix For: 2.3.0
>
>
> In order to run spark in production, monitoring is essential. StatsD is such 
> a common metric reporting mechanism that it should be supported out of the 
> box.  This will enable publishing metrics to monitoring services like 
> datadog, etc. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: The process of Livy 0.4.0-incubating release

2017-08-30 Thread Saisai Shao
Hi Alex,

I think you can update the website PR first to link to the correct
download URL (since the package is already available) and docs.

I'm working on publishing jars to the nexus repository; currently I'm waiting
for the infra team to create a Livy nexus profile so that I can push jars to
the staging repo.

Is it OK to announce now? Since we haven't pushed the jars yet, technically the
release process is not finished.

I will clean up some unnecessary branches and tags; one question: is it
OK to remove the gh-pages branch?

Thanks
Jerry



On Thu, Aug 31, 2017 at 2:59 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

> So is the release ready to announce then? The apache bin/src download
> links are live so all I need for the website update is an official release
> date, would that be today or the day we announce to the user list? On a
> related note, should we clean up our branches and tags on the git repo? It
> makes sense to remove the rc tags, but should we also get rid of the extra
> branches other than "branch-0.x"
>
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
>
> From: Saisai Shao <sai.sai.s...@gmail.com>
> To: dev@livy.incubator.apache.org
> Date: 08/30/2017 12:05 AM
> Subject: Re: The process of Livy 0.4.0-incubating release
> --
>
>
>
> Looks like I got the release process wrong: I should publish jars to the
> staging repo along with the RC vote, and click release to publish them after
> the vote passes. What I actually did was publish jars to the staging repo
> after the vote passed. I'm really sorry for not being familiar with the
> process. I will write a doc as well as a release script for the next release.
>
> Sorry about it.
>
> Best regards,
> Jerry
>
> On Wed, Aug 30, 2017 at 2:10 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
> > Hi Luciano,
> >
> > I'm trying to push maven artifacts to maven staging repository at
> > repository.apache.org, seems we need a org.apache.livy staging profile
> to
> > use to create a staging repo, I checked Spark release script, it has a
> > profile "d63f592e7eac0" used to create repo, do we need this profile for
> > Livy? Also how can I create this profile?
> >
> > Thanks
> > Jerry
> >
> >
> > On Mon, Aug 28, 2017 at 3:39 PM, Luciano Resende <luckbr1...@gmail.com>
> > wrote:
> >
> >> On Sun, Aug 27, 2017 at 11:53 PM, Saisai Shao <sai.sai.s...@gmail.com>
> >> wrote:
> >>
> >> > Hi mentors,
> >> >
> >> > Would you please guide us on how to release the Livy 0.4.0-incubating,
> >> > including artifact publish.
> >> >
> >>
> >> The artifacts move from :
> >> https://dist.apache.org/repos/dist/dev/incubator/livy/
> >>
> >> to
> >> https://dist.apache.org/repos/dist/release/incubator/livy/
> >>
> >>
> >> >
> >> > Also regarding push artifacts to repo, are we inclining to change to
> >> Maven
> >> > central repo, or we still use Cloudera repo, any suggestion?
> >> >
> >> >
> >> You should have a maven staging repository at repository.apache.org,
> >> after
> >> a release is approved, that repository should just be released.
> >>
> >>
> >> > Besides, for Python package release, do we need to publish this python
> >> > client package to PIP or we don't need to do this now, any comment?
> >> >
> >> >
> >> +0, I would say do what you have been doing in the past
> >>
> >>
> >> > Thanks for your help and suggestion.
> >> >
> >> > Best regards,
> >> > Saisai (Jerry)
> >> >
> >>
> >>
> >>
> >> --
> >> Luciano Resende
> >> http://twitter.com/lresende1975
> >> http://lresende.blogspot.com/
> >>
> >
> >
>
>
>
>

