Re: The process of Livy 0.4.0-incubating release

2017-08-30 Thread Saisai Shao
It looks like I got the release process wrong: I should publish the jars to
the staging repo together with the RC vote, and after the vote passes, click
release to publish them. What I actually did was publish the jars to the
staging repo only after the vote passed. I'm really sorry for not being
familiar with the process. I will write a doc as well as a release script for
the next release; a rough sketch of the intended flow is below.
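
A minimal sketch, assuming Livy inherits the standard Apache parent pom's
"apache-release" profile and the usual SVN dist layout (exact commands to be
confirmed in the release doc):

  # Stage the Maven artifacts; this creates a staging repository
  # (e.g. orgapachelivy-1001) on repository.apache.org.
  $ mvn clean deploy -DskipTests -Papache-release

  # Put the RC source/binary archives under the dev dist area for the vote.
  $ svn import livy-0.4.0-incubating-rc/ \
      https://dist.apache.org/repos/dist/dev/incubator/livy/0.4.0-incubating \
      -m "Apache Livy 0.4.0-incubating RC artifacts"

  # After the vote passes: move the artifacts to the release dist area and
  # press "Release" on the staging repository in the Nexus UI.
  $ svn mv -m "Publish Apache Livy 0.4.0-incubating" \
      https://dist.apache.org/repos/dist/dev/incubator/livy/0.4.0-incubating \
      https://dist.apache.org/repos/dist/release/incubator/livy/0.4.0-incubating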

Sorry about it.

Best regards,
Jerry

On Wed, Aug 30, 2017 at 2:10 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:

> Hi Luciano,
>
> I'm trying to push maven artifacts to maven staging repository at
> repository.apache.org, seems we need a org.apache.livy staging profile to
> use to create a staging repo, I checked Spark release script, it has a
> profile "d63f592e7eac0" used to create repo, do we need this profile for
> Livy? Also how can I create this profile?
>
> Thanks
> Jerry
>
>
> On Mon, Aug 28, 2017 at 3:39 PM, Luciano Resende <luckbr1...@gmail.com>
> wrote:
>
>> On Sun, Aug 27, 2017 at 11:53 PM, Saisai Shao <sai.sai.s...@gmail.com>
>> wrote:
>>
>> > Hi mentors,
>> >
>> > Would you please guide us on how to release the Livy 0.4.0-incubating,
>> > including artifact publish.
>> >
>>
>> The artifacts move from :
>> https://dist.apache.org/repos/dist/dev/incubator/livy/
>>
>> to
>> https://dist.apache.org/repos/dist/release/incubator/livy/
>>
>>
>> >
>> > Also regarding push artifacts to repo, are we inclining to change to
>> Maven
>> > central repo, or we still use Cloudera repo, any suggestion?
>> >
>> >
>> You should have a maven staging repository at repository.apache.org,
>> after
>> a release is approved, that repository should just be released.
>>
>>
>> > Besides, for Python package release, do we need to publish this python
>> > client package to PIP or we don't need to do this now, any comment?
>> >
>> >
>> +0, I would say do what you have been doing in the past
>>
>>
>> > Thanks for your help and suggestion.
>> >
>> > Best regards,
>> > Saisai (Jerry)
>> >
>>
>>
>>
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>>
>
>


Re: The process of Livy 0.4.0-incubating release

2017-08-30 Thread Saisai Shao
Hi Luciano,

I'm trying to push the Maven artifacts to the Maven staging repository at
repository.apache.org. It seems we need an org.apache.livy staging profile to
create a staging repo. I checked the Spark release script; it uses a
profile "d63f592e7eac0" to create the repo. Do we need such a profile for
Livy? Also, how can I create this profile?

Thanks
Jerry


On Mon, Aug 28, 2017 at 3:39 PM, Luciano Resende <luckbr1...@gmail.com>
wrote:

> On Sun, Aug 27, 2017 at 11:53 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
> > Hi mentors,
> >
> > Would you please guide us on how to release the Livy 0.4.0-incubating,
> > including artifact publish.
> >
>
> The artifacts move from :
> https://dist.apache.org/repos/dist/dev/incubator/livy/
>
> to
> https://dist.apache.org/repos/dist/release/incubator/livy/
>
>
> >
> > Also regarding push artifacts to repo, are we inclining to change to
> Maven
> > central repo, or we still use Cloudera repo, any suggestion?
> >
> >
> You should have a maven staging repository at repository.apache.org, after
> a release is approved, that repository should just be released.
>
>
> > Besides, for Python package release, do we need to publish this python
> > client package to PIP or we don't need to do this now, any comment?
> >
> >
> +0, I would say do what you have been doing in the past
>
>
> > Thanks for your help and suggestion.
> >
> > Best regards,
> > Saisai (Jerry)
> >
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


[jira] [Commented] (SPARK-21829) Enable config to permanently blacklist a list of nodes

2017-08-29 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144889#comment-16144889
 ] 

Saisai Shao commented on SPARK-21829:
-

I understand your concern, since it is much easier for you to change Spark 
rather than YARN, which is shared across teams. But my thinking is more about 
the feature itself: this kind of thing looks like it should be handled by the 
cluster manager, not by Spark itself. At the least it should be done by 
{{yarn#client}} and {{ApplicationMaster}}, not by the blacklist.

I think [~irashid] and [~tgraves] may have other thoughts.

> Enable config to permanently blacklist a list of nodes
> --
>
> Key: SPARK-21829
> URL: https://issues.apache.org/jira/browse/SPARK-21829
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler, Spark Core
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Luca Canali
>Priority: Minor
>
> The idea for this proposal comes from a performance incident in a local 
> cluster where a job was found very slow because of a long tail of stragglers 
> due to 2 nodes in the cluster being slow to access a remote filesystem.
> The issue was limited to the 2 machines and was related to external 
> configurations: the 2 machines that performed badly when accessing the remote 
> file system were behaving normally for other jobs in the cluster (a shared 
> YARN cluster).
> With this new feature I propose to introduce a mechanism to allow users to 
> specify a list of nodes in the cluster where executors/tasks should not run 
> for a specific job.
> The proposed implementation that I tested (see PR) uses the Spark blacklist 
> mechanism. With the parameter spark.blacklist.alwaysBlacklistedNodes, a list 
> of user-specified nodes is added to the blacklist at the start of the Spark 
> Context and it is never expired. 
> I have tested this on a YARN cluster on a case taken from the original 
> production problem and I confirm a performance improvement of about 5x for 
> the specific test case I have. I imagine that there can be other cases where 
> Spark users may want to blacklist a set of nodes. This can be used for 
> troubleshooting, including cases where certain nodes/executors are slow for a 
> given workload and this is caused by external agents, so the anomaly is not 
> picked up by the cluster manager.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Update ASF GitHub Bot Settings?

2017-08-28 Thread Saisai Shao
I saw that Zeppelin also mirrors everything from GitHub to JIRA. If it is hard
for INFRA to change this to what you expected, I'm fine with the current way.
Since Livy is not as active as Spark, we may not suffer from the same review
pain.

On Tue, Aug 29, 2017 at 10:47 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

> Spark does use a script to do it, but that script requires a special admin
> JIRA account and is run via a cron job as part of the pr dashboard website
> by Databricks (https://spark-prs.appspot.com). The website source code is
> open source and I've actually done quick a bit of work with it, but it
> requires a paid Google CloudPlat account to run in production. Overall I
> like how Spark does it but it's not as simple to reproduce as it seems. If
> someone want to pay for a Google CloudPlat account and set up a ASF JIRA
> Bot like Spark's then I can easily set up a LIVY PR Dash that will run the
> JIRA update cron job.
>
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
>
> From: Saisai Shao <sai.sai.s...@gmail.com>
> To: dev@livy.incubator.apache.org
> Date: 08/28/2017 06:53 PM
> Subject: Re: Update ASF GitHub Bot Settings?
> --
>
>
>
> Hi Alex,
>
> Do you want to achieve what Spark did in the JIRA? AFAIK Spark uses a
> script to sync between github and jira, it doesn't enable gihub robot to
> mirror everything from github.
>
> Thanks
> Jerry
>
> On Tue, Aug 29, 2017 at 5:00 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:
>
> > Opened an INFRA JIRA https://issues.apache.org/jira/browse/INFRA-14973
> >
> >
> > *Alex Bozarth*
> > Software Engineer
> > Spark Technology Center
> > --
> > *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> > *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
> >
> >
> > 505 Howard Street
> > San Francisco, CA 94105
> > United States
> >
> >
> >
> >
> > From: "Alex Bozarth" <ajboz...@us.ibm.com>
> > To: dev@livy.incubator.apache.org
> > Date: 08/28/2017 12:08 PM
> > Subject: Re: Update ASF GitHub Bot Settings?
> > --
> >
> >
> >
> > Is there a place where it describes the default and/or the options that I
> > could use as reference when filing the JIRA?
> >
> > *Alex Bozarth*
> > Software Engineer
> > Spark Technology Center
> > --
> > *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> > *GitHub: **github.com/ajbozarth*
> > <https://github.com/ajbozarth>
> >
> >
> > 505 Howard Street
> > San Francisco, CA 94105
> > United States
> >
> >
> >
> >
> > From: Luciano Resende <luckbr1...@gmail.com>
> > To: dev@livy.incubator.apache.org
> > Date: 08/28/2017 12:06 PM
> > Subject: Re: Update ASF GitHub Bot Settings?
> > --
> >
>

Re: Update ASF GitHub Bot Settings?

2017-08-28 Thread Saisai Shao
Hi Alex,

Do you want to achieve what Spark does with JIRA? AFAIK Spark uses a
script to sync between GitHub and JIRA; it doesn't enable the GitHub robot to
mirror everything from GitHub.

Thanks
Jerry

On Tue, Aug 29, 2017 at 5:00 AM, Alex Bozarth  wrote:

> Opened an INFRA JIRA https://issues.apache.org/jira/browse/INFRA-14973
>
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* 
> *GitHub: **github.com/ajbozarth* 
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
>
> From: "Alex Bozarth" 
> To: dev@livy.incubator.apache.org
> Date: 08/28/2017 12:08 PM
> Subject: Re: Update ASF GitHub Bot Settings?
> --
>
>
>
> Is there a place where it describes the default and/or the options that I
> could use as reference when filing the JIRA?
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* 
> *GitHub: **github.com/ajbozarth*
> 
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
>
> From: Luciano Resende 
> To: dev@livy.incubator.apache.org
> Date: 08/28/2017 12:06 PM
> Subject: Re: Update ASF GitHub Bot Settings?
> --
>
>
>
> Please file an INFRA jira describing you want to change the GitHub
> workflow.
>
> On Mon, Aug 28, 2017 at 11:22 AM, Alex Bozarth 
> wrote:
>
> >
> >
> > Is there a way to edit the default ASF GitHub Bot settings? I think it's
> > great that it auto-comments on our JIRAs with links to new PRs, but it's
> > also auto-commenting on the JIRAs for every single comment on those PRs,
> > which clutters up both the JIRA and the Activity Stream (which is how I
> > personally keep up on the Livy JIRAs.
> >
> >
> >  Alex Bozarth
> >  Software Engineer
> >  Spark Technology Center
> >
> >
> >
> >
> >  E-mail: ajboz...@us.ibm.com
> >  GitHub: github.com/ajbozarth
> >505
> > Howard Street
> >  San
> > Francisco, CA 94105
> >
> >  United States
> >
> >
> >
> >
> >
> >
>
>
> --
> Luciano Resende
>
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>
>
>
>
>
>
>


The process of Livy 0.4.0-incubating release

2017-08-28 Thread Saisai Shao
Hi mentors,

Would you please guide us on how to release Livy 0.4.0-incubating,
including publishing the artifacts?

Also, regarding pushing artifacts to a repository, are we inclined to switch to
the Maven Central repo, or do we keep using the Cloudera repo? Any suggestions?

Besides, for the Python package release, do we need to publish the Python
client package to PyPI, or can we skip that for now? Any comments?

Thanks for your help and suggestion.

Best regards,
Saisai (Jerry)


[RESULT][VOTE] Apache Livy (incubating) 0.4.0 release RC2

2017-08-28 Thread Saisai Shao
Hi All,

The vote for releasing Apache Livy 0.4.0-incubating passed with 4 binding
+1s, 3 non-binding +1s, and no 0 or -1.

Binding +1s:

Bikas Saha
Jean-Baptiste Onofré
Brock Noland
Luciano Resende

The votes were (
https://www.mail-archive.com/general@incubator.apache.org/msg60990.html)

Thanks to everyone for taking the time to review and vote. We will now proceed
with the release.

Best regards,
Jerry


[jira] [Commented] (SPARK-21829) Enable config to permanently blacklist a list of nodes

2017-08-24 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141081#comment-16141081
 ] 

Saisai Shao commented on SPARK-21829:
-

Cross-posting the comment here. Since you're running Spark on YARN, I think 
node labels should address your scenario well.

The changes you made in BlacklistTracker seem to break the design purpose of 
the blacklist. The blacklist in Spark, as well as in MR/Tez, assumes that bad 
nodes/executors will come back to normal within several hours, so it always has 
a timeout for blacklisting.

In your case, the problem is not bad nodes/executors; it is that you don't want 
to start executors on certain nodes (like slow nodes). This is more of a cluster 
manager problem than a Spark problem. To summarize, you want your Spark 
application to run only on some specific nodes.

To solve this, on YARN you could use node labels, and Spark on YARN already 
supports them; you can search for YARN node labels for the details.

For standalone mode, simply don't start workers on the nodes you want to avoid.

For Mesos I'm not sure; I guess it has similar approaches.
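
As an illustration, a minimal sketch of the node-label route, assuming the nodes 
you want are labeled "fast" in YARN (please check the Spark on YARN docs for the 
exact property names):

  $ spark-submit \
      --master yarn \
      --conf spark.yarn.am.nodeLabelExpression=fast \
      --conf spark.yarn.executor.nodeLabelExpression=fast \
      your-app.py

With this, the AM and executors are only placed on nodes carrying that label, 
which gives you "run only on specific nodes" without touching the blacklist.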

> Enable config to permanently blacklist a list of nodes
> --
>
> Key: SPARK-21829
> URL: https://issues.apache.org/jira/browse/SPARK-21829
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler, Spark Core
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Luca Canali
>Priority: Minor
>
> The idea for this proposal comes from a performance incident in a local 
> cluster where a job was found very slow because of a long tail of stragglers 
> due to 2 nodes in the cluster being slow to access a remote filesystem.
> The issue was limited to the 2 machines and was related to external 
> configurations: the 2 machines that performed badly when accessing the remote 
> file system were behaving normally for other jobs in the cluster (a shared 
> YARN cluster).
> With this new feature I propose to introduce a mechanism to allow users to 
> specify a list of nodes in the cluster where executors/tasks should not run 
> for a specific job.
> The proposed implementation that I tested (see PR) uses the Spark blacklist 
> mechanism. With the parameter spark.blacklist.alwaysBlacklistedNodes, a list 
> of user-specified nodes is added to the blacklist at the start of the Spark 
> Context and it is never expired. 
> I have tested this on a YARN cluster on a case taken from the original 
> production problem and I confirm a performance improvement of about 5x for 
> the specific test case I have. I imagine that there can be other cases where 
> Spark users may want to blacklist a set of nodes. This can be used for 
> troubleshooting, including cases where certain nodes/executors are slow for a 
> given workload and this is caused by external agents, so the anomaly is not 
> picked up by the cluster manager.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-21819) UserGroupInformation initialization in SparkHadoopUtilwill overwrite user config

2017-08-24 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao closed SPARK-21819.
---
Resolution: Not A Problem

>  UserGroupInformation initialization in SparkHadoopUtilwill overwrite user 
> config
> -
>
> Key: SPARK-21819
> URL: https://issues.apache.org/jira/browse/SPARK-21819
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, YARN
>Affects Versions: 2.1.0, 2.1.1
> Environment: Ubuntu14.04
> Spark2.10/2.11 (I checked the github of 2.20 , it exist there as well)
> Cluster mode: Yarn client 
>Reporter: Keith Sun
> Attachments: yarnsparkutil.jpg
>
>
> When  submit job in Java or Scala code to ,the initialization of 
> SparkHadoopUtil will trigger the configuration overwritten in UGI which may 
> not be expected if the UGI has already been initialized by customized xmls 
> which are not on the classpath (like the cfg4j , which could set conf from 
> github code, a database etc). 
> {code:java}
> //it will overwrite the UGI conf which has already been initialized
> class SparkHadoopUtil extends Logging {
>   private val sparkConf = new SparkConf(false).loadFromSystemProperties(true)
>   val conf: Configuration = newConfiguration(sparkConf)
>   UserGroupInformation.setConfiguration(conf)
> {code}
> My scenario : My yarn cluster is kerberized, my configuration is set to use 
> kerberos for hadoop security. While, after the initialzation of 
> SparkHadoopUtil , the authentiationMethod in UGI is updated to "simple"(my 
> xmls not on the classpath), which lead to the failure like below :
> {code:java}
> 933453 [main] INFO  org.apache.spark.SparkContext  - Successfully stopped 
> SparkContext
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
> SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>   at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:501)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
>   at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
>   at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:60)
>   at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:153)
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
>   at org.apache.spark.SparkContext.(SparkContext.scala:497)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
>   at SparkTest.SparkEAZDebug.main(SparkEAZDebug.java:84)
> Caused by: 
> org

[jira] [Commented] (SPARK-21819) UserGroupInformation initialization in SparkHadoopUtilwill overwrite user config

2017-08-23 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139590#comment-16139590
 ] 

Saisai Shao commented on SPARK-21819:
-

Then I think there should be no issue in Spark, right? [~KSLaskfla].

>  UserGroupInformation initialization in SparkHadoopUtilwill overwrite user 
> config
> -
>
> Key: SPARK-21819
> URL: https://issues.apache.org/jira/browse/SPARK-21819
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, YARN
>Affects Versions: 2.1.0, 2.1.1
> Environment: Ubuntu14.04
> Spark2.10/2.11 (I checked the github of 2.20 , it exist there as well)
> Cluster mode: Yarn client 
>Reporter: Keith Sun
> Attachments: yarnsparkutil.jpg
>
>
> When  submit job in Java or Scala code to ,the initialization of 
> SparkHadoopUtil will trigger the configuration overwritten in UGI which may 
> not be expected if the UGI has already been initialized by customized xmls 
> which are not on the classpath (like the cfg4j , which could set conf from 
> github code, a database etc). 
> {code:java}
> //it will overwrite the UGI conf which has already been initialized
> class SparkHadoopUtil extends Logging {
>   private val sparkConf = new SparkConf(false).loadFromSystemProperties(true)
>   val conf: Configuration = newConfiguration(sparkConf)
>   UserGroupInformation.setConfiguration(conf)
> {code}
> My scenario : My yarn cluster is kerberized, my configuration is set to use 
> kerberos for hadoop security. While, after the initialzation of 
> SparkHadoopUtil , the authentiationMethod in UGI is updated to "simple"(my 
> xmls not on the classpath), which lead to the failure like below :
> {code:java}
> 933453 [main] INFO  org.apache.spark.SparkContext  - Successfully stopped 
> SparkContext
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
> SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>   at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:501)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
>   at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
>   at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:60)
>   at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:153)
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
>   at org.apache.spark.SparkContext.(SparkContext.scala:497)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
>   

[jira] [Commented] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs

2017-08-23 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139502#comment-16139502
 ] 

Saisai Shao commented on SPARK-17321:
-

1. If NM recovery is enabled, then YARN will provide a recovery path. This 
recovery path is used by any aux-service running on YARN (Tez, MR, Spark, ...) 
and by the NM itself to store state, so the user/YARN should guarantee the 
availability of this path; if not, the NM itself will fail to restart. So, as a 
conclusion, if NM recovery is enabled we should always use the recovery path.

2. Yes, we will never use the NM local dirs, whether NM recovery is enabled or 
not. Previously we needed to support Hadoop 2.6-, which has no recovery path, so 
we chose a local dir instead. Since we now only support 2.6+, there's no reason 
to still use the NM local dirs.

3. The memory overhead should not be large, since it only stores some 
application/executor information. Also, when you use the external shuffle 
service in standalone and Mesos modes, it always uses memory, so I don't think 
it is a big problem.

> YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
> --
>
> Key: SPARK-17321
> URL: https://issues.apache.org/jira/browse/SPARK-17321
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.2, 2.0.0, 2.1.1
>Reporter: yunjiong zhao
>
> We run spark on yarn, after enabled spark dynamic allocation, we notice some 
> spark application failed randomly due to YarnShuffleService.
> From log I found
> {quote}
> 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: 
> Error while initializing Netty pipeline
> java.lang.NullPointerException
> at 
> org.apache.spark.network.server.TransportRequestHandler.(TransportRequestHandler.java:77)
> at 
> org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159)
> at 
> org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116)
> at 
> io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> {quote} 
> Which caused by the first disk in yarn.nodemanager.local-dirs was broken.
> If we enabled spark.yarn.shuffle.stopOnFailure(SPARK-16505) we might lost 
> hundred nodes which is unacceptable.
> We have 12 disks in yarn.nodemanager.local-dirs, so why not use other good 
> disks if the first one is broken?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Please vote for Apache Livy 0.4.0-incubating release

2017-08-23 Thread Saisai Shao
Hi teams,

Would you please help vote for the Apache Livy release on the general
incubator mailing list? Thanks a lot.

http://mail-archives.apache.org/mod_mbox/incubator-general/201708.mbox/%3CCANvfmP8sAoofvRcs9AT5S-VW4LfiWeGfRXP2rhvEP4zyTag%2BYQ%40mail.gmail.com%3E

Thanks


Re: Livy with Spark package

2017-08-23 Thread Saisai Shao
You could set "spark.jars.packages" in `conf` field of session post API (
https://github.com/apache/incubator-livy/blob/master/docs/rest-api.md#post-sessions).
This is equal to --package in spark-submit.
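
A rough sketch for your case (you're submitting a file, so the /batches endpoint,
which accepts the same `conf` field; the host name and HDFS path below are just
placeholders):

  $ curl -s -X POST -H "Content-Type: application/json" \
      -d '{
            "file": "hdfs:///path/to/somefile.py",
            "args": ["2017-08-23 02:00:00"],
            "conf": {"spark.jars.packages": "com.databricks:spark-avro_2.11:3.2.0"}
          }' \
      http://<livy-server>:8998/batches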

BTW, you'd better ask Livy questions on u...@livy.incubator.apache.org.

Thanks
Jerry

On Thu, Aug 24, 2017 at 8:11 AM, ayan guha  wrote:

> Hi
>
> I have a python program which I invoke as
>
>  spark-submit --packages com.databricks:spark-avro_2.11:3.2.0 somefile.py
>  "2017-08-23 02:00:00"  and it works
>
> Now I want to submit this file using Livy. I could work out most of the
> stuff (like putting files to HDFS etc) but not able to understand how/where
> to configure the "packages" switch...Any help?
> --
> Best Regards,
> Ayan Guha
>


[jira] [Commented] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs

2017-08-23 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138320#comment-16138320
 ] 

Saisai Shao commented on SPARK-17321:
-

We're facing the same issue. I think the YARN shuffle service should behave like this:

* If NM recovery is not enabled, then Spark will not persist data into LevelDB; 
in that case the YARN shuffle service can still serve requests but loses the 
ability to recover (which is fine, because a failure of the NM will kill the 
containers as well as the applications).
* If NM recovery is enabled, then the user or YARN should guarantee that the 
recovery path is reliable, because the recovery path is also crucial for the NM 
itself to recover.

What do you think [~tgraves]?

I'm currently working on the first item, avoiding persisting data into LevelDB, 
to see if this is a feasible solution.

> YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
> --
>
> Key: SPARK-17321
> URL: https://issues.apache.org/jira/browse/SPARK-17321
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.2, 2.0.0, 2.1.1
>Reporter: yunjiong zhao
>
> We run spark on yarn, after enabled spark dynamic allocation, we notice some 
> spark application failed randomly due to YarnShuffleService.
> From log I found
> {quote}
> 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: 
> Error while initializing Netty pipeline
> java.lang.NullPointerException
> at 
> org.apache.spark.network.server.TransportRequestHandler.(TransportRequestHandler.java:77)
> at 
> org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159)
> at 
> org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116)
> at 
> io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> {quote} 
> Which caused by the first disk in yarn.nodemanager.local-dirs was broken.
> If we enabled spark.yarn.shuffle.stopOnFailure(SPARK-16505) we might lost 
> hundred nodes which is unacceptable.
> We have 12 disks in yarn.nodemanager.local-dirs, so why not use other good 
> disks if the first one is broken?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21819) UserGroupInformation initialization in SparkHadoopUtilwill overwrite user config

2017-08-23 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138308#comment-16138308
 ] 

Saisai Shao commented on SPARK-21819:
-

I'm not sure if Spark exposes a user API to pass a {{Configuration}} to Spark on 
YARN.

One possible solution is to set Hadoop configurations via Spark conf entries 
prefixed with "spark.hadoop."; SparkHadoopUtil will pick them up and set them on 
the Hadoop Configuration, and this will be honored by the YARN client.
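
For example, a rough sketch of the "spark.hadoop." prefix in action (the exact 
security properties depend on what your custom core-site.xml sets; the two below 
are just the usual Kerberos ones):

  $ spark-submit \
      --master yarn --deploy-mode client \
      --conf spark.hadoop.hadoop.security.authentication=kerberos \
      --conf spark.hadoop.hadoop.security.authorization=true \
      your-app.jar

The same keys can also be set programmatically on the SparkConf before the 
SparkSession/SparkContext is created.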

>  UserGroupInformation initialization in SparkHadoopUtilwill overwrite user 
> config
> -
>
> Key: SPARK-21819
> URL: https://issues.apache.org/jira/browse/SPARK-21819
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, YARN
>Affects Versions: 2.1.0, 2.1.1
> Environment: Ubuntu14.04
> Spark2.10/2.11 (I checked the github of 2.20 , it exist there as well)
> Cluster mode: Yarn client 
>Reporter: Keith Sun
>
> When  submit job in Java or Scala code to ,the initialization of 
> SparkHadoopUtil will trigger the configuration overwritten in UGI which may 
> not be expected if the UGI has already been initialized by customized xmls 
> which are not on the classpath (like the cfg4j , which could set conf from 
> github code, a database etc). 
> {code:java}
> //it will overwrite the UGI conf which has already been initialized
> class SparkHadoopUtil extends Logging {
>   private val sparkConf = new SparkConf(false).loadFromSystemProperties(true)
>   val conf: Configuration = newConfiguration(sparkConf)
>   UserGroupInformation.setConfiguration(conf)
> {code}
> My scenario : My yarn cluster is kerberized, my configuration is set to use 
> kerberos for hadoop security. While, after the initialzation of 
> SparkHadoopUtil , the authentiationMethod in UGI is updated to "simple"(my 
> xmls not on the classpath), which lead to the failure like below :
> {code:java}
> 933453 [main] INFO  org.apache.spark.SparkContext  - Successfully stopped 
> SparkContext
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
> SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>   at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:501)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
>   at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
>   at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:60)
>   at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:153)
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
>   at org.apache.spark.SparkContext.(SparkContext.scala:497)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
>   at 
> org.apache.spark.sql.SparkSessi

[jira] [Commented] (SPARK-21819) UserGroupInformation initialization in SparkHadoopUtilwill overwrite user config

2017-08-23 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138300#comment-16138300
 ] 

Saisai Shao commented on SPARK-21819:
-

I think the issue here is that the `Configuration` object created in the user 
code cannot be leveraged by {{YARN#client}}; {{YARN#client}} creates its own 
`Configuration` object using the default configurations, so it isn't aware of 
the security settings and still issues RPCs without Kerberos.

This doesn't look like an issue in Spark; the way you're writing the code and 
submitting the application may just not match how a normal Spark application is 
expected to be submitted.

>  UserGroupInformation initialization in SparkHadoopUtilwill overwrite user 
> config
> -
>
> Key: SPARK-21819
> URL: https://issues.apache.org/jira/browse/SPARK-21819
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, YARN
>Affects Versions: 2.1.0, 2.1.1
> Environment: Ubuntu14.04
> Spark2.10/2.11 (I checked the github of 2.20 , it exist there as well)
> Cluster mode: Yarn client 
>Reporter: Keith Sun
>
> When  submit job in Java or Scala code to ,the initialization of 
> SparkHadoopUtil will trigger the configuration overwritten in UGI which may 
> not be expected if the UGI has already been initialized by customized xmls 
> which are not on the classpath (like the cfg4j , which could set conf from 
> github code, a database etc). 
> {code:java}
> //it will overwrite the UGI conf which has already been initialized
> class SparkHadoopUtil extends Logging {
>   private val sparkConf = new SparkConf(false).loadFromSystemProperties(true)
>   val conf: Configuration = newConfiguration(sparkConf)
>   UserGroupInformation.setConfiguration(conf)
> {code}
> My scenario : My yarn cluster is kerberized, my configuration is set to use 
> kerberos for hadoop security. While, after the initialzation of 
> SparkHadoopUtil , the authentiationMethod in UGI is updated to "simple"(my 
> xmls not on the classpath), which lead to the failure like below :
> {code:java}
> 933453 [main] INFO  org.apache.spark.SparkContext  - Successfully stopped 
> SparkContext
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
> SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>   at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:501)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
>   at 
> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
>   at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
>   at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:60)
>   at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:153)
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
>   at org.apache.spark.SparkContext.(SparkContext.scala:497)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
>   at 
> org.apache.spark.sql.SparkSe

[jira] [Commented] (SPARK-21660) Yarn ShuffleService failed to start when the chosen directory become read-only

2017-08-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137842#comment-16137842
 ] 

Saisai Shao commented on SPARK-21660:
-

Yes [~hyukjin.kwon] this is a dup JIRA.

> Yarn ShuffleService failed to start when the chosen directory become read-only
> --
>
> Key: SPARK-21660
> URL: https://issues.apache.org/jira/browse/SPARK-21660
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, YARN
>Affects Versions: 2.1.1
>Reporter: lishuming
>
> h3. Background
> In our production environment,disks corrupt to `read-only` status almost once 
> a month. Now the strategy of Yarn ShuffleService which chooses an available 
> directory(disk) to store Shuffle info(DB) is as 
> below(https://github.com/apache/spark/blob/master/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java#L340):
> 1. If NameNode's recoveryPath not empty and shuffle DB exists in the 
> recoveryPath, return the recoveryPath;
> 2. If recoveryPath empty and shuffle DB exists in 
> `yarn.nodemanager.local-dirs`, set recoveryPath as the existing DB path and 
> return the path;
> 3. If recoveryPath not empty(shuffle DB not exists in the path) and shuffle 
> DB exists in `yarn.nodemanager.local-dirs`, mv the existing shuffle DB to 
> recoveryPath and return the path;
> 4. If all above don't hit, we choose the first disk of 
> `yarn.nodemanager.local-dirs`as the recoveryPath;
> All above strategy don't consider the chosen disk(directory) is writable or 
> not, so in our environment we meet such exception:
> {code:java}
> 2017-06-25 07:15:43,512 ERROR org.apache.spark.network.util.LevelDBProvider: 
> error opening leveldb file /mnt/dfs/12/yarn/local/registeredExecutors.ldb. 
> Creating new file, will not be able to recover state for existing applications
> at 
> org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:48)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:116)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:94)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.(ExternalShuffleBlockHandler.java:66)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:167)
> 2017-06-25 07:15:43,514 WARN org.apache.spark.network.util.LevelDBProvider: 
> error deleting /mnt/dfs/12/yarn/local/registeredExecutors.ldb
> 2017-06-25 07:15:43,515 INFO org.apache.hadoop.service.AbstractService: 
> Service spark_shuffle failed in state INITED; cause: java.io.IOException: 
> Unable to create state store
> at 
> org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:77)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:116)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:94)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.(ExternalShuffleBlockHandler.java:66)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:167)
> at 
> org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:75)
> {code}
> h3. Consideration
> 1. For many production environment, `yarn.nodemanager.local-dirs` always has 
> more than 1 disk, so we can make a better chosen strategy to avoid the 
> problem above;
> 2. Can we add a strategy to check the DB directory we choose is writable, so 
> avoid the problem above?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-22 Thread Saisai Shao
Nan, I think Meisam already has a PR about this; maybe you can discuss it
with him on GitHub based on the proposed code.

Sorry, I didn't follow the long discussion thread, but I think PayPal's
solution sounds simpler.

On Wed, Aug 23, 2017 at 12:07 AM, Nan Zhu  wrote:

> based on this result, I think we should follow the bulk operation pattern
>
> Shall we move forward with the PR from Paypal?
>
> Best,
>
> Nan
>
> On Mon, Aug 21, 2017 at 12:21 PM, Meisam Fathi 
> wrote:
>
> > Bottom line up front:
> > 1. The cost of making 10,000 individual REST calls is about two orders of
> > magnitude higher than making a single batch REST call (10,000 * 0.05
> > seconds vs. 1.4 seconds)
> > 2. Time to complete a batch REST call plateaus at about 10,000 application
> > reports per call.
> >
> > Full story:
> > I experimented and measure how long it takes to fetch Application Reports
> > from YARN with the REST API. My objective was to compare doing a batch
> REST
> > call to get all ApplicationReports vs doing individual REST calls for
> each
> > Application Report.
> >
> > I did the tests on 4 different clusters: 1) a test cluster, 2) a moderately
> > used dev cluster, 3) a lightly used production cluster, and 4) a heavily
> > used production cluster. For each cluster I made 7 REST calls to get 1, 10,
> > 100, 1,000, 10,000, 100,000, and 1,000,000 application reports respectively. I
> > repeated each call 200 times to account for variations and I reported the
> > median time.
> > To measure the time, I used the following curl command:
> >
> > $ curl -o /dev/null -s -w "@curl-output-fromat.json" \
> >   "http://$rm_http_address:$rm_port/ws/v1/cluster/apps?applicationTypes=$applicationTypes&limit=$limit"
> >
> > The attached charts show the results. In all the charts, the x axis shows
> > the number of results that were requested in the call.
> > The bar chart shows the time it takes to complete a REST call on each
> > cluster.
> > The first line plot also shows the same results as the bar chart on a log
> > scale (it is easier to see that the time to complete the REST call plateaus
> > at 10,000).
> > The last chart shows the size of data that is being downloaded on each
> > REST call, which explains why the time plateaus at 10,000.
> >
> >
> > [Attached charts: transfer_time_bar_plot.png, transfer_time_line_plot.png,
> > size_downloaded_line_plot.png]
> >
> >>
> >>
> > Thanks,
> > Meisam
> >
>


Re: [VOTE] Release Livy 0.4.0-incubating based on Livy 0.4.0 RC2

2017-08-22 Thread Saisai Shao
OK, sure, I will remove RC1 from the directory.

Thanks
Jerry

On Tue, Aug 22, 2017 at 7:24 PM, John D. Ament 
wrote:

> Hi,
>
> Looking at your release, it's confusing what we are voting on.  If RC2 is
> under vote, please remove RC1 from this directory.
>
> John
>
> On Tue, Aug 22, 2017 at 3:33 AM Jerry Shao  wrote:
>
> > Hello Incubator PMC’ers,
> >
> > The Apache Livy community has decided to release Apache Livy
> > 0.4.0-incubating based on 0.4.0-incubating Release Candidate 2. We now
> > kindly request the Incubator PMC members to review and vote on this
> > incubator
> > release.
> >
> > Livy is web service that exposes a REST interface for managing long
> running
> > Apache Spark contexts in your cluster. With Livy, new applications can be
> > built on top of Apache Spark that require fine grained interaction with
> > many Spark contexts.
> >
> > Artifacts are available at
> > https://dist.apache.org/repos/dist/dev/incubator/livy/, public keys are
> > available at https://dist.apache.org/repos/dist/dev/incubator/livy/KEYS.
> >
> > livy-0.4.0-incubating-src.zip <
> >
> > https://dist.apache.org/repos/dist/dev/incubator/livy/0.4.0-
> incubating/livy-0.4.0-incubating-src-RC2.zip
> > > is a source release. Along with it, for convenience, please find the
> > binary release as livy-0.4.0-incubating-bin-RC2.zip <
> >
> > https://dist.apache.org/repos/dist/dev/incubator/livy/0.4.0-
> incubating/livy-0.4.0-incubating-bin-RC2.zip
> > >.
> >
> >
> > Git tag:
> > *
> > https://github.com/apache/incubator-livy/releases/tag/
> v0.4.0-incubating-rc2
> > <
> > https://github.com/apache/incubator-livy/releases/tag/
> v0.4.0-incubating-rc2
> > >*
> >
> > The vote will be open for at least 72 hours or until necessary number of
> > votes are reached.
> >
> > Members please be sure to indicate "(Binding)" with your vote which will
> > help in tallying the vote(s).
> >
> > * Here is my +1 (non-binding) *
> >
> > Cheers,
> > Jerry
> >
>


[jira] [Commented] (SPARK-21733) ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

2017-08-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136454#comment-16136454
 ] 

Saisai Shao commented on SPARK-21733:
-

[~1028344...@qq.com], I'm really not sure what you are trying to say in these 
comments on the JIRA; you keep posting logs without describing any problem.

> ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> -
>
> Key: SPARK-21733
> URL: https://issues.apache.org/jira/browse/SPARK-21733
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.1
> Environment: Apache Spark2.1.1 
> CDH5.12.0 Yarn
>Reporter: Jepson
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Kafka+Spark streaming ,throw these error:
> {code:java}
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:14 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8003 took 11 ms
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:14 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:14 INFO executor.Executor: Finished task 7.0 in stage 8003.0 
> (TID 64178). 1740 bytes result sent to driver
> 17/08/15 09:34:21 INFO storage.BlockManager: Removing RDD 8002
> 17/08/15 09:34:21 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
> task 64186
> 17/08/15 09:34:21 INFO executor.Executor: Running task 7.0 in stage 8004.0 
> (TID 64186)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Started reading broadcast 
> variable 8004
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8004 took 8 ms
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:21 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:21 INFO executor.Executor: Finished task 7.0 in stage 8004.0 
> (TID 64186). 1740 bytes result sent to driver
> h3. 17/08/15 09:34:29 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED 
> SIGNAL TERM
> 17/08/15 09:34:29 INFO storage.DiskBlockManager: Shutdown hook called
> 17/08/15 09:34:29 INFO util.ShutdownHookManager: Shutdown hook called
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21733) ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

2017-08-21 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-21733.
-
Resolution: Not A Problem

> ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> -
>
> Key: SPARK-21733
> URL: https://issues.apache.org/jira/browse/SPARK-21733
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.1
> Environment: Apache Spark2.1.1 
> CDH5.12.0 Yarn
>Reporter: Jepson
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Kafka+Spark streaming ,throw these error:
> {code:java}
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:14 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8003 took 11 ms
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:14 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:14 INFO executor.Executor: Finished task 7.0 in stage 8003.0 
> (TID 64178). 1740 bytes result sent to driver
> 17/08/15 09:34:21 INFO storage.BlockManager: Removing RDD 8002
> 17/08/15 09:34:21 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
> task 64186
> 17/08/15 09:34:21 INFO executor.Executor: Running task 7.0 in stage 8004.0 
> (TID 64186)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Started reading broadcast 
> variable 8004
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8004 took 8 ms
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:21 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:21 INFO executor.Executor: Finished task 7.0 in stage 8004.0 
> (TID 64186). 1740 bytes result sent to driver
> h3. 17/08/15 09:34:29 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED 
> SIGNAL TERM
> 17/08/15 09:34:29 INFO storage.DiskBlockManager: Shutdown hook called
> 17/08/15 09:34:29 INFO util.ShutdownHookManager: Shutdown hook called
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21733) ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

2017-08-21 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136334#comment-16136334
 ] 

Saisai Shao commented on SPARK-21733:
-

I'm going to close this issue again because the behavior is expected. If you 
have any doubts or questions about the mechanism here, please ask on the 
mailing list instead.

> ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> -
>
> Key: SPARK-21733
> URL: https://issues.apache.org/jira/browse/SPARK-21733
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.1
> Environment: Apache Spark2.1.1 
> CDH5.12.0 Yarn
>Reporter: Jepson
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Kafka+Spark streaming ,throw these error:
> {code:java}
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:14 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8003 took 11 ms
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:14 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:14 INFO executor.Executor: Finished task 7.0 in stage 8003.0 
> (TID 64178). 1740 bytes result sent to driver
> 17/08/15 09:34:21 INFO storage.BlockManager: Removing RDD 8002
> 17/08/15 09:34:21 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
> task 64186
> 17/08/15 09:34:21 INFO executor.Executor: Running task 7.0 in stage 8004.0 
> (TID 64186)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Started reading broadcast 
> variable 8004
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8004 took 8 ms
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:21 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:21 INFO executor.Executor: Finished task 7.0 in stage 8004.0 
> (TID 64186). 1740 bytes result sent to driver
> h3. 17/08/15 09:34:29 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED 
> SIGNAL TERM
> 17/08/15 09:34:29 INFO storage.DiskBlockManager: Shutdown hook called
> 17/08/15 09:34:29 INFO util.ShutdownHookManager: Shutdown hook called
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21733) ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

2017-08-21 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136326#comment-16136326
 ] 

Saisai Shao edited comment on SPARK-21733 at 8/22/17 5:41 AM:
--

This is because the executor is killed by the NM with SIGTERM, which is quite 
normal for a Spark on YARN application; I don't think there's an issue beyond 
this error log. Everything is cleaned up by the shutdown hook even when SIGTERM 
is received, so it effectively amounts to a clean stop.

If you are still confused here, you should figure out why the executor is 
killed by the YARN NM: is it a normal kill at the end of the application's 
life, or is it killed for some other reason (like OOM)?
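
For example, you can pull the aggregated YARN logs for the application and grep 
for the NM's kill-related messages; this is only a sketch, and the application 
id below is a placeholder:

{code}
# Replace the application id with the real one from the RM UI / client logs.
yarn logs -applicationId application_1502754000000_0001 \
  | grep -iE "killing container|beyond .* memory limits|exit code"
{code}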


was (Author: jerryshao):
This is because the executor is killed by the NM with SIGTERM, which is quite 
normal for a Spark on YARN application; I don't think there's an issue beyond 
this error log. Everything is cleaned up by the shutdown hook even when SIGTERM 
is received, so it effectively amounts to a clean stop.

> ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> -
>
> Key: SPARK-21733
> URL: https://issues.apache.org/jira/browse/SPARK-21733
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.1
> Environment: Apache Spark2.1.1 
> CDH5.12.0 Yarn
>Reporter: Jepson
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Kafka+Spark streaming ,throw these error:
> {code:java}
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:14 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8003 took 11 ms
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:14 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:14 INFO executor.Executor: Finished task 7.0 in stage 8003.0 
> (TID 64178). 1740 bytes result sent to driver
> 17/08/15 09:34:21 INFO storage.BlockManager: Removing RDD 8002
> 17/08/15 09:34:21 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
> task 64186
> 17/08/15 09:34:21 INFO executor.Executor: Running task 7.0 in stage 8004.0 
> (TID 64186)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Started reading broadcast 
> variable 8004
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8004 took 8 ms
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:21 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:21 INFO executor.Executor: Finished task 7.0 in stage 8004.0 
> (TID 64186). 1740 bytes result sent to driver
> h3. 17/08/15 09:34:29 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED 
> SIGNAL TERM
> 17/08/15 09:34:29 INFO storage.DiskBlockManager: Shutdown hook called
> 17/08/15 09:34:29 INFO util.ShutdownHookManager: Shutdown hook called
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21733) ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

2017-08-21 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136326#comment-16136326
 ] 

Saisai Shao commented on SPARK-21733:
-

This is because the executor is killed by the NM with SIGTERM, which is quite 
normal for a Spark on YARN application; I don't think there's an issue beyond 
this error log. Everything is cleaned up by the shutdown hook even when SIGTERM 
is received, so it effectively amounts to a clean stop.

> ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> -
>
> Key: SPARK-21733
> URL: https://issues.apache.org/jira/browse/SPARK-21733
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.1
> Environment: Apache Spark2.1.1 
> CDH5.12.0 Yarn
>Reporter: Jepson
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Kafka+Spark streaming ,throw these error:
> {code:java}
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:14 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8003 took 11 ms
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:14 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:14 INFO executor.Executor: Finished task 7.0 in stage 8003.0 
> (TID 64178). 1740 bytes result sent to driver
> 17/08/15 09:34:21 INFO storage.BlockManager: Removing RDD 8002
> 17/08/15 09:34:21 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
> task 64186
> 17/08/15 09:34:21 INFO executor.Executor: Running task 7.0 in stage 8004.0 
> (TID 64186)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Started reading broadcast 
> variable 8004
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8004 took 8 ms
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:21 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:21 INFO executor.Executor: Finished task 7.0 in stage 8004.0 
> (TID 64186). 1740 bytes result sent to driver
> h3. 17/08/15 09:34:29 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED 
> SIGNAL TERM
> 17/08/15 09:34:29 INFO storage.DiskBlockManager: Shutdown hook called
> 17/08/15 09:34:29 INFO util.ShutdownHookManager: Shutdown hook called
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21798) No config to replace deprecated SPARK_CLASSPATH config for launching daemons like History Server

2017-08-21 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136196#comment-16136196
 ] 

Saisai Shao commented on SPARK-21798:
-

I think this one could be used; it looks like there are no other options in the 
current code.

BTW, why do you need to extend the Spark history server's classpath? Do you 
have a customized history provider?
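
For now, a rough workaround sketch using SPARK_DIST_CLASSPATH would look 
something like the following (the jar path is just an illustrative placeholder):

{code}
# conf/spark-env.sh -- sketch only; /opt/extra/my-history-provider.jar is made up
export SPARK_DIST_CLASSPATH="/opt/extra/my-history-provider.jar:${SPARK_DIST_CLASSPATH}"

# then restart the history server so it picks up the new classpath
sbin/stop-history-server.sh
sbin/start-history-server.sh
{code}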

> No config to replace deprecated SPARK_CLASSPATH config for launching daemons 
> like History Server
> 
>
> Key: SPARK-21798
> URL: https://issues.apache.org/jira/browse/SPARK-21798
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Sanket Reddy
>Priority: Minor
>
> History Server Launch uses SparkClassCommandBuilder for launching the server. 
> It is observed that SPARK_CLASSPATH has been removed and deprecated. For 
> spark-submit this takes a different route and spark.driver.extraClasspath 
> takes care of specifying additional jars in the classpath that were 
> previously specified in the SPARK_CLASSPATH. Right now the only way specify 
> the additional jars for launching daemons such as history server is using 
> SPARK_DIST_CLASSPATH 
> (https://spark.apache.org/docs/latest/hadoop-provided.html) but this I 
> presume is a distribution classpath. It would be nice to have a similar 
> config like spark.driver.extraClasspath for launching daemons similar to 
> history server. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Spark Web UI SSL Encryption

2017-08-21 Thread Saisai Shao
Can you please post the specific problem you ran into?
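
Also, have you already tried the standard Spark SSL properties? Roughly
something like the sketch below, where the keystore path and passwords are just
placeholders (whether this alone is enough depends on your deployment):

  # conf/spark-defaults.conf (sketch with placeholder values)
  spark.ssl.enabled              true
  spark.ssl.keyStore             /path/to/keystore.jks
  spark.ssl.keyStorePassword     changeit
  spark.ssl.keyPassword          changeit
  spark.ssl.protocol             TLSv1.2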

Thanks
Jerry

On Sat, Aug 19, 2017 at 1:49 AM, Anshuman Kumar 
wrote:

> Hello,
>
> I have recently installed Spark 2.2.0, and am trying to use it for some big
> data processing. Spark is installed on a server that I access from a remote
> computer. I need to set up SSL encryption for the Spark web UI, but
> following some threads online I’m still not able to set it up.
>
> Can someone help me with the SSL encryption.
>
> Warm Regards.
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


[jira] [Commented] (SPARK-21752) Config spark.jars.packages is ignored in SparkSession config

2017-08-17 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130078#comment-16130078
 ] 

Saisai Shao commented on SPARK-21752:
-

Configurations like "spark.jars.packages" should be set either in 
spark-defaults or passed as command-line arguments; setting them at runtime 
cannot work, for the reason already mentioned by [~srowen].

The PySpark SparkConf case probably just happens to hit some PySpark-specific 
logic, but I don't think that is the intention of Spark/PySpark.

So I don't think there's an issue here; you should always set such 
configurations before the application is launched.

BTW, the same applies to {{master}}: it happens to work correctly in local 
mode, but if you switch to yarn client or yarn cluster mode, I don't think 
setting it at runtime will work correctly.
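
For example, either of these launch-time forms should work (the script name is 
just a placeholder; the package coordinate is taken from the report above):

{code}
# Option 1: conf/spark-defaults.conf
#   spark.jars.packages  org.mongodb.spark:mongo-spark-connector_2.11:2.2.0

# Option 2: pass it at submit time
spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 my_app.py
{code}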

> Config spark.jars.packages is ignored in SparkSession config
> 
>
> Key: SPARK-21752
> URL: https://issues.apache.org/jira/browse/SPARK-21752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jakub Nowacki
>
> If I put a config key {{spark.jars.packages}} using {{SparkSession}} builder 
> as follows:
> {code}
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")\
> .config("spark.mongodb.input.uri", "mongodb://mongo/test.coll") \
> .config("spark.mongodb.output.uri", "mongodb://mongo/test.coll") \
> .getOrCreate()
> {code}
> the SparkSession gets created but there are no package download logs printed, 
> and if I use the loaded classes, Mongo connector in this case, but it's the 
> same for other packages, I get {{java.lang.ClassNotFoundException}} for the 
> missing classes.
> If I use the config file {{conf/spark-defaults.comf}}, command line option 
> {{--packages}}, e.g.:
> {code}
> import os
> os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages 
> org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 pyspark-shell'
> {code}
> it works fine. Interestingly, using {{SparkConf}} object works fine as well, 
> e.g.:
> {code}
> conf = pyspark.SparkConf()
> conf.set("spark.jars.packages", 
> "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0")
> conf.set("spark.mongodb.input.uri", "mongodb://mongo/test.coll")
> conf.set("spark.mongodb.output.uri", "mongodb://mongo/test.coll")
> spark = pyspark.sql.SparkSession.builder\
> .appName('test-mongo')\
> .master('local[*]')\
> .config(conf=conf)\
> .getOrCreate()
> {code}
> The above is in Python but I've seen the behavior in other languages, though, 
> I didn't check R. 
> I also have seen it in older Spark versions.
> It seems that this is the only config key that doesn't work for me via the 
> {{SparkSession}} builder config.
> Note that this is related to creating new {{SparkSession}} as getting new 
> packages into existing {{SparkSession}} doesn't indeed make sense. Thus this 
> will only work with bare Python, Scala or Java, and not on {{pyspark}} or 
> {{spark-shell}} as they create the session automatically; it this case one 
> would need to use {{--packages}} option. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Help to verify Apache Livy 0.4.0-incubating release

2017-08-17 Thread Saisai Shao
Hi all,

We're in the process of making the first Apache release of Livy
(0.4.0-incubating). We would really appreciate it if you could verify the RC2[1]
release (binary and source) locally and send us your feedback.
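
A rough sketch of the usual local checks (the .asc signature file name and the
extracted directory name are assumptions, please adjust to what is actually in
the dist folder):

  wget https://dist.apache.org/repos/dist/dev/incubator/livy/KEYS
  wget https://dist.apache.org/repos/dist/dev/incubator/livy/0.4.0-incubating/livy-0.4.0-incubating-src-RC2.zip
  wget https://dist.apache.org/repos/dist/dev/incubator/livy/0.4.0-incubating/livy-0.4.0-incubating-src-RC2.zip.asc
  gpg --import KEYS
  gpg --verify livy-0.4.0-incubating-src-RC2.zip.asc livy-0.4.0-incubating-src-RC2.zip
  unzip -q livy-0.4.0-incubating-src-RC2.zip
  cd livy-0.4.0-incubating-src-RC2   # assumed directory name
  mvn clean package -DskipTests      # assumes the standard Maven build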

We will call for an incubation vote next week if everything is fine.

Thanks a lot for your help.

[1]https://dist.apache.org/repos/dist/dev/incubator/livy/0.4.0-incubating/

Best regards,
Saisai (Jerry) Shao


[jira] [Commented] (SPARK-21714) SparkSubmit in Yarn Client mode downloads remote files and then reuploads them again

2017-08-15 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127199#comment-16127199
 ] 

Saisai Shao commented on SPARK-21714:
-

Let me take a crack at this if no one is working on it.

> SparkSubmit in Yarn Client mode downloads remote files and then reuploads 
> them again
> 
>
> Key: SPARK-21714
> URL: https://issues.apache.org/jira/browse/SPARK-21714
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.2.0
>Reporter: Thomas Graves
>Priority: Critical
>
> SPARK-10643 added the ability for spark-submit to download remote file in 
> client mode.
> However in yarn mode this introduced a bug where it downloads them for the 
> client but then yarn client just reuploads them to HDFS and uses them again. 
> This should not happen when the remote file is HDFS.  This is wasting 
> resources and its defeating the  distributed cache because if the original 
> object was public it would have been shared by many users. By us downloading 
> and reuploading, it becomes private.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21733) ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

2017-08-15 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126974#comment-16126974
 ] 

Saisai Shao commented on SPARK-21733:
-

Should the resolution of this JIRA be "Invalid" or something else? Nothing is 
actually fixed here.

> ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> -
>
> Key: SPARK-21733
> URL: https://issues.apache.org/jira/browse/SPARK-21733
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.1
> Environment: Apache Spark2.1.1 
> CDH5.12.0 Yarn
>Reporter: Jepson
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Kafka+Spark streaming ,throw these error:
> {code:java}
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:14 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8003 took 11 ms
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:14 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:14 INFO executor.Executor: Finished task 7.0 in stage 8003.0 
> (TID 64178). 1740 bytes result sent to driver
> 17/08/15 09:34:21 INFO storage.BlockManager: Removing RDD 8002
> 17/08/15 09:34:21 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
> task 64186
> 17/08/15 09:34:21 INFO executor.Executor: Running task 7.0 in stage 8004.0 
> (TID 64186)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Started reading broadcast 
> variable 8004
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8004 took 8 ms
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:21 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:21 INFO executor.Executor: Finished task 7.0 in stage 8004.0 
> (TID 64186). 1740 bytes result sent to driver
> h3. 17/08/15 09:34:29 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED 
> SIGNAL TERM
> 17/08/15 09:34:29 INFO storage.DiskBlockManager: Shutdown hook called
> 17/08/15 09:34:29 INFO util.ShutdownHookManager: Shutdown hook called
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21733) ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

2017-08-15 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126894#comment-16126894
 ] 

Saisai Shao commented on SPARK-21733:
-

Is there any issue here?

> ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
> -
>
> Key: SPARK-21733
> URL: https://issues.apache.org/jira/browse/SPARK-21733
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.1
> Environment: Apache Spark2.1.1 
> CDH5.12.0 Yarn
>Reporter: Jepson
>  Labels: patch
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Kafka+Spark streaming ,throw these error:
> {code:java}
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:14 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8003 took 11 ms
> 17/08/15 09:34:14 INFO memory.MemoryStore: Block broadcast_8003 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:14 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:14 INFO executor.Executor: Finished task 7.0 in stage 8003.0 
> (TID 64178). 1740 bytes result sent to driver
> 17/08/15 09:34:21 INFO storage.BlockManager: Removing RDD 8002
> 17/08/15 09:34:21 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
> task 64186
> 17/08/15 09:34:21 INFO executor.Executor: Running task 7.0 in stage 8004.0 
> (TID 64186)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Started reading broadcast 
> variable 8004
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004_piece0 stored 
> as bytes in memory (estimated size 1895.0 B, free 1643.2 MB)
> 17/08/15 09:34:21 INFO broadcast.TorrentBroadcast: Reading broadcast variable 
> 8004 took 8 ms
> 17/08/15 09:34:21 INFO memory.MemoryStore: Block broadcast_8004 stored as 
> values in memory (estimated size 2.9 KB, free 1643.2 MB)
> 17/08/15 09:34:21 INFO kafka010.KafkaRDD: Beginning offset 10130733 is the 
> same as ending offset skipping kssh 5
> 17/08/15 09:34:21 INFO executor.Executor: Finished task 7.0 in stage 8004.0 
> (TID 64186). 1740 bytes result sent to driver
> h3. 17/08/15 09:34:29 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED 
> SIGNAL TERM
> 17/08/15 09:34:29 INFO storage.DiskBlockManager: Shutdown hook called
> 17/08/15 09:34:29 INFO util.ShutdownHookManager: Shutdown hook called
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9104) expose network layer memory usage in shuffle part

2017-08-14 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125331#comment-16125331
 ] 

Saisai Shao commented on SPARK-9104:


I think it is quite useful to get the memory details of Netty, so I'm taking 
another shot at this issue (https://github.com/apache/spark/pull/18935).

> expose network layer memory usage in shuffle part
> -
>
> Key: SPARK-9104
> URL: https://issues.apache.org/jira/browse/SPARK-9104
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Zhang, Liye
>
> The default network transportation is netty, and when transfering blocks for 
> shuffle, the network layer will consume a decent size of memory, we shall 
> collect the memory usage of this part and expose it. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21714) SparkSubmit in Yarn Client mode downloads remote files and then reuploads them again

2017-08-13 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125115#comment-16125115
 ] 

Saisai Shao commented on SPARK-21714:
-

I noticed this issue before and tried to fix it, but the solution made the 
SparkSubmit code a little complicated.

> SparkSubmit in Yarn Client mode downloads remote files and then reuploads 
> them again
> 
>
> Key: SPARK-21714
> URL: https://issues.apache.org/jira/browse/SPARK-21714
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.2.0
>Reporter: Thomas Graves
>Priority: Critical
>
> SPARK-10643 added the ability for spark-submit to download remote file in 
> client mode.
> However in yarn mode this introduced a bug where it downloads them for the 
> client but then yarn client just reuploads them to HDFS and uses them again. 
> This should not happen when the remote file is HDFS.  This is wasting 
> resources and its defeating the  distributed cache because if the original 
> object was public it would have been shared by many users. By us downloading 
> and reuploading, it becomes private.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Spark 2.1.x client with 2.2.0 cluster

2017-08-10 Thread Saisai Shao
As I remember, using a Spark 2.1 driver to communicate with Spark 2.2
executors will throw some RPC exceptions (I don't remember the details of the
exception).

On Thu, Aug 10, 2017 at 4:23 PM, Ted Yu  wrote:

> Hi,
> Has anyone used Spark 2.1.x client with Spark 2.2.0 cluster ?
>
> If so, is there any compatibility issue observed ?
>
> Thanks
>


[jira] [Commented] (SPARK-21660) Yarn ShuffleService failed to start when the chosen directory become read-only

2017-08-10 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121125#comment-16121125
 ] 

Saisai Shao commented on SPARK-21660:
-

Will the YARN NM handle this bad-disk problem and return a good disk for the 
recoveryPath? I would guess YARN should handle this.

> Yarn ShuffleService failed to start when the chosen directory become read-only
> --
>
> Key: SPARK-21660
> URL: https://issues.apache.org/jira/browse/SPARK-21660
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, YARN
>Affects Versions: 2.1.1
>Reporter: lishuming
>
> h3. Background
> In our production environment,disks corrupt to `read-only` status almost once 
> a month. Now the strategy of Yarn ShuffleService which chooses an available 
> directory(disk) to store Shuffle info(DB) is as 
> below(https://github.com/apache/spark/blob/master/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java#L340):
> 1. If NameNode's recoveryPath not empty and shuffle DB exists in the 
> recoveryPath, return the recoveryPath;
> 2. If recoveryPath empty and shuffle DB exists in 
> `yarn.nodemanager.local-dirs`, set recoveryPath as the existing DB path and 
> return the path;
> 3. If recoveryPath not empty(shuffle DB not exists in the path) and shuffle 
> DB exists in `yarn.nodemanager.local-dirs`, mv the existing shuffle DB to 
> recoveryPath and return the path;
> 4. If all above don't hit, we choose the first disk of 
> `yarn.nodemanager.local-dirs`as the recoveryPath;
> All above strategy don't consider the chosen disk(directory) is writable or 
> not, so in our environment we meet such exception:
> {code:java}
> 2017-06-25 07:15:43,512 ERROR org.apache.spark.network.util.LevelDBProvider: 
> error opening leveldb file /mnt/dfs/12/yarn/local/registeredExecutors.ldb. 
> Creating new file, will not be able to recover state for existing applications
> at 
> org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:48)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:116)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:94)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.<init>(ExternalShuffleBlockHandler.java:66)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:167)
> 2017-06-25 07:15:43,514 WARN org.apache.spark.network.util.LevelDBProvider: 
> error deleting /mnt/dfs/12/yarn/local/registeredExecutors.ldb
> 2017-06-25 07:15:43,515 INFO org.apache.hadoop.service.AbstractService: 
> Service spark_shuffle failed in state INITED; cause: java.io.IOException: 
> Unable to create state store
> at 
> org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:77)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:116)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:94)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.<init>(ExternalShuffleBlockHandler.java:66)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:167)
> at 
> org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:75)
> {code}
> h3. Consideration
> 1. For many production environment, `yarn.nodemanager.local-dirs` always has 
> more than 1 disk, so we can make a better chosen strategy to avoid the 
> problem above;
> 2. Can we add a strategy to check the DB directory we choose is writable, so 
> avoid the problem above?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: To release a first Apache version Livy

2017-08-07 Thread Saisai Shao
Thanks Marcelo for your comments. I checked create_release.sh in Spark; I think
I will adopt it after this first release. For this first release I would like to
do each step by hand.

On Tue, Aug 8, 2017 at 8:34 AM, Marcelo Vanzin <van...@cloudera.com> wrote:

> Spark has a "create_release.sh" script, I wonder if that can be reused
> / adapted for Livy to make this easier in the future.
>
> I tracked all the dependencies' licenses for the incubation proposal,
> if that helps; although I didn't have the actual text of the licenses
> there.
>
> On Mon, Aug 7, 2017 at 5:29 PM, Luciano Resende <luckbr1...@gmail.com>
> wrote:
> > Just took a quick look at it, and here are some comments:
> >
> > Apache release requires source releases as described below
> > http://www.apache.org/legal/release-policy.html#source-packages
> >
> > Binary releases can also be provided as a convenience for users:
> > http://www.apache.org/legal/release-policy.html#compiled-packages
> >
> >
> > Also, for the binary artifact, I would list each included jar and its
> > license type in the license file (see below as an example)
> > http://svn.apache.org/repos/asf/tuscany/sca-java-2.x/
> trunk/distribution/all/src/main/release/bin/LICENSE
> >
> > But note that, the source release artifact, which should not include any
> > dependency jars, should have the LICENSE file similar to what is in the
> > root of git today.
> >
> >
> >
> >
> >
> > On Mon, Aug 7, 2017 at 2:16 AM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
> >
> >> Hi Team,
> >>
> >> Today I cut a RC1 release based on branch-0.4, here is the link (
> >> https://github.com/apache/incubator-livy/releases/tag/
> >> v0.4.0-incubating-rc1
> >> and https://dist.apache.org/repos/dist/dev/incubator/livy/), would you
> >> please help to test and verify. Thanks a lot and appreciate your help.
> >>
> >> Best regards,
> >> Saisai
> >>
> >> On Sat, Aug 5, 2017 at 10:44 AM, Alex Bozarth <ajboz...@us.ibm.com>
> wrote:
> >>
> >> > Last comment on the INFRA JIRA seemed to indicate that they hit a snag
> >> > with the import over a week ago and he never got back to us after. He
> >> told
> >> > us to keep using the Cloudera JIRA until he successfully completed a
> test
> >> > import then we could re-export for him.
> >> >
> >> >
> >> > *Alex Bozarth*
> >> > Software Engineer
> >> > Spark Technology Center
> >> > --
> >> > *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> >> > *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
> >> >
> >> >
> >> > 505 Howard Street
> >> > San Francisco, CA 94105
> >> > United States
> >> >
> >> >
> >> >
> >> > [image: Inactive hide details for Bikas Saha ---08/04/2017 07:22:50
> >> > PM---Hi, Most of those jiras looked like bug fixes to me. Hence I
> t]Bikas
> >> > Saha ---08/04/2017 07:22:50 PM---Hi, Most of those jiras looked like
> bug
> >> > fixes to me. Hence I thought 0.4 could be a bug fix release.
> >> >
> >> > From: Bikas Saha <bi...@apache.org>
> >> > To: "dev@livy.incubator.apache.org" <dev@livy.incubator.apache.org>
> >> > Date: 08/04/2017 07:22 PM
> >> > Subject: Re: To release a first Apache version Livy
> >> > --
> >> >
> >> >
> >> >
> >> > Hi,
> >> >
> >> >
> >> > Most of those jiras looked like bug fixes to me. Hence I thought 0.4
> >> could
> >> > be a bug fix release. But I am ok releasing the current state so users
> >> can
> >> > gets an Apache release to transition to.
> >> >
> >> >
> >> > Given that its still a new project, a shorter cadence would help make
> bug
> >> > fix releases available.
> >> >
> >> >
> >> > Btw, does anyone know whats holding up the Apache jira process? If
> not, I
> >> > can follow up on that.
> >> >
> >> >
> >> > Bikas
> >> >
> >> > 
> >> > From: Saisai Shao <sai.sai.s...@gmail.com>
> >> > Sent: Thursday, August 3, 2017 7:31:10 PM
> >> > To: dev@livy.inc

Re: To release a first Apache version Livy

2017-08-07 Thread Saisai Shao
Thanks Luciano for your comments; I will do it today.

On Tue, Aug 8, 2017 at 8:29 AM, Luciano Resende <luckbr1...@gmail.com>
wrote:

> Just took a quick look at it, and here are some comments:
>
> Apache release requires source releases as described below
> http://www.apache.org/legal/release-policy.html#source-packages
>
> Binary releases can also be provided as a convenience for users:
> http://www.apache.org/legal/release-policy.html#compiled-packages
>
>
> Also, for the binary artifact, I would list each included jar and its
> license type in the license file (see below as an example)
> http://svn.apache.org/repos/asf/tuscany/sca-java-2.x/
> trunk/distribution/all/src/main/release/bin/LICENSE
>
> But note that, the source release artifact, which should not include any
> dependency jars, should have the LICENSE file similar to what is in the
> root of git today.
>
>
>
>
>
> On Mon, Aug 7, 2017 at 2:16 AM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
> > Hi Team,
> >
> > Today I cut a RC1 release based on branch-0.4, here is the link (
> > https://github.com/apache/incubator-livy/releases/tag/
> > v0.4.0-incubating-rc1
> > and https://dist.apache.org/repos/dist/dev/incubator/livy/), would you
> > please help to test and verify. Thanks a lot and appreciate your help.
> >
> > Best regards,
> > Saisai
> >
> > On Sat, Aug 5, 2017 at 10:44 AM, Alex Bozarth <ajboz...@us.ibm.com>
> wrote:
> >
> > > Last comment on the INFRA JIRA seemed to indicate that they hit a snag
> > > with the import over a week ago and he never got back to us after. He
> > told
> > > us to keep using the Cloudera JIRA until he successfully completed a
> test
> > > import then we could re-export for him.
> > >
> > >
> > > *Alex Bozarth*
> > > Software Engineer
> > > Spark Technology Center
> > > --
> > > *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> > > *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
> > >
> > >
> > > 505 Howard Street
> > > San Francisco, CA 94105
> > > United States
> > >
> > >
> > >
> > > [image: Inactive hide details for Bikas Saha ---08/04/2017 07:22:50
> > > PM---Hi, Most of those jiras looked like bug fixes to me. Hence I
> t]Bikas
> > > Saha ---08/04/2017 07:22:50 PM---Hi, Most of those jiras looked like
> bug
> > > fixes to me. Hence I thought 0.4 could be a bug fix release.
> > >
> > > From: Bikas Saha <bi...@apache.org>
> > > To: "dev@livy.incubator.apache.org" <dev@livy.incubator.apache.org>
> > > Date: 08/04/2017 07:22 PM
> > > Subject: Re: To release a first Apache version Livy
> > > --
> > >
> > >
> > >
> > > Hi,
> > >
> > >
> > > Most of those jiras looked like bug fixes to me. Hence I thought 0.4
> > could
> > > be a bug fix release. But I am ok releasing the current state so users
> > can
> > > gets an Apache release to transition to.
> > >
> > >
> > > Given that its still a new project, a shorter cadence would help make
> bug
> > > fix releases available.
> > >
> > >
> > > Btw, does anyone know whats holding up the Apache jira process? If
> not, I
> > > can follow up on that.
> > >
> > >
> > > Bikas
> > >
> > > 
> > > From: Saisai Shao <sai.sai.s...@gmail.com>
> > > Sent: Thursday, August 3, 2017 7:31:10 PM
> > > To: dev@livy.incubator.apache.org
> > > Subject: Re: To release a first Apache version Livy
> > >
> > > From my side 0.4 might be a feasible choice to release as for Apache.
> > > Reverting all the features and release 0.3.1 is too time-consuming and
> > not
> > > so necessary.
> > >
> > > On Fri, Aug 4, 2017 at 4:16 AM, Alex Bozarth <ajboz...@us.ibm.com>
> > wrote:
> > >
> > > > @Bikas The list of JIRAs fixed in 0.4 is 50 long (
> > > > https://issues.cloudera.org/issues/?jql=project%20%3D%
> > > > 20LIVY%20AND%20fixVersion%20%3D%200.4) so I'm wondering what you
> mean
> > by
> > > > not including feature work. Are you suggesting we revert some of the
> > work
> > > > for this release and the re-merge it later, or just that you would've
> > > > preferred i

Re: To release a first Apache version Livy

2017-08-07 Thread Saisai Shao
Hi Team,

Today I cut an RC1 release based on branch-0.4; here are the links (
https://github.com/apache/incubator-livy/releases/tag/v0.4.0-incubating-rc1
and https://dist.apache.org/repos/dist/dev/incubator/livy/). Would you
please help test and verify it? Thanks a lot, and I appreciate your help.

Best regards,
Saisai

On Sat, Aug 5, 2017 at 10:44 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

> Last comment on the INFRA JIRA seemed to indicate that they hit a snag
> with the import over a week ago and he never got back to us after. He told
> us to keep using the Cloudera JIRA until he successfully completed a test
> import then we could re-export for him.
>
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
> [image: Inactive hide details for Bikas Saha ---08/04/2017 07:22:50
> PM---Hi, Most of those jiras looked like bug fixes to me. Hence I t]Bikas
> Saha ---08/04/2017 07:22:50 PM---Hi, Most of those jiras looked like bug
> fixes to me. Hence I thought 0.4 could be a bug fix release.
>
> From: Bikas Saha <bi...@apache.org>
> To: "dev@livy.incubator.apache.org" <dev@livy.incubator.apache.org>
> Date: 08/04/2017 07:22 PM
> Subject: Re: To release a first Apache version Livy
> --
>
>
>
> Hi,
>
>
> Most of those jiras looked like bug fixes to me. Hence I thought 0.4 could
> be a bug fix release. But I am ok releasing the current state so users can
> gets an Apache release to transition to.
>
>
> Given that its still a new project, a shorter cadence would help make bug
> fix releases available.
>
>
> Btw, does anyone know whats holding up the Apache jira process? If not, I
> can follow up on that.
>
>
> Bikas
>
> 
> From: Saisai Shao <sai.sai.s...@gmail.com>
> Sent: Thursday, August 3, 2017 7:31:10 PM
> To: dev@livy.incubator.apache.org
> Subject: Re: To release a first Apache version Livy
>
> From my side 0.4 might be a feasible choice to release as for Apache.
> Reverting all the features and release 0.3.1 is too time-consuming and not
> so necessary.
>
> On Fri, Aug 4, 2017 at 4:16 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:
>
> > @Bikas The list of JIRAs fixed in 0.4 is 50 long (
> > https://issues.cloudera.org/issues/?jql=project%20%3D%
> > 20LIVY%20AND%20fixVersion%20%3D%200.4) so I'm wondering what you mean by
> > not including feature work. Are you suggesting we revert some of the work
> > for this release and the re-merge it later, or just that you would've
> > preferred it that was released without new features, but are okay with
> how
> > it is anyway? If you're worried about adding features in the first Apache
> > release should we also look at re-releasing 0.3.0 as 0.3.1-incubating or
> > back support, personally I think it's not worth it.
> >
> > As for release cadence, so far we've seemed to shoot for a 6 month
> cadence
> > (was longer this time with the move to Apache) but I'd be fine moving to
> a
> > 4 month cadence. I also prefer a time based release.
> >
> >
> > *Alex Bozarth*
> > Software Engineer
> > Spark Technology Center
> > --
> > *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> > *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
> >
> >
> > 505 Howard Street
> > San Francisco, CA 94105
> > United States
> >
> >
> >
> > [image: Inactive hide details for Bikas Saha ---08/03/2017 12:56:59
> > PM---Hi, My preference would have been to not do a feature bearing]Bikas
> > Saha ---08/03/2017 12:56:59 PM---Hi, My preference would have been to not
> > do a feature bearing first release so that users could safe
> >
> > From: Bikas Saha <bi...@apache.org>
> > To: "dev@livy.incubator.apache.org" <dev@livy.incubator.apache.org>
> > Date: 08/03/2017 12:56 PM
> > Subject: Re: To release a first Apache version Livy
> > --
> >
> >
> >
> > Hi,
> >
> >
> > My preference would have been to not do a feature bearing first release
> so
> > that users could safely and painlessly migrate to the Apache release.
> > Adding features increases the risk of regressions etc.
> >
> >
> > However it seems like the web UI would be a relatively independent
> f

[jira] [Commented] (SPARK-21618) http(s) not accepted in spark-submit jar uri

2017-08-04 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114291#comment-16114291
 ] 

Saisai Shao commented on SPARK-21618:
-

Hi Steve, I'm not quite following your comments. You mean that in Hadoop 2.9+ 
there's built-in scheme support for http(s), am I right?

> http(s) not accepted in spark-submit jar uri
> 
>
> Key: SPARK-21618
> URL: https://issues.apache.org/jira/browse/SPARK-21618
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.1, 2.2.0
> Environment: pre-built for hadoop 2.6 and 2.7 on mac and ubuntu 
> 16.04. 
>Reporter: Ben Mayne
>Priority: Minor
>  Labels: documentation
>
> The documentation suggests I should be able to use an http(s) uri for a jar 
> in spark-submit, but I haven't been successful 
> https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
> {noformat}
> benmayne@Benjamins-MacBook-Pro ~ $ spark-submit --deploy-mode client --master 
> local[2] --class class.name.Test https://test.com/path/to/jar.jar
> log4j:WARN No appenders could be found for logger 
> (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: 
> https
>   at 
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>   at 
> org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> benmayne@Benjamins-MacBook-Pro ~ $
> {noformat}
> If I replace the path with a valid hdfs path 
> (hdfs:///user/benmayne/valid-jar.jar), it works as expected. I've seen the 
> same behavior across 2.2.0 (hadoop 2.6 & 2.7 on mac and ubuntu) and on 2.1.1 
> on ubuntu. 
> this is the example that I'm trying to replicate from 
> https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management:
>  
> > Spark uses the following URL scheme to allow different strategies for 
> > disseminating jars:
> > file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file 
> > server, and every executor pulls the file from the driver HTTP server.
> > hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as 
> > expected
> {noformat}
> # Run on a Mesos cluster in cluster deploy mode with supervise
> ./bin/spark-submit \
>   --class org.apache.spark.examples.SparkPi \
>   --master mesos://207.184.161.138:7077 \
>   --deploy-mode cluster \
>   --supervise \
>   --executor-memory 20G \
>   --total-executor-cores 100 \
>   http://path/to/examples.jar \
>   1000
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Question about folder in https://dist.apache.org/repos/dist/dev/incubator/

2017-08-04 Thread Saisai Shao
Hi John,

Sorry, I should have brought this up on the dev list first; I will do so next
time I run into a similar issue. Thanks a lot for your help.

Best regards,
Saisai

On Fri, Aug 4, 2017 at 6:40 PM, John D. Ament <johndam...@apache.org> wrote:

> Hi Saisai,
>
> Just wondering, did you bring this up on your dev list?  Mentors are
> responsible for creating the dist area.
>
> I went ahead and created one for you, in both /dist/dev and /dist/release.
>
> John
>
> On Fri, Aug 4, 2017 at 5:05 AM Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
> > Hi Team,
> >
> > We're working on the first Apache release of incubator-livy project, as
> the
> > document mentioned that source release should be staged under this
> folder,
> > but I don't find a folder named "livy", how do I create this folder,
> please
> > suggest, thanks!
> >
> > The Incubator PMC expects the source releases to be staged on
> > > https://dist.apache.org/repos/dist/dev/incubator/$podlingName so that
> > > they can easily be moved to the release location via svn mv.
> >
> >
> >
> > Best regards,
> > Saisai
> >
>


Question about folder in https://dist.apache.org/repos/dist/dev/incubator/

2017-08-04 Thread Saisai Shao
Hi Team,

We're working on the first Apache release of the incubator-livy project. The
documentation mentions that the source release should be staged under this
folder, but I don't see a folder named "livy". How do I create this folder?
Please advise, thanks!

The Incubator PMC expects the source releases to be staged on
> https://dist.apache.org/repos/dist/dev/incubator/$podlingName so that
> they can easily be moved to the release location via svn mv.
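
I assume creating it (and later promoting a release) would be roughly something
like the following, if whoever has the karma runs it (commit messages are just
illustrative):

  svn mkdir -m "Create Livy dev dist area" \
      https://dist.apache.org/repos/dist/dev/incubator/livy
  # later, once a release vote passes, promote via svn mv:
  svn mv -m "Release Livy 0.4.0-incubating" \
      https://dist.apache.org/repos/dist/dev/incubator/livy/0.4.0-incubating \
      https://dist.apache.org/repos/dist/release/incubator/livy/0.4.0-incubating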



Best regards,
Saisai


Re: To release a first Apache version Livy

2017-08-03 Thread Saisai Shao
From my side, 0.4 seems a feasible choice for the first Apache release.
Reverting all the features and releasing 0.3.1 would be too time-consuming and
is not really necessary.

On Fri, Aug 4, 2017 at 4:16 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

> @Bikas The list of JIRAs fixed in 0.4 is 50 long (
> https://issues.cloudera.org/issues/?jql=project%20%3D%
> 20LIVY%20AND%20fixVersion%20%3D%200.4) so I'm wondering what you mean by
> not including feature work. Are you suggesting we revert some of the work
> for this release and the re-merge it later, or just that you would've
> preferred it that was released without new features, but are okay with how
> it is anyway? If you're worried about adding features in the first Apache
> release should we also look at re-releasing 0.3.0 as 0.3.1-incubating or
> back support, personally I think it's not worth it.
>
> As for release cadence, so far we've seemed to shoot for a 6 month cadence
> (was longer this time with the move to Apache) but I'd be fine moving to a
> 4 month cadence. I also prefer a time based release.
>
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
> [image: Inactive hide details for Bikas Saha ---08/03/2017 12:56:59
> PM---Hi, My preference would have been to not do a feature bearing]Bikas
> Saha ---08/03/2017 12:56:59 PM---Hi, My preference would have been to not
> do a feature bearing first release so that users could safe
>
> From: Bikas Saha <bi...@apache.org>
> To: "dev@livy.incubator.apache.org" <dev@livy.incubator.apache.org>
> Date: 08/03/2017 12:56 PM
> Subject: Re: To release a first Apache version Livy
> --
>
>
>
> Hi,
>
>
> My preference would have been to not do a feature bearing first release so
> that users could safely and painlessly migrate to the Apache release.
> Adding features increases the risk of regressions etc.
>
>
> However it seems like the web UI would be a relatively independent feature
> that would not affect the core stability. So it may be fine to include that
> in the first release as a new feature. In some ways, it gives users an
> incentive to move to the Apache release.
>
>
> +1 for getting the first release out as soon as feasible. And doing core
> feature work in follow up release.
>
>
> On this note, it would be good to consider the question of release
> cadence. Should we move to a 3 month or 4 month release cadence such that
> release trains are available for features to out when the features are
> ready. Or should do feature releases such that releases come out when major
> new functionality has added. My preference is a time based release cadence
> because it provides regular bug and security related fixes available in a
> released form for end users.
>
>
> Thanks!
>
> Bikas
>
>
> 
> From: Saisai Shao <sai.sai.s...@gmail.com>
> Sent: Monday, July 31, 2017 8:34:03 PM
> To: dev@livy.incubator.apache.org
> Subject: Re: To release a first Apache version Livy
>
> OK, thanks! Let's get this merged then prepare to the first Apache release.
>
> On Tue, Aug 1, 2017 at 11:30 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:
>
> > There's two open PRs on the livy repo with a follow-up PR on the website
> > repo:
> >
> > https://github.com/apache/incubator-livy/pull/25 <- Web UI Log Page
> > https://github.com/apache/incubator-livy/pull/26 <- Ability to build
> Livy
> > Docs
> > https://github.com/apache/incubator-livy-website/pull/7 <- Add Livy Docs
> > to Website (it may actually be better to update this and merge it after
> the
> > release)
> >
> > Once the Log Page PR is merged the basic Web UI is complete, the
> remaining
> > UI JIRAs are feature adds and tests that can come in the next release.
> >
> > *Alex Bozarth*
> > Software Engineer
> > Spark Technology Center
> > --
> > *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> > *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
> >
> >
> > 505 Howard Street
> > San Francisco, CA 94105
> > United States
> >
> >
> >
> > [image: Inactive hide details for Saisai Shao ---07/31/2017 07:12:11
> > PM---Hi Alex, can you please list the JIRAs for UI related works y]Saisai
> > Shao ---07/31/2017 07:12:11 PM---Hi Alex, can you pleas

[jira] [Commented] (SPARK-21618) http(s) not accepted in spark-submit jar uri

2017-08-03 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113806#comment-16113806
 ] 

Saisai Shao commented on SPARK-21618:
-

[~benmayne] If you try the master branch of Spark, which includes SPARK-21012, jars 
can be downloaded from an http(s) URL; please give it a try.

> http(s) not accepted in spark-submit jar uri
> 
>
> Key: SPARK-21618
> URL: https://issues.apache.org/jira/browse/SPARK-21618
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.1, 2.2.0
> Environment: pre-built for hadoop 2.6 and 2.7 on mac and ubuntu 
> 16.04. 
>Reporter: Ben Mayne
>Priority: Minor
>  Labels: documentation
>
> The documentation suggests I should be able to use an http(s) uri for a jar 
> in spark-submit, but I haven't been successful 
> https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
> {noformat}
> benmayne@Benjamins-MacBook-Pro ~ $ spark-submit --deploy-mode client --master 
> local[2] --class class.name.Test https://test.com/path/to/jar.jar
> log4j:WARN No appenders could be found for logger 
> (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: 
> https
>   at 
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>   at 
> org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> benmayne@Benjamins-MacBook-Pro ~ $
> {noformat}
> If I replace the path with a valid hdfs path 
> (hdfs:///user/benmayne/valid-jar.jar), it works as expected. I've seen the 
> same behavior across 2.2.0 (hadoop 2.6 & 2.7 on mac and ubuntu) and on 2.1.1 
> on ubuntu. 
> this is the example that I'm trying to replicate from 
> https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management:
>  
> > Spark uses the following URL scheme to allow different strategies for 
> > disseminating jars:
> > file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file 
> > server, and every executor pulls the file from the driver HTTP server.
> > hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as 
> > expected
> {noformat}
> # Run on a Mesos cluster in cluster deploy mode with supervise
> ./bin/spark-submit \
>   --class org.apache.spark.examples.SparkPi \
>   --master mesos://207.184.161.138:7077 \
>   --deploy-mode cluster \
>   --supervise \
>   --executor-memory 20G \
>   --total-executor-cores 100 \
>   http://path/to/examples.jar \
>   1000
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: To release a first Apache version Livy

2017-08-02 Thread Saisai Shao
Hi Team,

We're planning to release a first Apache version of Livy. Would you please
guide us through the process of an Apache incubating release? Is there any doc to
follow? Thanks a lot!

Best regards
Saisai

On Tue, Aug 1, 2017 at 11:34 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:

> OK, thanks! Let's get this merged then prepare to the first Apache release.
>
> On Tue, Aug 1, 2017 at 11:30 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:
>
>> There's two open PRs on the livy repo with a follow-up PR on the website
>> repo:
>>
>> https://github.com/apache/incubator-livy/pull/25 <- Web UI Log Page
>> https://github.com/apache/incubator-livy/pull/26 <- Ability to build
>> Livy Docs
>> https://github.com/apache/incubator-livy-website/pull/7 <- Add Livy Docs
>> to Website (it may actually be better to update this and merge it after the
>> release)
>>
>> Once the Log Page PR is merged the basic Web UI is complete, the
>> remaining UI JIRAs are feature adds and tests that can come in the next
>> release.
>>
>> *Alex Bozarth*
>> Software Engineer
>> Spark Technology Center
>> --
>> *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
>> *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>>
>>
>> 505 Howard Street
>> San Francisco, CA 94105
>> United States
>>
>>
>>
>> [image: Inactive hide details for Saisai Shao ---07/31/2017 07:12:11
>> PM---Hi Alex, can you please list the JIRAs for UI related works y]Saisai
>> Shao ---07/31/2017 07:12:11 PM---Hi Alex, can you please list the JIRAs for
>> UI related works you want to merge in 0.4 release?
>>
>> From: Saisai Shao <sai.sai.s...@gmail.com>
>> To: dev@livy.incubator.apache.org
>> Date: 07/31/2017 07:12 PM
>> Subject: Re: To release a first Apache version Livy
>> --
>>
>>
>>
>> Hi Alex, can you please list the JIRAs for UI related works you want to
>> merge in 0.4 release?
>>
>> Thanks
>>
>> On Sat, Jul 29, 2017 at 7:59 AM, Alex Bozarth <ajboz...@us.ibm.com>
>> wrote:
>>
>> > After some tweaking and opening of PRs today, I'll change my vote to a
>> +1
>> > with the inclusion of my currently open PRs.
>> >
>> >
>> > *Alex Bozarth*
>> > Software Engineer
>> > Spark Technology Center
>> > --
>> > *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
>> > *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>> >
>> >
>> > 505 Howard Street
>> > San Francisco, CA 94105
>> > United States
>> >
>> >
>> >
>> > [image: Inactive hide details for "Alex Bozarth" ---07/28/2017 01:01:16
>> > PM---I'm a +0 on this, I think we should get in a release soon,]"Alex
>> > Bozarth" ---07/28/2017 01:01:16 PM---I'm a +0 on this, I think we should
>> > get in a release soon, but I'm not sure if we should wait until
>> >
>> > From: "Alex Bozarth" <ajboz...@us.ibm.com>
>> > To: dev@livy.incubator.apache.org
>> > Date: 07/28/2017 01:01 PM
>> > Subject: Re: To release a first Apache version Livy
>> > --
>> >
>> >
>> >
>> > I'm a +0 on this, I think we should get in a release soon, but I'm not
>> > sure if we should wait until the Web UI and Documentation are finished
>> > (tracking in *LIVY-87* <https://issues.cloudera.org/browse/LIVY-87> and
>> > *LIVY-384* <https://issues.cloudera.org/browse/LIVY-384>). If others
>> are
>> > fine releasing with these feature partially complete then I'll be okay
>> as
>> > well.
>> >
>> > *Alex Bozarth*
>> > Software Engineer
>> > Spark Technology Center
>> > --
>> > *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
>> > *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>> >
>> >
>> > 505 Howard Street
>> > San Francisco, CA 94105
>> > United States
>> >
>> >
>> >
>> > Jeff Zhang ---07/28/2017 01:12:24 AM---+1 for making the first apache
>> > release. Saisai Shao <sai.sai.s...@gmail.com>于2017年7月28日周五 下午3:42写道:
>> >
>> > From: Jeff Zhang <zjf...@gmail.com>
>> > To: dev@livy.incubator.apache.org
>> > Date: 07/28/2017 01:12 AM
>> > Subject: Re: To release a first Apache version Livy
>> > --
>> >
>> >
>> >
>> > +1 for making the first apache release.
>> >
>> >
>> > Saisai Shao <sai.sai.s...@gmail.com>于2017年7月28日周五 下午3:42写道:
>> >
>> > > Hi Team,
>> > >
>> > > We have already done most of the migration works to Apache, I think it
>> > > would be better to have a first Apache release based on the current
>> code.
>> > > What do you think?
>> > >
>> > > Thanks
>> > > Saisai
>> > >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>


[jira] [Commented] (SPARK-21570) File __spark_libs__XXX.zip does not exist on networked file system w/ yarn

2017-08-02 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112036#comment-16112036
 ] 

Saisai Shao commented on SPARK-21570:
-

Sorry, I'm not familiar with NFS/Lustre FS. Does this kind of network FS have a 
special scheme in Hadoop like "hdfs://" or "wasb://", or is it just represented 
as "file://" and treated like a local FS?

> File __spark_libs__XXX.zip does not exist on networked file system w/ yarn
> --
>
> Key: SPARK-21570
> URL: https://issues.apache.org/jira/browse/SPARK-21570
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Albert Chu
>
> I have a set of scripts that run Spark with data in a networked file system.  
> One of my unit tests to make sure things don't break between Spark releases 
> is to simply run a word count (via org.apache.spark.examples.JavaWordCount) 
> on a file in the networked file system.  This test broke with Spark 2.2.0 
> when I use yarn to launch the job (using the spark standalone scheduler 
> things still work).  I'm currently using Hadoop 2.7.0.  I get the following 
> error:
> {noformat}
> Diagnostics: File 
> file:/p/lcratery/achu/testing/rawnetworkfs/test/1181015/node-0/spark/node-0/spark-292938be-7ae3-460f-aca7-294083ebb790/__spark_libs__695301535722158702.zip
>  does not exist
> java.io.FileNotFoundException: File 
> file:/p/lcratery/achu/testing/rawnetworkfs/test/1181015/node-0/spark/node-0/spark-292938be-7ae3-460f-aca7-294083ebb790/__spark_libs__695301535722158702.zip
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:819)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:596)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> While debugging, I sat and watched the directory and did see that 
> /p/lcratery/achu/testing/rawnetworkfs/test/1181015/node-0/spark/node-0/spark-292938be-7ae3-460f-aca7-294083ebb790/__spark_libs__695301535722158702.zip
>  does show up at some point.
> Wondering if it's possible something racy was introduced.  Nothing in the 
> Spark 2.2.0 release notes suggests any type of configuration change that 
> needs to be done.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21570) File __spark_libs__XXX.zip does not exist on networked file system w/ yarn

2017-08-02 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110617#comment-16110617
 ] 

Saisai Shao commented on SPARK-21570:
-

This __spark_libs__xxx.zip is created by Spark on YARN to package the Spark 
dependencies and upload them to HDFS; YARN then downloads it from HDFS into a 
local dir, and it is used for the Spark AM and executor launch classpath.

Usually there's no issue whether you're using HDFS/S3/WASB, as long as this zip 
file can be reached by the NMs across the cluster. I'm wondering if you're 
starting the Spark-on-YARN application in some different way, or if your cluster 
differs from a normal setup. I think this is not an issue in Spark itself; most 
likely it is a setup issue.

Can you please list the steps to reproduce this issue? I'm not quite following 
your description above.

> File __spark_libs__XXX.zip does not exist on networked file system w/ yarn
> --
>
> Key: SPARK-21570
> URL: https://issues.apache.org/jira/browse/SPARK-21570
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Albert Chu
>
> I have a set of scripts that run Spark with data in a networked file system.  
> One of my unit tests to make sure things don't break between Spark releases 
> is to simply run a word count (via org.apache.spark.examples.JavaWordCount) 
> on a file in the networked file system.  This test broke with Spark 2.2.0 
> when I use yarn to launch the job (using the spark standalone scheduler 
> things still work).  I'm currently using Hadoop 2.7.0.  I get the following 
> error:
> {noformat}
> Diagnostics: File 
> file:/p/lcratery/achu/testing/rawnetworkfs/test/1181015/node-0/spark/node-0/spark-292938be-7ae3-460f-aca7-294083ebb790/__spark_libs__695301535722158702.zip
>  does not exist
> java.io.FileNotFoundException: File 
> file:/p/lcratery/achu/testing/rawnetworkfs/test/1181015/node-0/spark/node-0/spark-292938be-7ae3-460f-aca7-294083ebb790/__spark_libs__695301535722158702.zip
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:819)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:596)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> While debugging, I sat and watched the directory and did see that 
> /p/lcratery/achu/testing/rawnetworkfs/test/1181015/node-0/spark/node-0/spark-292938be-7ae3-460f-aca7-294083ebb790/__spark_libs__695301535722158702.zip
>  does show up at some point.
> Wondering if it's possible something racy was introduced.  Nothing in the 
> Spark 2.2.0 release notes suggests any type of configuration change that 
> needs to be done.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: To release a first Apache version Livy

2017-07-31 Thread Saisai Shao
OK, thanks! Let's get this merged and then prepare for the first Apache release.

On Tue, Aug 1, 2017 at 11:30 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

> There's two open PRs on the livy repo with a follow-up PR on the website
> repo:
>
> https://github.com/apache/incubator-livy/pull/25 <- Web UI Log Page
> https://github.com/apache/incubator-livy/pull/26 <- Ability to build Livy
> Docs
> https://github.com/apache/incubator-livy-website/pull/7 <- Add Livy Docs
> to Website (it may actually be better to update this and merge it after the
> release)
>
> Once the Log Page PR is merged the basic Web UI is complete, the remaining
> UI JIRAs are feature adds and tests that can come in the next release.
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
> [image: Inactive hide details for Saisai Shao ---07/31/2017 07:12:11
> PM---Hi Alex, can you please list the JIRAs for UI related works y]Saisai
> Shao ---07/31/2017 07:12:11 PM---Hi Alex, can you please list the JIRAs for
> UI related works you want to merge in 0.4 release?
>
> From: Saisai Shao <sai.sai.s...@gmail.com>
> To: dev@livy.incubator.apache.org
> Date: 07/31/2017 07:12 PM
> Subject: Re: To release a first Apache version Livy
> --
>
>
>
> Hi Alex, can you please list the JIRAs for UI related works you want to
> merge in 0.4 release?
>
> Thanks
>
> On Sat, Jul 29, 2017 at 7:59 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:
>
> > After some tweaking and opening of PRs today, I'll change my vote to a +1
> > with the inclusion of my currently open PRs.
> >
> >
> > *Alex Bozarth*
> > Software Engineer
> > Spark Technology Center
> > --
> > *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> > *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
> >
> >
> > 505 Howard Street
> > San Francisco, CA 94105
> > United States
> >
> >
> >
> > [image: Inactive hide details for "Alex Bozarth" ---07/28/2017 01:01:16
> > PM---I'm a +0 on this, I think we should get in a release soon,]"Alex
> > Bozarth" ---07/28/2017 01:01:16 PM---I'm a +0 on this, I think we should
> > get in a release soon, but I'm not sure if we should wait until
> >
> > From: "Alex Bozarth" <ajboz...@us.ibm.com>
> > To: dev@livy.incubator.apache.org
> > Date: 07/28/2017 01:01 PM
> > Subject: Re: To release a first Apache version Livy
> > --
> >
> >
> >
> > I'm a +0 on this, I think we should get in a release soon, but I'm not
> > sure if we should wait until the Web UI and Documentation are finished
> > (tracking in *LIVY-87* <https://issues.cloudera.org/browse/LIVY-87> and
> > *LIVY-384* <https://issues.cloudera.org/browse/LIVY-384>). If others are
> > fine releasing with these feature partially complete then I'll be okay as
> > well.
> >
> > *Alex Bozarth*
> > Software Engineer
> > Spark Technology Center
> > --
> > *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> > *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
> >
> >
> > 505 Howard Street
> > San Francisco, CA 94105
> > United States
> >
> >
> >
> > Jeff Zhang ---07/28/2017 01:12:24 AM---+1 for making the first apache
> > release. Saisai Shao <sai.sai.s...@gmail.com>于2017年7月28日周五 下午3:42写道:
> >
> > From: Jeff Zhang <zjf...@gmail.com>
> > To: dev@livy.incubator.apache.org
> > Date: 07/28/2017 01:12 AM
> > Subject: Re: To release a first Apache version Livy
> > --
> >
> >
> >
> > +1 for making the first apache release.
> >
> >
> > Saisai Shao <sai.sai.s...@gmail.com>于2017年7月28日周五 下午3:42写道:
> >
> > > Hi Team,
> > >
> > > We have already done most of the migration works to Apache, I think it
> > > would be better to have a first Apache release based on the current
> code.
> > > What do you think?
> > >
> > > Thanks
> > > Saisai
> > >
> >
> >
> >
> >
> >
> >
>
>
>
>


Re: To release a first Apache version Livy

2017-07-31 Thread Saisai Shao
Hi Alex, can you please list the JIRAs for UI related works you want to
merge in 0.4 release?

Thanks

On Sat, Jul 29, 2017 at 7:59 AM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

> After some tweaking and opening of PRs today, I'll change my vote to a +1
> with the inclusion of my currently open PRs.
>
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
> [image: Inactive hide details for "Alex Bozarth" ---07/28/2017 01:01:16
> PM---I'm a +0 on this, I think we should get in a release soon,]"Alex
> Bozarth" ---07/28/2017 01:01:16 PM---I'm a +0 on this, I think we should
> get in a release soon, but I'm not sure if we should wait until
>
> From: "Alex Bozarth" <ajboz...@us.ibm.com>
> To: dev@livy.incubator.apache.org
> Date: 07/28/2017 01:01 PM
> Subject: Re: To release a first Apache version Livy
> --
>
>
>
> I'm a +0 on this, I think we should get in a release soon, but I'm not
> sure if we should wait until the Web UI and Documentation are finished
> (tracking in *LIVY-87* <https://issues.cloudera.org/browse/LIVY-87> and
> *LIVY-384* <https://issues.cloudera.org/browse/LIVY-384>). If others are
> fine releasing with these feature partially complete then I'll be okay as
> well.
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
> Jeff Zhang ---07/28/2017 01:12:24 AM---+1 for making the first apache
> release. Saisai Shao <sai.sai.s...@gmail.com>于2017年7月28日周五 下午3:42写道:
>
> From: Jeff Zhang <zjf...@gmail.com>
> To: dev@livy.incubator.apache.org
> Date: 07/28/2017 01:12 AM
> Subject: Re: To release a first Apache version Livy
> --
>
>
>
> +1 for making the first apache release.
>
>
> Saisai Shao <sai.sai.s...@gmail.com>于2017年7月28日周五 下午3:42写道:
>
> > Hi Team,
> >
> > We have already done most of the migration works to Apache, I think it
> > would be better to have a first Apache release based on the current code.
> > What do you think?
> >
> > Thanks
> > Saisai
> >
>
>
>
>
>
>


Re: Need input: Location of Livy Documentation

2017-07-25 Thread Saisai Shao
I'm in favor of storing the docs in the main repo, while putting the
seldom-changed framework pieces in the website repo. I think you already
mentioned the reason (as the documentation grows and changes from version to
version, it would be better to store it in the more regularly updated repo).

What I saw is that Spark keeps all the docs, separated by version, here (
https://github.com/apache/spark-website/tree/asf-site/site/docs), so that
users can check the docs for the right version. I think in Livy we could also
maintain such a folder holding each release version's docs (either
manually or by script). For now (since we don't have a release yet) we
could create an "unreleased" folder for the docs from the master branch, and
sync the docs periodically whenever anything changes.
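
For illustration, here is a minimal sketch of the kind of sync script I have in
mind. The repo paths, the docs/_site build output location, and the "unreleased"
folder name are assumptions for illustration, not an agreed convention:

#!/usr/bin/env python
# Hypothetical sketch: copy built docs from the main Livy repo into a
# versioned folder of the website repo. All paths below are assumptions.
import os
import shutil

LIVY_REPO = os.path.expanduser("~/src/incubator-livy")             # assumed checkout path
WEBSITE_REPO = os.path.expanduser("~/src/incubator-livy-website")  # assumed checkout path

def sync_docs(version="unreleased"):
    # Assume the docs build drops its output into docs/_site of the main repo.
    src = os.path.join(LIVY_REPO, "docs", "_site")
    dst = os.path.join(WEBSITE_REPO, "site", "docs", version)
    if os.path.exists(dst):
        shutil.rmtree(dst)  # replace the previous snapshot of this version
    shutil.copytree(src, dst)
    print("Synced %s -> %s" % (src, dst))

if __name__ == "__main__":
    sync_docs()

Once we have a release, the same script could be run with the release version
(e.g. sync_docs("0.4.0-incubating")) so each version keeps its own folder.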


On Wed, Jul 26, 2017 at 4:48 AM, Alex Bozarth  wrote:

>
>
> Hey Team
>
> Jerry and I have been discussing (on my recent PR) where we should keep the
> Livy Documentation files as we move them out of the README. My PR
> originally moved the two Doc files (on the REST and Programmatic APIs) into
> the livy website repos so we can display them via the website. Jerry
> proposed that we keep them in the main livy repo like how Spark does it.
> Personally given we have just two files I think leaving them in the website
> repo is simplier for us, but I also understand that as the Documentation
> grows and changes from version to version it would be better to store it in
> the more regularly updated repo.
>
> What are everyone's opinions on where the Docs should be stored? If we
> decide to store them in the main repo does anyone have pointers on how to
> simply and cleanly sync them to the website for viewing. You can see my
> Documenation and README update PRs here:
> https://github.com/apache/incubator-livy/pull/21
> https://github.com/apache/incubator-livy-website/pull/5
>
>
>  Alex Bozarth
>  Software Engineer
>  Spark Technology Center
>
>  E-mail: ajboz...@us.ibm.com
>  GitHub: github.com/ajbozarth
>
>  505 Howard Street
>  San Francisco, CA 94105
>  United States
>
>
>
>
>
>


[jira] [Commented] (SPARK-21521) History service requires user is in any group

2017-07-25 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099629#comment-16099629
 ] 

Saisai Shao commented on SPARK-21521:
-

I think we should have special logic to treat special users like "root". The 
current logic has no such handling and treats "root" as a normal user, which is 
why it fails in this case.

> History service requires user is in any group
> -
>
> Key: SPARK-21521
> URL: https://issues.apache.org/jira/browse/SPARK-21521
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Adrian Bridgett
>
> (Regression cf. 2.0.2)
> We run spark as several users, these write to the history location where the 
> files are saved as those users with permissions of 770 (this is hardcoded in 
> EventLoggingListener.scala).
> The history service runs as root so that it has permissions on these files 
> (see https://spark.apache.org/docs/latest/security.html).
> This worked fine in v2.0.2, however in v2.2.0 the events are being skipped 
> unless I add the root user into each users group at which point they are seen.
> We currently have all acls configuration unset.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21521) History service requires user is in any group

2017-07-24 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099392#comment-16099392
 ] 

Saisai Shao commented on SPARK-21521:
-

[~vanzin], I guess so. In the current logic of {{checkAccessPermission}} we 
don't differentiate special users, so user "root" here is still just a normal 
user. Let me verify it.

> History service requires user is in any group
> -
>
> Key: SPARK-21521
> URL: https://issues.apache.org/jira/browse/SPARK-21521
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Adrian Bridgett
>
> (Regression cf. 2.0.2)
> We run spark as several users, these write to the history location where the 
> files are saved as those users with permissions of 770 (this is hardcoded in 
> EventLoggingListener.scala).
> The history service runs as root so that it has permissions on these files 
> (see https://spark.apache.org/docs/latest/security.html).
> This worked fine in v2.0.2, however in v2.2.0 the events are being skipped 
> unless I add the root user into each users group at which point they are seen.
> We currently have all acls configuration unset.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Input file as an argument of a Spark code

2017-07-24 Thread Saisai Shao
I think you have to make this CSV file accessible from the Spark cluster;
putting it on HDFS is one possible solution.
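
For illustration, a minimal sketch of that flow in Python (the Livy URL, jar
path, class name and CSV location below are placeholders, not taken from your
setup): first copy the file to HDFS, e.g. with "hdfs dfs -put data.csv
/user/joaquin/data.csv", then submit the batch with the HDFS path as the
application argument:

# Hypothetical sketch: submit a Livy batch whose argument is a CSV that was
# already copied to HDFS. All hosts, paths and class names are placeholders.
import json
import requests

livy_url = "http://livy-host:8998/batches"
payload = {
    "file": "hdfs:///user/joaquin/my-spark-app.jar",  # application jar on HDFS
    "className": "com.example.MyApp",                 # main class of the app
    "args": ["hdfs:///user/joaquin/data.csv"],        # HDFS path, not a client-local path
}
resp = requests.post(livy_url,
                     data=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
print(resp.status_code, resp.json())

The same POST can of course be issued from your BASH script with curl; the key
point is that the argument must be a location the cluster can read (HDFS, S3,
etc.), not a path on the client machine.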

On Tue, Jul 25, 2017 at 1:26 AM, Joaquín Silva  wrote:

> Hello,
>
>
>
> I'm building a BASH program (using Curl) that should run Spark code
> remotely using Livy. But one of the code arguments is a CSV file; how can I
> make Spark read this file? The file is going to be on the client side,
> not on the Spark cluster machines.
>
>
>
> Regards,
>
>
>
> Joaquín Silva
>
>
>


Re: When Livy JIRA system will be ready?

2017-07-24 Thread Saisai Shao
Oh, thanks Alex!

On Mon, Jul 24, 2017 at 2:56 PM, Alex Bozarth <ajboz...@us.ibm.com> wrote:

> You can follow the progress on the INFRA JIRA: https://issues.apache.org/
> jira/browse/INFRA-14469
>
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* <ajboz...@us.ibm.com>
> *GitHub: **github.com/ajbozarth* <https://github.com/ajbozarth>
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
> [image: Inactive hide details for Saisai Shao ---07/23/2017 11:32:03
> PM---Hi team, May I ask when Livy JIRA system in Apache will be re]Saisai
> Shao ---07/23/2017 11:32:03 PM---Hi team, May I ask when Livy JIRA system
> in Apache will be ready? Currently most of
>
> From: Saisai Shao <sai.sai.s...@gmail.com>
> To: dev@livy.incubator.apache.org
> Date: 07/23/2017 11:32 PM
> Subject: When Livy JIRA system will be ready?
> --
>
>
>
> Hi team,
>
> May I ask when the Livy JIRA system at Apache will be ready? Currently most of
> the transition work is done except JIRA; contributors still need to
> create JIRAs on cloudera.org (which has very limited permissions) and
> submit code to Apache.
>
> Thanks
> Saisai
>
>
>
>


[jira] [Commented] (SPARK-21460) Spark dynamic allocation breaks when ListenerBus event queue runs full

2017-07-20 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095447#comment-16095447
 ] 

Saisai Shao commented on SPARK-21460:
-

I think this is basically a ListenerBus issue, not a dynamic allocation issue. 
ExecutorAllocationManager registers a listener and relies on it to decide 
whether to increase or decrease executors. Because the ListenerBus drops events 
when its queue runs full, the listener in ExecutorAllocationManager cannot work 
correctly and fails to decrease the executors, as mentioned in your description.

> Spark dynamic allocation breaks when ListenerBus event queue runs full
> --
>
> Key: SPARK-21460
> URL: https://issues.apache.org/jira/browse/SPARK-21460
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, YARN
>Affects Versions: 2.0.0, 2.0.2, 2.1.0, 2.1.1, 2.2.0
> Environment: Spark 2.1 
> Hadoop 2.6
>Reporter: Ruslan Dautkhanov
>Priority: Critical
>  Labels: dynamic_allocation, performance, scheduler, yarn
>
> When ListenerBus event queue runs full, spark dynamic allocation stops 
> working - Spark fails to shrink number of executors when there are no active 
> jobs (Spark driver "thinks" there are active jobs since it didn't capture 
> when they finished) .
> ps. What's worse it also makes Spark flood YARN RM with reservation requests, 
> so YARN preemption doesn't function properly too (we're on Spark 2.1 / Hadoop 
> 2.6). 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Commit style best practices, was Re: incubator-livy-website git commit: Fix bug in merge_livy_pr.py because incubator-livy-website repo has no branch it's name started with "branch-"

2017-07-20 Thread Saisai Shao
Sorry about it. Basically I didn't know whether a JIRA is necessary
for the incubator-livy-website repo, nor where to create the JIRA.

On Thu, Jul 20, 2017 at 1:20 PM, Alex Bozarth  wrote:

> +1. I think we've been using close to this format, but with "LIVY-XXX."
> instead of "[LIVY-XXX]". I can include a copy of this Contributing section
> in my next update to the livy website if we want, I almost added something
> similar in my first update, but decided not to since it hadn't been
> discussed yet.
>
>
> *Alex Bozarth*
> Software Engineer
> Spark Technology Center
> --
> *E-mail:* *ajboz...@us.ibm.com* 
> *GitHub: **github.com/ajbozarth* 
>
>
> 505 Howard Street
> San Francisco, CA 94105
> United States
>
>
>
> [image: Inactive hide details for Marcelo Vanzin ---07/20/2017 01:12:09
> PM---+1. Bad commit messages are one of my pet peeves. Although]Marcelo
> Vanzin ---07/20/2017 01:12:09 PM---+1. Bad commit messages are one of my
> pet peeves. Although I like periods at the end of sentences.
>
> From: Marcelo Vanzin 
> To: dev@livy.incubator.apache.org
> Date: 07/20/2017 01:12 PM
> Subject: Re: Commit style best practices, was Re: incubator-livy-website
> git commit: Fix bug in merge_livy_pr.py because incubator-livy-website repo
> has no branch it's name started with "branch-"
> --
>
>
>
> +1. Bad commit messages are one of my pet peeves. Although I like
> periods at the end of sentences.
>
> On Thu, Jul 20, 2017 at 1:09 PM, Luciano Resende 
> wrote:
> > Could we try to follow some best practices on PR title, commit message,
> etc?
> >
> > Some info from: http://bahir.apache.org/contributing/#Creating+a+Pull+
> > Request
> >
> > - Open a pull request against the master branch
> >
> >- The PR title should be of the form [LIVY-] Title, where
> LIVY-
> >is the relevant JIRA number and Title may be the JIRA’s title or a
> more
> >specific title describing the PR itself.
> >- If the pull request is still a work in progress, and so is not ready
> >to be merged, but needs to be pushed to Github to facilitate review,
> then
> >add [WIP] after the component.
> >- For website work, a JIRA is not required
> >
> > - Follow The 7 rules for a great commit message
> > 
> >
> >- Separate subject from body with a blank line
> >- Limit the subject line to 50 characters
> >- Capitalize the subject line
> >- Do not end the subject line with a period
> >- Use the imperative mood in the subject line
> >- Wrap the body at 72 characters
> >- Use the body to explain what and why vs. how
> >
> > Below is an example of a good commit message
> >
> > [LIVY-001] Performance enhancements for decision tree
> >
> > Generate Matrix with random values through local memory
> > if there is sufficient memory.
> >
> >
> >
> > Thoughts ?
> >
> > On Thu, Jul 20, 2017 at 1:03 PM,  wrote:
> >
> >> Repository: incubator-livy-website
> >> Updated Branches:
> >>   refs/heads/master 27348bab6 -> 572b37b1e
> >>
> >>
> >> Fix bug in merge_livy_pr.py because incubator-livy-website repo has no
> >> branch it's name started with "branch-"
> >>
> >>
> >> Project: http://git-wip-us.apache.org/repos/asf/incubator-livy-websit
> >> e/repo
> >> Commit: http://git-wip-us.apache.org/repos/asf/incubator-livy-websit
> >> e/commit/572b37b1
> >> Tree: http://git-wip-us.apache.org/repos/asf/incubator-livy-websit
> >> e/tree/572b37b1
> >> Diff: http://git-wip-us.apache.org/repos/asf/incubator-livy-websit
> >> e/diff/572b37b1
> >>
> >> Branch: refs/heads/master
> >> Commit: 572b37b1efc2a2947272790261b6ba023ec53b74
> >> Parents: 27348ba
> >> Author: jerryshao 
> >> Authored: Thu Jul 20 13:02:23 2017 -0700
> >> Committer: jerryshao 
> >> Committed: Thu Jul 20 13:03:22 2017 -0700
> >>
> >> --
> >>  merge_livy_pr.py | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> --
> >>
> >>
> >> http://git-wip-us.apache.org/repos/asf/incubator-livy-websit
> >> e/blob/572b37b1/merge_livy_pr.py
> >> --
> >> diff --git a/merge_livy_pr.py b/merge_livy_pr.py
> >> index b527a29..7296aef 100755
> >> --- a/merge_livy_pr.py
> >> +++ b/merge_livy_pr.py
> >> @@ -359,7 +359,7 @@ def main():
> >>  original_head = get_current_ref()
> >>
> >>  branches = get_json("%s/branches" % GITHUB_API_BASE)
> >> -branch_names = filter(lambda x: x.startswith("branch-"), [x['name']
> >> for x in branches])
> >> +branch_names = [x['name'] for x in branches]
> >>  # Assumes branch names can be sorted lexicographically
> >>  latest_branch = 

[jira] [Created] (SPARK-21475) Change the usage of FileInputStream/OutputStream to Files.newInput/OutputStream in the critical path

2017-07-19 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-21475:
---

 Summary: Change the usage of FileInputStream/OutputStream to 
Files.newInput/OutputStream in the critical path
 Key: SPARK-21475
 URL: https://issues.apache.org/jira/browse/SPARK-21475
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 2.3.0
Reporter: Saisai Shao
Priority: Minor


Java's {{FileInputStream}} and {{FileOutputStream}} override {{finalize()}}; 
even if the file input/output stream is closed correctly and promptly, it 
still leaves a memory footprint that only gets cleaned up during a Full GC. 
This introduces two side effects:

1. Lots of Finalizer-related objects are kept in memory, which increases the 
memory overhead. In our use case of the external shuffle service, a busy 
shuffle service will have a bunch of these objects and can potentially run 
into OOM.
2. Finalizers are only run during a Full GC, which increases the Full GC 
overhead and leads to long GC pauses.

So to fix this potential issue, this proposes to use NIO's 
Files#newInput/OutputStream instead in some critical paths like shuffle.

https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful







--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21407) Support spnego for ThriftServer thrift/http auth

2017-07-14 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088151#comment-16088151
 ] 

Saisai Shao commented on SPARK-21407:
-

Sorry Marcelo, I hadn't noticed your comment. Yes, it is the same; I'm helping 
out to get this merged into Apache Spark. Do you have any specific concern about this?

> Support spnego for ThriftServer thrift/http auth
> 
>
> Key: SPARK-21407
> URL: https://issues.apache.org/jira/browse/SPARK-21407
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>    Reporter: Saisai Shao
>Priority: Minor
>
> Spark ThriftServer doesn't support spnego auth for thrift/http protocol, this 
> mainly used for knox+thriftserver scenario. Since in HiveServer2 CLIService 
> there already has existing codes to support it. So here copy it to Spark 
> ThriftServer to make it support.
> Related Hive JIRA HIVE-6697.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21411) Failed to get new HDFS delegation tokens in AMCredentialRenewer

2017-07-13 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086777#comment-16086777
 ] 

Saisai Shao commented on SPARK-21411:
-

This issue was introduced by SPARK-20434.

> Failed to get new HDFS delegation tokens in AMCredentialRenewer
> ---
>
> Key: SPARK-21411
> URL: https://issues.apache.org/jira/browse/SPARK-21411
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>    Reporter: Saisai Shao
>
> In the current {{YARNHadoopDelegationTokenManager}}, {{FileSystem}} to which 
> to get tokens are created out of KDC logged UGI, using these {{FileSystem}} 
> to get new tokens will lead to exception. The main is that Spark code trying 
> to get new tokens using non-kerberized UGI, and Hadoop can only offer new 
> tokens in kerberized UGI. To fix this issue, we should lazily create these 
> {{FileSystem}} within KDC logged UGI.
> {noformat}
> WARN AMCredentialRenewer: Failed to write out new credentials to HDFS, will 
> try again in an hour! If this happens too often tasks will fail.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token 
> can be issued only with kerberos or web authentication
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7087)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:676)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:998)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1498)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1398)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at com.sun.proxy.$Proxy10.getDelegationToken(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:980)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
>   at com.sun.proxy.$Proxy11.getDelegationToken(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1041)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1688)
>   at 
> org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:549)
>   at 
> org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:527)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2400)
>   at 
> org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider$$anonfun$fetchDelegationTokens$1.apply(HadoopFSDelegationTokenProvider.scala:97)
>   at 
> org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider$$anonfun$fetchDelegationTokens$1.apply(HadoopFSDelegationTokenProvider.scala:95)
>   at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
>   at 
> org.apache.spark.deploy.security.Hado

[jira] [Created] (SPARK-21411) Failed to get new HDFS delegation tokens in AMCredentialRenewer

2017-07-13 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-21411:
---

 Summary: Failed to get new HDFS delegation tokens in 
AMCredentialRenewer
 Key: SPARK-21411
 URL: https://issues.apache.org/jira/browse/SPARK-21411
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 2.2.0
Reporter: Saisai Shao


In the current {{YARNHadoopDelegationTokenManager}}, the {{FileSystem}} objects 
from which to get tokens are created outside of the KDC-logged-in UGI, and 
using these {{FileSystem}} objects to get new tokens leads to an exception. The 
main issue is that the Spark code tries to get new tokens using a non-Kerberized 
UGI, while Hadoop can only issue new tokens within a Kerberized UGI. To fix this 
issue, we should lazily create these {{FileSystem}} objects within the 
KDC-logged-in UGI.

{noformat}
WARN AMCredentialRenewer: Failed to write out new credentials to HDFS, will try 
again in an hour! If this happens too often tasks will fail.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token 
can be issued only with kerberos or web authentication
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7087)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:676)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:998)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy10.getDelegationToken(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:980)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
at com.sun.proxy.$Proxy11.getDelegationToken(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1041)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1688)
at 
org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:549)
at 
org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:527)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2400)
at 
org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider$$anonfun$fetchDelegationTokens$1.apply(HadoopFSDelegationTokenProvider.scala:97)
at 
org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider$$anonfun$fetchDelegationTokens$1.apply(HadoopFSDelegationTokenProvider.scala:95)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
at 
org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider.fetchDelegationTokens(HadoopFSDelegationTokenProvider.scala:95)
at 
org.apache.spark.deploy.security.HadoopFSDelegationTokenProvider.obtainDelegationTokens(HadoopFSDelegationTokenProvider.scala:46)
at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anonfun$obtainDelegationTokens$2.apply(HadoopDelegationTokenManager.scala:111)
at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager$$anonfun$obtainDelegationTokens$2.apply(HadoopDelegationTokenManager.scala:109)
at 
scala.collection.TraversableLike$$anonfun

[jira] [Commented] (SPARK-21398) Data on Rest end point is not updating after first fetch

2017-07-13 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086637#comment-16086637
 ] 

Saisai Shao commented on SPARK-21398:
-

Can you please elaborate more on the issue: which REST API are you using, 
what is the problem, and what is the expected behavior?
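
For reference, this is the kind of check I mean, as a minimal sketch against the
standard application REST API (host, port and application id below are
placeholders):

{code}
# Hypothetical sketch: poll the Spark REST API a few times and print a summary,
# to see whether the returned data actually changes between fetches.
import time
import requests

app_id = "app-20170713000000-0001"  # placeholder application id
url = "http://driver-host:4040/api/v1/applications/%s/jobs" % app_id

for _ in range(3):
    jobs = requests.get(url).json()
    running = sum(1 for j in jobs if j.get("status") == "RUNNING")
    print("%d jobs total, %d running" % (len(jobs), running))
    time.sleep(10)
{code}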

> Data on Rest end point is not updating after first fetch
> 
>
> Key: SPARK-21398
> URL: https://issues.apache.org/jira/browse/SPARK-21398
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Shubham Gupta
>
> I am was fetching data from Rest End Point for Spark application but I 
> observed that I am always getting the same data which I got for the first 
> time I fetched Rest data.This bug is a blocker for me to calculate some 
> metrics.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21407) Support spnego for ThriftServer thrift/http auth

2017-07-13 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21407:

Description: 
Spark ThriftServer doesn't support spnego auth for thrift/http protocol, this 
mainly used for knox+thriftserver scenario. Since in HiveServer2 CLIService 
there already has existing codes to support it. So here copy it to Spark 
ThriftServer to make it support.

Related Hive JIRA HIVE-6697.

  was:Spark ThriftServer doesn't support spnego auth for thrift/http protocol, 
this mainly used for knox+thriftserver scenario. Since in HiveServer2 
CLIService there already has existing codes to support it. So here copy it to 
Spark ThriftServer to make it support.


> Support spnego for ThriftServer thrift/http auth
> 
>
> Key: SPARK-21407
> URL: https://issues.apache.org/jira/browse/SPARK-21407
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>    Reporter: Saisai Shao
>Priority: Minor
>
> Spark ThriftServer doesn't support spnego auth for thrift/http protocol, this 
> mainly used for knox+thriftserver scenario. Since in HiveServer2 CLIService 
> there already has existing codes to support it. So here copy it to Spark 
> ThriftServer to make it support.
> Related Hive JIRA HIVE-6697.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21407) Support spnego for ThriftServer thrift/http auth

2017-07-13 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21407:

Summary: Support spnego for ThriftServer thrift/http auth  (was: Support 
spnego for ThriftServer http auth)

> Support spnego for ThriftServer thrift/http auth
> 
>
> Key: SPARK-21407
> URL: https://issues.apache.org/jira/browse/SPARK-21407
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>    Reporter: Saisai Shao
>Priority: Minor
>
> Spark ThriftServer doesn't support spnego auth for thrift/http protocol, this 
> mainly used for knox+thriftserver scenario. Since in HiveServer2 CLIService 
> there already has existing codes to support it. So here copy it to Spark 
> ThriftServer to make it support.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21407) Support spnego for ThriftServer http auth

2017-07-13 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-21407:
---

 Summary: Support spnego for ThriftServer http auth
 Key: SPARK-21407
 URL: https://issues.apache.org/jira/browse/SPARK-21407
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Saisai Shao
Priority: Minor


Spark ThriftServer doesn't support SPNEGO auth for the thrift/http protocol; 
this is mainly used in the Knox + ThriftServer scenario. HiveServer2's 
CLIService already has existing code to support it, so here we copy it into 
Spark ThriftServer to add the support.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21377) Jars specified with --jars or --packages are not added into AM's system classpath

2017-07-12 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21377:

Summary: Jars specified with --jars or --packages are not added into AM's 
system classpath  (was: Add a new configuration to extend AM classpath in yarn 
client mode)

> Jars specified with --jars or --packages are not added into AM's system 
> classpath
> -
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> In this issue we have a long-running Spark application with secure HBase, 
> which requires {{HBaseCredentialProvider}} to get tokens periodically. We 
> specify HBase-related jars with {{\--packages}}, but these dependencies are 
> not added into the AM classpath, so when {{HBaseCredentialProvider}} tries to 
> initialize HBase connections to get tokens, it fails.
> Currently, because jars specified with {{\--jars}} or {{\--packages}} are not 
> added into the AM classpath, the only way to extend the AM classpath is to use 
> "spark.driver.extraClassPath", which is supposed to be used in yarn cluster mode.
> So here we should figure out a solution, either to put these dependencies on the 
> AM classpath or to extend the AM classpath with the correct configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21376) Token is not renewed in yarn client process in cluster mode

2017-07-12 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084460#comment-16084460
 ] 

Saisai Shao edited comment on SPARK-21376 at 7/12/17 6:31 PM:
--

I'm referring to the o.a.s.deploy.yarn.Client class; it will monitor the yarn 
application and try to delete the staging files when the application is finished.


was (Author: jerryshao):
I'm referrring to o.a.s.deploy.yarn.Client this class, it will monitoring yarn 
application and try to delete staging files when application is finished.

> Token is not renewed in yarn client process in cluster mode
> ---
>
> Key: SPARK-21376
> URL: https://issues.apache.org/jira/browse/SPARK-21376
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Run HDFSWordcount streaming app in yarn-cluster mode  for 25 hrs. 
> After 25 hours, noticing that HDFS Wordcount job is hitting 
> HDFS_DELEGATION_TOKEN renewal issue. 
> {code}
> 17/06/28 10:49:47 WARN Client: Exception encountered while connecting to the 
> server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> 17/06/28 10:49:47 WARN Client: Failed to cleanup staging dir 
> hdfs://mycluster0/user/hrt_qa/.sparkStaging/application_1498539861056_0015
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
> at org.apache.hadoop.ipc.Client.call(Client.java:1498){code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21376) Token is not renewed in yarn client process in cluster mode

2017-07-12 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084460#comment-16084460
 ] 

Saisai Shao commented on SPARK-21376:
-

I'm referring to the o.a.s.deploy.yarn.Client class; it monitors the YARN 
application and tries to delete the staging files when the application is finished.

> Token is not renewed in yarn client process in cluster mode
> ---
>
> Key: SPARK-21376
> URL: https://issues.apache.org/jira/browse/SPARK-21376
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Run HDFSWordcount streaming app in yarn-cluster mode  for 25 hrs. 
> After 25 hours, noticing that HDFS Wordcount job is hitting 
> HDFS_DELEGATION_TOKEN renewal issue. 
> {code}
> 17/06/28 10:49:47 WARN Client: Exception encountered while connecting to the 
> server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> 17/06/28 10:49:47 WARN Client: Failed to cleanup staging dir 
> hdfs://mycluster0/user/hrt_qa/.sparkStaging/application_1498539861056_0015
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
> at org.apache.hadoop.ipc.Client.call(Client.java:1498){code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21376) Token is not renewed in yarn client process in cluster mode

2017-07-12 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084270#comment-16084270
 ] 

Saisai Shao commented on SPARK-21376:
-

Hi [~tgraves], it is the local YARN launcher process that launches the Spark 
application on the YARN cluster. The problem here is that the local launcher 
process always keeps the initial token, which never gets renewed, so when the 
application is killed the local launcher process tries to delete the staging 
files, and using this stale token fails in a long-running scenario.
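
As a rough sketch of the idea (placeholders only, not an actual patch), the 
launcher-side cleanup could re-login from the keytab before deleting the staging 
directory, assuming the client process still has access to the principal and keytab:

{code}
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical sketch: obtain fresh credentials from the keytab instead of reusing
// the token cached at submit time, then delete .sparkStaging/<appId> with them.
def cleanupStagingDir(hadoopConf: Configuration,
                      stagingDir: String,
                      principal: String,
                      keytab: String): Unit = {
  val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
  ugi.doAs(new PrivilegedExceptionAction[Unit] {
    override def run(): Unit = {
      val fs = FileSystem.get(hadoopConf)
      fs.delete(new Path(stagingDir), true) // recursive delete of the staging dir
    }
  })
}
{code}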

> Token is not renewed in yarn client process in cluster mode
> ---
>
> Key: SPARK-21376
> URL: https://issues.apache.org/jira/browse/SPARK-21376
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Run HDFSWordcount streaming app in yarn-cluster mode  for 25 hrs. 
> After 25 hours, noticing that HDFS Wordcount job is hitting 
> HDFS_DELEGATION_TOKEN renewal issue. 
> {code}
> 17/06/28 10:49:47 WARN Client: Exception encountered while connecting to the 
> server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> 17/06/28 10:49:47 WARN Client: Failed to cleanup staging dir 
> hdfs://mycluster0/user/hrt_qa/.sparkStaging/application_1498539861056_0015
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
> at org.apache.hadoop.ipc.Client.call(Client.java:1498){code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21377) Jars pulled from "--packages" are not added into AM classpath

2017-07-11 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21377:

Affects Version/s: (was: 2.1.0)
   2.2.0

> Jars pulled from "--packages" are not added into AM classpath
> -
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Yesha Vora
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Set below config in hbase-site.xml
> {code}
> 'hbase.auth.token.max.lifetime': '2880' {code}
> * Run an application with SHC package
> {code}
> spark-submit  --class 
> org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources 
> --master yarn-client --packages  --num-executors 4 --driver-memory 512m 
> --executor-memory 512m --executor-cores 1  --keytab /xxx/user.headless.keytab 
> --principal x...@xx.com spark-*jar hiveTableInClient 180  {code}
> After 8 hours, application fails with below error. 
> {code}
> 17/06/28 06:33:43 INFO ClientCnxn: Opening socket connection to server 
> xxx/xxx:2181. Will not attempt to authenticate using SASL (unknown error)
> 17/06/28 06:33:43 INFO ClientCnxn: Socket connection established to 
> xxx/xxx:2181, initiating session
> 17/06/28 06:33:43 INFO ClientCnxn: Session establishment complete on server 
> xxx/xxx:2181, sessionid = 0x25ced1d3ac20022, negotiated timeout = 9
> 17/06/28 06:33:43 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:45 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:48 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:52 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:02 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:12 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired{code}
> Here, Jars pulled from "--packages" are not added into AM class path and 
> that's the reason why AM cannot get HBase tokens and failed after token 
> expired. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21377) Add a new configuration to extend AM classpath in yarn client mode

2017-07-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083081#comment-16083081
 ] 

Saisai Shao edited comment on SPARK-21377 at 7/11/17 10:13 PM:
---

My original purpose was to add the jars uploaded through the distributed cache to 
the AM classpath with "$\{PWD\}/*" in the AM container setup, in both client and 
cluster mode, but I guess it may affect the use of "spark.driver.userClassPathFirst" 
in cluster mode or contaminate the existing AM classpath, since these jars would 
then also exist in the AM classpath.

So your suggestion is that we use a separate class loader to load these jars, 
used specifically for ServiceCredentialProvider, am I right?


was (Author: jerryshao):
My original purpose is to add jars uploaded by distributed cache to AM 
classpath with "${PWD}/*" in AM container setup both in client and cluster 
mode, but I guess it may affect the use of "spark.driver.userClassPathFirst" in 
cluster mode or contaminate the existing AM classpath, since now jars will also 
be existed in AM classpath.

So your suggestion is that we use another context loader to load these jars, 
and specifically used for ServiceCredentialProvider, am I right?

> Add a new configuration to extend AM classpath in yarn client mode
> --
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> In this issue we have a long running Spark application with secure HBase, 
> which requires {{HBaseCredentialProvider}} to get tokens periodically, we 
> specify HBase related jars with {{\--packages}}, but these dependencies are 
> not added into AM classpath, so when {{HBaseCredentialProvider}} tries to 
> initialize HBase connections to get tokens, it will be failed.
> Currently because jars specified with {{\--jars}} or {{\--packages}} are not 
> added into AM classpath, the only way to extend AM classpath is to use 
> "spark.driver.extraClassPath" which supposed to be used in yarn cluster mode.
> So here we should figure out a solution  either to put these dependencies to 
> AM classpath or to extend AM classpath with correct configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21377) Add a new configuration to extend AM classpath in yarn client mode

2017-07-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083081#comment-16083081
 ] 

Saisai Shao commented on SPARK-21377:
-

My original purpose was to add the jars uploaded through the distributed cache to 
the AM classpath with "${PWD}/*" in the AM container setup, in both client and 
cluster mode, but I guess it may affect the use of "spark.driver.userClassPathFirst" 
in cluster mode or contaminate the existing AM classpath, since these jars would 
then also exist in the AM classpath.

So your suggestion is that we use a separate class loader to load these jars, 
used specifically for ServiceCredentialProvider, am I right?
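
As a rough sketch of that suggestion (illustrative only, not the eventual change; 
{{jarUrls}} is a placeholder), an isolated loader scoped to the distributed jars 
could be used to discover the Spark 2.x {{ServiceCredentialProvider}} implementations:

{code}
import java.net.{URL, URLClassLoader}
import java.util.ServiceLoader

import scala.collection.JavaConverters._

// Hypothetical sketch: load credential providers from the --jars / --packages
// artifacts through a dedicated loader, so the AM's own classpath (and the
// spark.driver.userClassPathFirst semantics) are left untouched.
def loadCredentialProviders(jarUrls: Array[URL]): Seq[AnyRef] = {
  val isolatedLoader = new URLClassLoader(jarUrls, Thread.currentThread().getContextClassLoader)
  val providerClass = isolatedLoader.loadClass(
    "org.apache.spark.deploy.yarn.security.ServiceCredentialProvider")
  ServiceLoader.load(providerClass.asInstanceOf[Class[AnyRef]], isolatedLoader)
    .iterator().asScala.toList
}
{code}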

> Add a new configuration to extend AM classpath in yarn client mode
> --
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> In this issue we have a long running Spark application with secure HBase, 
> which requires {{HBaseCredentialProvider}} to get tokens periodically, we 
> specify HBase related jars with {{\--packages}}, but these dependencies are 
> not added into AM classpath, so when {{HBaseCredentialProvider}} tries to 
> initialize HBase connections to get tokens, it will be failed.
> Currently because jars specified with {{\--jars}} or {{\--packages}} are not 
> added into AM classpath, the only way to extend AM classpath is to use 
> "spark.driver.extraClassPath" which supposed to be used in yarn cluster mode.
> So here we should figure out a solution  either to put these dependencies to 
> AM classpath or to extend AM classpath with correct configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21377) Add a new configuration to extend AM classpath in yarn client mode

2017-07-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083060#comment-16083060
 ] 

Saisai Shao commented on SPARK-21377:
-

Thanks [~vanzin] for your comment.

Your comment is correct: specifying {{\--packages}} will not add jars to the AM. My 
original thought was to add the main jar and secondary jars automatically into the 
AM classpath, but this would break the usage of "spark.driver.userClassPathFirst". 
So my proposal is to manually specify the AM extra classpath with 
"spark.yarn.am.extraClassPath", for example specifying the HBase classpath 
with this configuration. This requires the HBase dependencies to exist on the 
cluster, but it should not impact the user application's classpath.
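
For example (a minimal sketch; the HBase lib path is a placeholder for wherever the 
cluster ships the HBase client jars, and the same value can equally be passed with 
--conf on spark-submit), the existing {{spark.yarn.am.extraClassPath}} option could 
be set in yarn-client mode like this:

{code}
import org.apache.spark.sql.SparkSession

// Hypothetical example: extend only the AM classpath so HBaseCredentialProvider can
// find the HBase classes, without touching the user application's classpath.
val spark = SparkSession.builder()
  .appName("long-running-secure-hbase-app")
  .master("yarn") // client deploy mode is the default when launched from code
  .config("spark.yarn.am.extraClassPath", "/usr/hdp/current/hbase-client/lib/*")
  .config("spark.yarn.security.credentials.hbase.enabled", "true")
  .getOrCreate()
{code}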

> Add a new configuration to extend AM classpath in yarn client mode
> --
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> In this issue we have a long running Spark application with secure HBase, 
> which requires {{HBaseCredentialProvider}} to get tokens periodically, we 
> specify HBase related jars with {{\--packages}}, but these dependencies are 
> not added into AM classpath, so when {{HBaseCredentialProvider}} tries to 
> initialize HBase connections to get tokens, it will be failed.
> Currently because jars specified with {{\--jars}} or {{\--packages}} are not 
> added into AM classpath, the only way to extend AM classpath is to use 
> "spark.driver.extraClassPath" which supposed to be used in yarn cluster mode.
> So here we should figure out a solution  either to put these dependencies to 
> AM classpath or to extend AM classpath with correct configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21377) Add a new configuration to extend AM classpath in yarn client mode

2017-07-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083060#comment-16083060
 ] 

Saisai Shao edited comment on SPARK-21377 at 7/11/17 9:54 PM:
--

Thanks [~vanzin] for your comment.

Your comment is correct: specifying {{\--packages}} will not add jars to the AM. My 
original thought was to add the main jar and secondary jars automatically into the 
AM classpath, but this would break the usage of "spark.driver.userClassPathFirst". 
So my proposal is to manually specify the AM extra classpath with 
"spark.yarn.am.extraClassPath", for example specifying the HBase classpath 
with this configuration. This requires the HBase dependencies to exist on the 
cluster, but it should not impact the user application's classpath.


was (Author: jerryshao):
Thanks [~vanzin] for your comment.

Your comment is correct, specifying {{\--packages}} will not add jars to AM, my 
original thought is to main jar and secondary jars automatically into AM 
classpath, but this will break the usage of "spark.driver.userClassPathFirst". 
So my proposal is to manually specify AM extra classpath with 
"spark.yarn.am.extraClassPath" manually, for example specifying HBase classpath 
with this configuration. This requires HBase dependencies existed in cluster, 
but it may not impact user application's classpath.

> Add a new configuration to extend AM classpath in yarn client mode
> --
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> In this issue we have a long running Spark application with secure HBase, 
> which requires {{HBaseCredentialProvider}} to get tokens periodically, we 
> specify HBase related jars with {{\--packages}}, but these dependencies are 
> not added into AM classpath, so when {{HBaseCredentialProvider}} tries to 
> initialize HBase connections to get tokens, it will be failed.
> Currently because jars specified with {{\--jars}} or {{\--packages}} are not 
> added into AM classpath, the only way to extend AM classpath is to use 
> "spark.driver.extraClassPath" which supposed to be used in yarn cluster mode.
> So here we should figure out a solution  either to put these dependencies to 
> AM classpath or to extend AM classpath with correct configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21377) Add a new configuration to extend AM classpath in yarn client mode

2017-07-11 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21377:

Description: 
In this issue we have a long-running Spark application with secure HBase, which 
requires {{HBaseCredentialProvider}} to get tokens periodically. We specify the 
HBase related jars with {{\--packages}}, but these dependencies are not added 
into the AM classpath, so when {{HBaseCredentialProvider}} tries to initialize 
HBase connections to get tokens, it fails.

Currently, because jars specified with {{\--jars}} or {{\--packages}} are not 
added into the AM classpath, the only way to extend the AM classpath is to use 
"spark.driver.extraClassPath", which is supposed to be used in yarn cluster mode.

So here we should figure out a solution, either putting these dependencies on the 
AM classpath or extending the AM classpath with a proper configuration.

  was:
STR:
* Set below config in spark-default.conf
{code}
spark.yarn.security.credentials.hbase.enabled true
spark.hbase.connector.security.credentials.enabled false{code}
* Set below config in hdfs-site.xml
{code}
'dfs.namenode.delegation.token.max-lifetime':'4320'
'dfs.namenode.delegation.token.renew-interval':'2880' {code}
* Set below config in hbase-site.xml
{code}
'hbase.auth.token.max.lifetime': '2880' {code}
* Run an application with SHC package
{code}
spark-submit  --class 
org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources 
--master yarn-client --packages  --num-executors 4 --driver-memory 512m 
--executor-memory 512m --executor-cores 1  --keytab /xxx/user.headless.keytab 
--principal x...@xx.com spark-*jar hiveTableInClient 180  {code}

After 8 hours, application fails with below error. 
{code}
17/06/28 06:33:43 INFO ClientCnxn: Opening socket connection to server 
xxx/xxx:2181. Will not attempt to authenticate using SASL (unknown error)
17/06/28 06:33:43 INFO ClientCnxn: Socket connection established to 
xxx/xxx:2181, initiating session
17/06/28 06:33:43 INFO ClientCnxn: Session establishment complete on server 
xxx/xxx:2181, sessionid = 0x25ced1d3ac20022, negotiated timeout = 9
17/06/28 06:33:43 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired
17/06/28 06:33:45 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired
17/06/28 06:33:48 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired
17/06/28 06:33:52 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired
17/06/28 06:34:02 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired
17/06/28 06:34:12 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired{code}

Here, Jars pulled from "--packages" are not added into AM class path and that's 
the reason why AM cannot get HBase tokens and failed after token expired. 

So here we should figure out a solution  either to put these dependencies to AM 
classpath or to extend AM classpath with jars we wanted.


> Add a new configuration to extend AM classpath in yarn client mode
> --
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> In this issue we have a long running Spark application with secure HBase, 
> which requires {{HBaseCredentialProvider}} to get tokens periodically, we 
> specify HBase related jars with {{\--packages}}, but these dependencies are 
> not added into AM classpath, so when {{HBaseCredentialProvider}} tries to 
> initialize HBase connections to get tokens, it will be failed.
> Currently because jars specified with {{\--jars}} or {{\--packages}} are not 
> added into AM classpath, the only way to extend AM classpath is to use 
> "spark.driver.extraClassPath" which supposed to be used in yarn cluster mode.
> So here we should figure out a solutio

[jira] [Updated] (SPARK-21377) Add a new configuration to extend AM classpath in yarn client mode

2017-07-11 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21377:

Summary: Add a new configuration to extend AM classpath in yarn client mode 
 (was: Jars pulled from "--packages" are not added into AM classpath)

> Add a new configuration to extend AM classpath in yarn client mode
> --
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Set below config in hbase-site.xml
> {code}
> 'hbase.auth.token.max.lifetime': '2880' {code}
> * Run an application with SHC package
> {code}
> spark-submit  --class 
> org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources 
> --master yarn-client --packages  --num-executors 4 --driver-memory 512m 
> --executor-memory 512m --executor-cores 1  --keytab /xxx/user.headless.keytab 
> --principal x...@xx.com spark-*jar hiveTableInClient 180  {code}
> After 8 hours, application fails with below error. 
> {code}
> 17/06/28 06:33:43 INFO ClientCnxn: Opening socket connection to server 
> xxx/xxx:2181. Will not attempt to authenticate using SASL (unknown error)
> 17/06/28 06:33:43 INFO ClientCnxn: Socket connection established to 
> xxx/xxx:2181, initiating session
> 17/06/28 06:33:43 INFO ClientCnxn: Session establishment complete on server 
> xxx/xxx:2181, sessionid = 0x25ced1d3ac20022, negotiated timeout = 9
> 17/06/28 06:33:43 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:45 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:48 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:52 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:02 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:12 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired{code}
> Here, Jars pulled from "--packages" are not added into AM class path and 
> that's the reason why AM cannot get HBase tokens and failed after token 
> expired. 
> So here we should figure out a solution  either to put these dependencies to 
> AM classpath or to extend AM classpath with jars we wanted.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21377) Jars pulled from "--packages" are not added into AM classpath

2017-07-11 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21377:

Description: 
STR:
* Set below config in spark-default.conf
{code}
spark.yarn.security.credentials.hbase.enabled true
spark.hbase.connector.security.credentials.enabled false{code}
* Set below config in hdfs-site.xml
{code}
'dfs.namenode.delegation.token.max-lifetime':'4320'
'dfs.namenode.delegation.token.renew-interval':'2880' {code}
* Set below config in hbase-site.xml
{code}
'hbase.auth.token.max.lifetime': '2880' {code}
* Run an application with SHC package
{code}
spark-submit  --class 
org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources 
--master yarn-client --packages  --num-executors 4 --driver-memory 512m 
--executor-memory 512m --executor-cores 1  --keytab /xxx/user.headless.keytab 
--principal x...@xx.com spark-*jar hiveTableInClient 180  {code}

After 8 hours, application fails with below error. 
{code}
17/06/28 06:33:43 INFO ClientCnxn: Opening socket connection to server 
xxx/xxx:2181. Will not attempt to authenticate using SASL (unknown error)
17/06/28 06:33:43 INFO ClientCnxn: Socket connection established to 
xxx/xxx:2181, initiating session
17/06/28 06:33:43 INFO ClientCnxn: Session establishment complete on server 
xxx/xxx:2181, sessionid = 0x25ced1d3ac20022, negotiated timeout = 9
17/06/28 06:33:43 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired
17/06/28 06:33:45 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired
17/06/28 06:33:48 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired
17/06/28 06:33:52 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired
17/06/28 06:34:02 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired
17/06/28 06:34:12 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired{code}

Here, jars pulled from "--packages" are not added into the AM classpath, which is 
why the AM cannot get HBase tokens and fails after the token expires.

So here we should figure out a solution, either putting these dependencies on the 
AM classpath or extending the AM classpath with the jars we want.

  was:
STR:
* Set below config in spark-default.conf
{code}
spark.yarn.security.credentials.hbase.enabled true
spark.hbase.connector.security.credentials.enabled false{code}
* Set below config in hdfs-site.xml
{code}
'dfs.namenode.delegation.token.max-lifetime':'4320'
'dfs.namenode.delegation.token.renew-interval':'2880' {code}
* Set below config in hbase-site.xml
{code}
'hbase.auth.token.max.lifetime': '2880' {code}
* Run an application with SHC package
{code}
spark-submit  --class 
org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources 
--master yarn-client --packages  --num-executors 4 --driver-memory 512m 
--executor-memory 512m --executor-cores 1  --keytab /xxx/user.headless.keytab 
--principal x...@xx.com spark-*jar hiveTableInClient 180  {code}

After 8 hours, application fails with below error. 
{code}
17/06/28 06:33:43 INFO ClientCnxn: Opening socket connection to server 
xxx/xxx:2181. Will not attempt to authenticate using SASL (unknown error)
17/06/28 06:33:43 INFO ClientCnxn: Socket connection established to 
xxx/xxx:2181, initiating session
17/06/28 06:33:43 INFO ClientCnxn: Session establishment complete on server 
xxx/xxx:2181, sessionid = 0x25ced1d3ac20022, negotiated timeout = 9
17/06/28 06:33:43 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired
17/06/28 06:33:45 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has expired
17/06/28 06:33:48 WARN AbstractRpcClient: Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 Token has 

[jira] [Commented] (SPARK-21377) Jars pulled from "--packages" are not added into AM classpath

2017-07-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082864#comment-16082864
 ] 

Saisai Shao commented on SPARK-21377:
-

[~srowen] this is a separate issue from SPARK-21376. In this issue we have a 
long-running Spark application with secure HBase, which requires 
{{HBaseCredentialProvider}} to get tokens periodically. We specify the HBase 
related jars with {{\--packages}}, but these dependencies are not added into the 
AM classpath, so when {{HBaseCredentialProvider}} tries to get tokens, it fails.
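
Roughly what the provider has to do at token time is sketched below (illustrative 
only, mirroring the kind of reflective lookup used so that HBase is not a hard 
dependency); it is exactly this lookup that breaks when the HBase jars are missing 
from the AM classpath:

{code}
import org.apache.hadoop.conf.Configuration

// Illustrative sketch: the HBase classes are resolved reflectively, so when the
// --packages jars are not on the AM classpath this throws ClassNotFoundException
// and no HBase token can be obtained.
def obtainHBaseToken(hbaseConf: Configuration): AnyRef = {
  val tokenUtil = Class.forName("org.apache.hadoop.hbase.security.token.TokenUtil")
  val obtainToken = tokenUtil.getMethod("obtainToken", classOf[Configuration])
  obtainToken.invoke(null, hbaseConf)
}
{code}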

> Jars pulled from "--packages" are not added into AM classpath
> -
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Yesha Vora
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Set below config in hbase-site.xml
> {code}
> 'hbase.auth.token.max.lifetime': '2880' {code}
> * Run an application with SHC package
> {code}
> spark-submit  --class 
> org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources 
> --master yarn-client --packages  --num-executors 4 --driver-memory 512m 
> --executor-memory 512m --executor-cores 1  --keytab /xxx/user.headless.keytab 
> --principal x...@xx.com spark-*jar hiveTableInClient 180  {code}
> After 8 hours, application fails with below error. 
> {code}
> 17/06/28 06:33:43 INFO ClientCnxn: Opening socket connection to server 
> xxx/xxx:2181. Will not attempt to authenticate using SASL (unknown error)
> 17/06/28 06:33:43 INFO ClientCnxn: Socket connection established to 
> xxx/xxx:2181, initiating session
> 17/06/28 06:33:43 INFO ClientCnxn: Session establishment complete on server 
> xxx/xxx:2181, sessionid = 0x25ced1d3ac20022, negotiated timeout = 9
> 17/06/28 06:33:43 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:45 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:48 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:52 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:02 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:12 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired{code}
> Here, Jars pulled from "--packages" are not added into AM class path and 
> that's the reason why AM cannot get HBase tokens and failed after token 
> expired. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21377) Jars pulled from "--packages" are not added into AM classpath

2017-07-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082873#comment-16082873
 ] 

Saisai Shao commented on SPARK-21377:
-

SPARK-21376 and this issue are both security related, but the root causes and 
fixes are different, which is why we created two JIRAs.

> Jars pulled from "--packages" are not added into AM classpath
> -
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Set below config in hbase-site.xml
> {code}
> 'hbase.auth.token.max.lifetime': '2880' {code}
> * Run an application with SHC package
> {code}
> spark-submit  --class 
> org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources 
> --master yarn-client --packages  --num-executors 4 --driver-memory 512m 
> --executor-memory 512m --executor-cores 1  --keytab /xxx/user.headless.keytab 
> --principal x...@xx.com spark-*jar hiveTableInClient 180  {code}
> After 8 hours, application fails with below error. 
> {code}
> 17/06/28 06:33:43 INFO ClientCnxn: Opening socket connection to server 
> xxx/xxx:2181. Will not attempt to authenticate using SASL (unknown error)
> 17/06/28 06:33:43 INFO ClientCnxn: Socket connection established to 
> xxx/xxx:2181, initiating session
> 17/06/28 06:33:43 INFO ClientCnxn: Session establishment complete on server 
> xxx/xxx:2181, sessionid = 0x25ced1d3ac20022, negotiated timeout = 9
> 17/06/28 06:33:43 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:45 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:48 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:52 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:02 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:12 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired{code}
> Here, Jars pulled from "--packages" are not added into AM class path and 
> that's the reason why AM cannot get HBase tokens and failed after token 
> expired. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21377) Jars pulled from "--packages" are not added into AM classpath

2017-07-11 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21377:

Priority: Minor  (was: Major)

> Jars pulled from "--packages" are not added into AM classpath
> -
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Set below config in hbase-site.xml
> {code}
> 'hbase.auth.token.max.lifetime': '2880' {code}
> * Run an application with SHC package
> {code}
> spark-submit  --class 
> org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources 
> --master yarn-client --packages  --num-executors 4 --driver-memory 512m 
> --executor-memory 512m --executor-cores 1  --keytab /xxx/user.headless.keytab 
> --principal x...@xx.com spark-*jar hiveTableInClient 180  {code}
> After 8 hours, application fails with below error. 
> {code}
> 17/06/28 06:33:43 INFO ClientCnxn: Opening socket connection to server 
> xxx/xxx:2181. Will not attempt to authenticate using SASL (unknown error)
> 17/06/28 06:33:43 INFO ClientCnxn: Socket connection established to 
> xxx/xxx:2181, initiating session
> 17/06/28 06:33:43 INFO ClientCnxn: Session establishment complete on server 
> xxx/xxx:2181, sessionid = 0x25ced1d3ac20022, negotiated timeout = 9
> 17/06/28 06:33:43 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:45 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:48 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:52 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:02 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:12 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired{code}
> Here, Jars pulled from "--packages" are not added into AM class path and 
> that's the reason why AM cannot get HBase tokens and failed after token 
> expired. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21376) Token is not renewed in yarn client process in cluster mode

2017-07-11 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21376:

Priority: Minor  (was: Major)

> Token is not renewed in yarn client process in cluster mode
> ---
>
> Key: SPARK-21376
> URL: https://issues.apache.org/jira/browse/SPARK-21376
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Yesha Vora
>Priority: Minor
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Run HDFSWordcount streaming app in yarn-cluster mode  for 25 hrs. 
> After 25 hours, noticing that HDFS Wordcount job is hitting 
> HDFS_DELEGATION_TOKEN renewal issue. 
> {code}
> 17/06/28 10:49:47 WARN Client: Exception encountered while connecting to the 
> server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> 17/06/28 10:49:47 WARN Client: Failed to cleanup staging dir 
> hdfs://mycluster0/user/hrt_qa/.sparkStaging/application_1498539861056_0015
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
> at org.apache.hadoop.ipc.Client.call(Client.java:1498){code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21377) Jars pulled from "--packages" are not added into AM classpath

2017-07-11 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21377:

Component/s: (was: Spark Core)
 YARN

> Jars pulled from "--packages" are not added into AM classpath
> -
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Yesha Vora
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Set below config in hbase-site.xml
> {code}
> 'hbase.auth.token.max.lifetime': '2880' {code}
> * Run an application with SHC package
> {code}
> spark-submit  --class 
> org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources 
> --master yarn-client --packages  --num-executors 4 --driver-memory 512m 
> --executor-memory 512m --executor-cores 1  --keytab /xxx/user.headless.keytab 
> --principal x...@xx.com spark-*jar hiveTableInClient 180  {code}
> After 8 hours, application fails with below error. 
> {code}
> 17/06/28 06:33:43 INFO ClientCnxn: Opening socket connection to server 
> xxx/xxx:2181. Will not attempt to authenticate using SASL (unknown error)
> 17/06/28 06:33:43 INFO ClientCnxn: Socket connection established to 
> xxx/xxx:2181, initiating session
> 17/06/28 06:33:43 INFO ClientCnxn: Session establishment complete on server 
> xxx/xxx:2181, sessionid = 0x25ced1d3ac20022, negotiated timeout = 9
> 17/06/28 06:33:43 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:45 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:48 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:52 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:02 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:12 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired{code}
> Here, Jars pulled from "--packages" are not added into AM class path and 
> that's the reason why AM cannot get HBase tokens and failed after token 
> expired. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21377) Jars pulled from "--packages" are not added into AM classpath

2017-07-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082864#comment-16082864
 ] 

Saisai Shao edited comment on SPARK-21377 at 7/11/17 7:58 PM:
--

[~srowen] this is a separate issue from SPARK-21376. In this issue we have a 
long-running Spark application with secure HBase, which requires 
{{HBaseCredentialProvider}} to get tokens periodically. We specify the HBase 
related jars with {{\--packages}}, but these dependencies are not added into the 
AM classpath, so when {{HBaseCredentialProvider}} tries to initialize HBase 
connections to get tokens, it fails.


was (Author: jerryshao):
[~srowen] this is a separate issue to SPARK-21376. In this issue we have a long 
running Spark with secure HBase, which requires {{HBaseCredentialProvider}} to 
get tokens periodically, we specify HBase related jars with {{\--packages}}, 
but these dependencies are not added into AM classpath, so when 
{{HBaseCredentialProvider}} tries to get tokens, it will be failed.

> Jars pulled from "--packages" are not added into AM classpath
> -
>
> Key: SPARK-21377
> URL: https://issues.apache.org/jira/browse/SPARK-21377
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Yesha Vora
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Set below config in hbase-site.xml
> {code}
> 'hbase.auth.token.max.lifetime': '2880' {code}
> * Run an application with SHC package
> {code}
> spark-submit  --class 
> org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources 
> --master yarn-client --packages  --num-executors 4 --driver-memory 512m 
> --executor-memory 512m --executor-cores 1  --keytab /xxx/user.headless.keytab 
> --principal x...@xx.com spark-*jar hiveTableInClient 180  {code}
> After 8 hours, application fails with below error. 
> {code}
> 17/06/28 06:33:43 INFO ClientCnxn: Opening socket connection to server 
> xxx/xxx:2181. Will not attempt to authenticate using SASL (unknown error)
> 17/06/28 06:33:43 INFO ClientCnxn: Socket connection established to 
> xxx/xxx:2181, initiating session
> 17/06/28 06:33:43 INFO ClientCnxn: Session establishment complete on server 
> xxx/xxx:2181, sessionid = 0x25ced1d3ac20022, negotiated timeout = 9
> 17/06/28 06:33:43 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:45 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:48 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:33:52 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:02 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired
> 17/06/28 06:34:12 WARN AbstractRpcClient: Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Token has expired{code}
> Here, Jars pulled from "--packages" are not added into AM class path and 
> that's the reason why AM cannot get HBase tokens and failed after token 
> expired. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21376) Token is not renewed in yarn client process in cluster mode

2017-07-11 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21376:

Affects Version/s: (was: 2.1.0)
   2.2.0
   2.1.1

> Token is not renewed in yarn client process in cluster mode
> ---
>
> Key: SPARK-21376
> URL: https://issues.apache.org/jira/browse/SPARK-21376
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Yesha Vora
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Run HDFSWordcount streaming app in yarn-cluster mode  for 25 hrs. 
> After 25 hours, noticing that HDFS Wordcount job is hitting 
> HDFS_DELEGATION_TOKEN renewal issue. 
> {code}
> 17/06/28 10:49:47 WARN Client: Exception encountered while connecting to the 
> server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> 17/06/28 10:49:47 WARN Client: Failed to cleanup staging dir 
> hdfs://mycluster0/user/hrt_qa/.sparkStaging/application_1498539861056_0015
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
> at org.apache.hadoop.ipc.Client.call(Client.java:1498){code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21376) Token is not renewed in yarn client process in cluster mode

2017-07-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082857#comment-16082857
 ] 

Saisai Shao commented on SPARK-21376:
-

I will work on this, thanks [~yeshavora].

> Token is not renewed in yarn client process in cluster mode
> ---
>
> Key: SPARK-21376
> URL: https://issues.apache.org/jira/browse/SPARK-21376
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Yesha Vora
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Run HDFSWordcount streaming app in yarn-cluster mode  for 25 hrs. 
> After 25 hours, noticing that HDFS Wordcount job is hitting 
> HDFS_DELEGATION_TOKEN renewal issue. 
> {code}
> 17/06/28 10:49:47 WARN Client: Exception encountered while connecting to the 
> server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> 17/06/28 10:49:47 WARN Client: Failed to cleanup staging dir 
> hdfs://mycluster0/user/hrt_qa/.sparkStaging/application_1498539861056_0015
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
> at org.apache.hadoop.ipc.Client.call(Client.java:1498){code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21376) Token is not renewed in yarn client process in cluster mode

2017-07-11 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-21376:

Component/s: (was: Spark Core)
 YARN

> Token is not renewed in yarn client process in cluster mode
> ---
>
> Key: SPARK-21376
> URL: https://issues.apache.org/jira/browse/SPARK-21376
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.0
>Reporter: Yesha Vora
>
> STR:
> * Set below config in spark-default.conf
> {code}
> spark.yarn.security.credentials.hbase.enabled true
> spark.hbase.connector.security.credentials.enabled false{code}
> * Set below config in hdfs-site.xml
> {code}
> 'dfs.namenode.delegation.token.max-lifetime':'4320'
> 'dfs.namenode.delegation.token.renew-interval':'2880' {code}
> * Run the HDFSWordcount streaming app in yarn-cluster mode for 25 hrs. 
> After 25 hours, the HDFSWordcount job hits an 
> HDFS_DELEGATION_TOKEN renewal issue. 
> {code}
> 17/06/28 10:49:47 WARN Client: Exception encountered while connecting to the 
> server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> 17/06/28 10:49:47 WARN Client: Failed to cleanup staging dir 
> hdfs://mycluster0/user/hrt_qa/.sparkStaging/application_1498539861056_0015
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 230 for hrt_qa) is expired
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
> at org.apache.hadoop.ipc.Client.call(Client.java:1498){code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14151) Propose to refactor and expose Metrics Sink and Source interface

2017-07-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082508#comment-16082508
 ] 

Saisai Shao edited comment on SPARK-14151 at 7/11/17 5:00 PM:
--

Thanks [~jiangxb] for your response. I'm not sure this JIRA requires a standard 
SPIP, because it is a simple improvement: we only expose existing APIs rather 
than create new ones. I would think only big new changes require an SPIP; there 
is not much I could address in one. I would like to hear others' comments.

{quote}
The purpose of an SPIP is to inform and involve the user community in major 
improvements to the Spark codebase throughout the development process, to 
increase the likelihood that user needs are met.

SPIPs should be used for significant user-facing or cross-cutting changes, not 
small incremental improvements. When in doubt, if a committer thinks a change 
needs an SPIP, it does.
{quote}




was (Author: jerryshao):
Thanks [~jiangxb] for your response. I'm not sure this JIRA requires a standard 
SPIP, because it is a simple improvement: we only expose existing APIs rather 
than create new ones. I would think only big new changes require an SPIP; there 
is not much I could address in one. I would like to hear others' comments.

> Propose to refactor and expose Metrics Sink and Source interface
> 
>
> Key: SPARK-14151
> URL: https://issues.apache.org/jira/browse/SPARK-14151
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>    Reporter: Saisai Shao
>Priority: Minor
>
> MetricsSystem is designed for plugging in different sources and sinks: users can 
> write their own sources and sinks, configure them through metrics.properties, 
> and MetricsSystem will register them through reflection. But the current Source and 
> Sink interfaces are private, which means users cannot create their own sources 
> and sinks unless they use the same package.
> So here I propose to expose the Source and Sink interfaces. This will let users 
> build and maintain their own sources and sinks, and alleviate the maintenance 
> overhead of the Spark codebase. 
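As a rough illustration of what the proposal would enable, here is a sketch of a user-defined source, assuming the {{org.apache.spark.metrics.source.Source}} trait (today {{private[spark]}}) were made public as proposed; the class and metric names are hypothetical:
{code}
import java.util.concurrent.atomic.AtomicLong

import com.codahale.metrics.{Gauge, MetricRegistry}
import org.apache.spark.metrics.source.Source  // currently private[spark]

object MyAppCounters {
  val recordsSeen = new AtomicLong(0L)
}

// A custom source that exposes an application counter as a Dropwizard gauge.
class MyAppSource extends Source {
  override val sourceName: String = "myApp"
  override val metricRegistry: MetricRegistry = new MetricRegistry

  metricRegistry.register(MetricRegistry.name("recordsSeen"), new Gauge[Long] {
    override def getValue: Long = MyAppCounters.recordsSeen.get()
  })
}
{code}
MetricsSystem would then pick it up through reflection from a metrics.properties entry along the lines of {{executor.source.myApp.class=com.example.MyAppSource}} (instance and class names hypothetical).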



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14151) Propose to refactor and expose Metrics Sink and Source interface

2017-07-11 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082508#comment-16082508
 ] 

Saisai Shao commented on SPARK-14151:
-

Thanks [~jiangxb] for your response. I'm not sure this JIRA requires a standard 
SPIP, because it is a simple improvement: we only expose existing APIs rather 
than create new ones. I would think only big new changes require an SPIP; there 
is not much I could address in one. I would like to hear others' comments.

> Propose to refactor and expose Metrics Sink and Source interface
> 
>
> Key: SPARK-14151
> URL: https://issues.apache.org/jira/browse/SPARK-14151
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>    Reporter: Saisai Shao
>Priority: Minor
>
> MetricsSystem is designed for plugging in different sources and sinks: users can 
> write their own sources and sinks, configure them through metrics.properties, 
> and MetricsSystem will register them through reflection. But the current Source and 
> Sink interfaces are private, which means users cannot create their own sources 
> and sinks unless they use the same package.
> So here I propose to expose the Source and Sink interfaces. This will let users 
> build and maintain their own sources and sinks, and alleviate the maintenance 
> overhead of the Spark codebase. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21346) Spark does not use SSL for HTTP File Server and Broadcast Server

2017-07-08 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079407#comment-16079407
 ] 

Saisai Shao edited comment on SPARK-21346 at 7/9/17 1:29 AM:
-

Jars and files can be downloaded over HTTPS if you configure them with {{\--jars 
https///xxx.jar}} or {{\--files xxx}}.


was (Author: jerryshao):
Jars and files can be downloaded over HTTPS if you configure them with {{--jars 
https///xxx.jar}} or {{--files xxx}}.

> Spark does not use SSL for HTTP File Server and Broadcast Server
> 
>
> Key: SPARK-21346
> URL: https://issues.apache.org/jira/browse/SPARK-21346
> Project: Spark
>  Issue Type: Question
>  Components: Documentation, Spark Core
>Affects Versions: 2.1.1
>Reporter: John
>Priority: Minor
>  Labels: documentation
>
> SecurityManager states that SSL is used to secure HTTP communication for the 
> broadcast and file server. However, the SSLOptions from the SecurityManager 
> only seem to be used by the SparkUI, the WebUI, and the HistoryServer. 
> According to  [Spark-11140|https://issues.apache.org/jira/browse/SPARK-11140] 
> and [Spark-12588|https://issues.apache.org/jira/browse/SPARK-12588], neither 
> the file server nor broadcast use HTTP anymore. It seems that the 
> documentation is inaccurate and that Spark actually uses SASL on the RPC 
> endpoints to secure the file server and broadcast communications.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21346) Spark does not use SSL for HTTP File Server and Broadcast Server

2017-07-08 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079407#comment-16079407
 ] 

Saisai Shao commented on SPARK-21346:
-

Jars and files can be downloaded over HTTPS if you configure them with {{--jars 
https///xxx.jar}} or {{--files xxx}}.
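For illustration, a spark-submit sketch of passing remote resources over HTTPS (host, jar, file, and class names are placeholders):
{code}
# Sketch only: repo.example.com and the artifact names are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --jars https://repo.example.com/libs/dep.jar \
  --files https://repo.example.com/conf/app.conf \
  myapp.jar
{code}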

> Spark does not use SSL for HTTP File Server and Broadcast Server
> 
>
> Key: SPARK-21346
> URL: https://issues.apache.org/jira/browse/SPARK-21346
> Project: Spark
>  Issue Type: Question
>  Components: Documentation, Spark Core
>Affects Versions: 2.1.1
>Reporter: John
>Priority: Minor
>  Labels: documentation
>
> SecurityManager states that SSL is used to secure HTTP communication for the 
> broadcast and file server. However, the SSLOptions from the SecurityManager 
> only seem to be used by the SparkUI, the WebUI, and the HistoryServer. 
> According to  [Spark-11140|https://issues.apache.org/jira/browse/SPARK-11140] 
> and [Spark-12588|https://issues.apache.org/jira/browse/SPARK-12588], neither 
> the file server nor broadcast use HTTP anymore. It seems that the 
> documentation is inaccurate and that Spark actually uses SASL on the RPC 
> endpoints to secure the file server and broadcast communications.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21334) Fix metrics for external shuffle service

2017-07-08 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079389#comment-16079389
 ] 

Saisai Shao edited comment on SPARK-21334 at 7/9/17 12:43 AM:
--

Are you using the external shuffle service with YARN? If so, I think it is currently 
not supported; we could improve the code to make it work (either through the Hadoop 
metrics system or Spark's own).


was (Author: jerryshao):
Are you using the external shuffle service with YARN? If so, I think it currently 
cannot be supported; we could improve the code to make it work (either through 
the Hadoop metrics system or Spark's own).

> Fix metrics for external shuffle service
> 
>
> Key: SPARK-21334
> URL: https://issues.apache.org/jira/browse/SPARK-21334
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.1.1
>Reporter: Raajay Viswanathan
>  Labels: external-shuffle-service
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> SPARK-16405 introduced metrics for the external shuffle service. However, as it 
> currently stands there are two issues.
> 1. The shuffle service metrics system never reports values.
> 2. The current method for determining "blockTransferRate" is incorrect. The 
> entire block is assumed to be transferred once the OpenBlocks message is 
> processed. The actual data fetch from disk and the subsequent transfer 
> over the wire happen much later, when MessageEncoder invokes encode on the 
> ChunkFetchSuccess message. 
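On point 1, values registered in a Dropwizard registry are only emitted once a reporter (or an equivalent sink polling the registry) is attached and started. A generic, self-contained sketch of that wiring, not the shuffle service code itself (the metric name is illustrative):
{code}
import java.util.concurrent.TimeUnit

import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

object ShuffleMetricsSketch {
  def main(args: Array[String]): Unit = {
    val registry = new MetricRegistry
    // Illustrative meter; ideally marked with the bytes actually written to
    // the wire, not when OpenBlocks is processed.
    val blockTransferRate = registry.meter("blockTransferRateBytes")
    blockTransferRate.mark(1024L)

    // Without a started reporter, registered values are never reported anywhere.
    val reporter = ConsoleReporter.forRegistry(registry)
      .convertRatesTo(TimeUnit.SECONDS)
      .convertDurationsTo(TimeUnit.MILLISECONDS)
      .build()
    reporter.start(10, TimeUnit.SECONDS)
  }
}
{code}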



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21334) Fix metrics for external shuffle service

2017-07-08 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079389#comment-16079389
 ] 

Saisai Shao commented on SPARK-21334:
-

Are you using the external shuffle service with YARN? If so, I think it currently 
cannot be supported; we could improve the code to make it work (either through 
the Hadoop metrics system or Spark's own).

> Fix metrics for external shuffle service
> 
>
> Key: SPARK-21334
> URL: https://issues.apache.org/jira/browse/SPARK-21334
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.1.1
>Reporter: Raajay Viswanathan
>  Labels: external-shuffle-service
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> SPARK-16405 introduced metrics for the external shuffle service. However, as it 
> currently stands there are two issues.
> 1. The shuffle service metrics system never reports values.
> 2. The current method for determining "blockTransferRate" is incorrect. The 
> entire block is assumed to be transferred once the OpenBlocks message is 
> processed. The actual data fetch from disk and the subsequent transfer 
> over the wire happen much later, when MessageEncoder invokes encode on the 
> ChunkFetchSuccess message. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21346) Spark does not use SSL for HTTP File Server and Broadcast Server

2017-07-08 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079385#comment-16079385
 ] 

Saisai Shao commented on SPARK-21346:
-

SSL is still used to fetch remote resources over HTTPS.

If you think the doc is inaccurate, you could improve it with a PR.

> Spark does not use SSL for HTTP File Server and Broadcast Server
> 
>
> Key: SPARK-21346
> URL: https://issues.apache.org/jira/browse/SPARK-21346
> Project: Spark
>  Issue Type: Question
>  Components: Documentation, Spark Core
>Affects Versions: 2.1.1
>Reporter: John
>Priority: Minor
>  Labels: documentation
>
> SecurityManager states that SSL is used to secure HTTP communication for the 
> broadcast and file server. However, the SSLOptions from the SecurityManager 
> only seem to be used by the SparkUI, the WebUI, and the HistoryServer. 
> According to  [Spark-11140|https://issues.apache.org/jira/browse/SPARK-11140] 
> and [Spark-12588|https://issues.apache.org/jira/browse/SPARK-12588], neither 
> the file server nor broadcast use HTTP anymore. It seems that the 
> documentation is inaccurate and that Spark actually uses SASL on the RPC 
> endpoints to secure the file server and broadcast communications.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21348) Spark history Server load too slow when eventlog is large

2017-07-08 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079381#comment-16079381
 ] 

Saisai Shao commented on SPARK-21348:
-

SPARK-18085 is trying to address this issue.

[~duyanghao], are you planning to address this issue yourself, or just reporting it? 
If the latter, I think you can close this JIRA and follow SPARK-18085.

> Spark history Server load too slow when eventlog is large
> -
>
> Key: SPARK-21348
> URL: https://issues.apache.org/jira/browse/SPARK-21348
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0
>Reporter: duyanghao
>Priority: Critical
>
> There is an event log file in HDFS as below:
> ```
> -rwxrwx---   3 root supergroup 9634238709 2017-07-08 04:37 
> hdfs://x/shared/spark-logs/x/.lz4
> ```
> and I start a Spark History Server to load that event log file, but it is 
> too slow to get the Spark web UI through that History Server.
> The request has been running for several hours, but nothing about the Spark 
> application web UI appears in the browser.
> The CPU usage of the History Server is 105.3%, and total used memory on the 
> host is 9 GB.
> The host has 48 cores + 130 GB memory, enough computing resources for the 
> History Server.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Kafka 0.10 with PySpark

2017-07-05 Thread Saisai Shao
Please see the reason in this thread (
https://github.com/apache/spark/pull/14340). It would be better to use
structured streaming instead.

So I would like to -1 this patch. I think it's been a mistake to support
> dstream in Python -- yes it satisfies a checkbox and Spark could claim
> there's support for streaming in Python. However, the tooling and maturity
> for working with streaming data (both in Spark and the more broad
> ecosystem) is simply not there. It is a big baggage to maintain, and
> creates the wrong impression that production streaming jobs can be
> written in Python.
>
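For what it's worth, a minimal Scala sketch of the Structured Streaming route for
Kafka 0.10 over SSL (broker address, topic, and truststore path are placeholders;
it assumes the spark-sql-kafka-0-10 package is on the classpath; the same source
options apply from PySpark):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("KafkaSslSketch").getOrCreate()

    // Options prefixed with "kafka." are passed through to the Kafka consumer.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9093")
      .option("subscribe", "events")
      .option("kafka.security.protocol", "SSL")
      .option("kafka.ssl.truststore.location", "/etc/kafka/client.truststore.jks")
      .load()

    // Print decoded keys/values to the console as a smoke test.
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()
      .awaitTermination()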

On Tue, Jul 4, 2017 at 10:53 PM, Daniel van der Ende <
daniel.vandere...@gmail.com> wrote:

> Hi,
>
> I'm working on integrating some pyspark code with Kafka. We'd like to use
> SSL/TLS, and so want to use Kafka 0.10. Because structured streaming is
> still marked alpha, we'd like to use Spark streaming. On this page,
> however, it indicates that the Kafka 0.10 integration in Spark does not
> support Python (https://spark.apache.org/docs/latest/streaming-kafka-
> integration.html). I've been trying to figure out why, but have not been
> able to find anything. Is there any particular reason for this?
>
> Thanks,
>
> Daniel
>


Re: Scala 2.11 and Hive

2017-07-03 Thread Saisai Shao
Hi Mobin,

Livy supports Scala 2.11 as well as Spark 2.0+. Can you please elaborate on
your problem?

Thanks
Jerry

On Sun, Jul 2, 2017 at 9:19 PM, Mobin Ranjbar 
wrote:

> Hi there,
>
>
> I have a problem using Livy with Apache Spark (Scala 2.10). My Apache Hive
> build does not support Spark with Scala 2.10, so I have to use Hive with
> Spark built for Scala 2.11, but Livy does not support that. How can I use
> them together without having two different versions of Spark?
>
>
> Thanks in Advance,
>
>
> Mobin
>


[jira] [Created] (ZEPPELIN-2716) Change default of zeppelin.livy.displayAppInfo to true

2017-07-03 Thread Saisai Shao (JIRA)
Saisai Shao created ZEPPELIN-2716:
-

 Summary: Change default of zeppelin.livy.displayAppInfo to true
 Key: ZEPPELIN-2716
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2716
 Project: Zeppelin
  Issue Type: Improvement
  Components: livy-interpreter
Affects Versions: 0.8.0
Reporter: Saisai Shao
Priority: Minor


Since it is quite useful to expose the application info for users to monitor and 
debug, I propose to change the default of "zeppelin.livy.displayAppInfo" to true.
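For users who want this behaviour before the default changes, a sketch of the setting in the Livy interpreter configuration (the property name comes from this issue; exactly where it is set, e.g. the interpreter settings page, depends on your Zeppelin setup):
{code}
zeppelin.livy.displayAppInfo = true
{code}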



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Question about standalone Spark cluster reading from Kerberosed hadoop

2017-06-23 Thread Saisai Shao
Spark running with the standalone cluster manager currently doesn't support
accessing secure (Kerberized) Hadoop. Basically, the problem is that standalone-mode
Spark doesn't have the facility to distribute delegation tokens.

Currently only Spark on YARN or local mode supports secure Hadoop.
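For long-running jobs on YARN, the usual approach is to let spark-submit log in
from a keytab so delegation tokens can be re-obtained; a sketch (principal, keytab
path, and application names are placeholders):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --principal app_user@EXAMPLE.COM \
      --keytab /etc/security/keytabs/app_user.keytab \
      --class com.example.MyHdfsJob \
      my-hdfs-job.jar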

Thanks
Jerry

On Fri, Jun 23, 2017 at 5:10 PM, Mu Kong  wrote:

> Hi, all!
>
> I was trying to read from a Kerberosed hadoop cluster from a standalone
> spark cluster.
> Right now, I encountered some authentication issues with Kerberos:
>
>
> java.io.IOException: Failed on local exception: java.io.IOException: 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]; Host Details : local host is: ""; 
> destination host is: XXX;
>
>
>
> I checked with klist, and the principal/realm is correct.
> I also used hdfs command line to poke HDFS from all the nodes, and it
> worked.
> And if I submit job using local(client) mode, the job worked fine.
>
> I tried copying everything from hadoop/conf and hive/conf into spark/conf.
> I also tried editing spark/conf/spark-env.sh to add
> SPARK_SUBMIT_OPTS/SPARK_MASTER_OPTS/SPARK_SLAVE_OPTS/HADOOP_CONF_DIR/HIVE_CONF_DIR,
> and tried exporting them in .bashrc as well.
>
> However, I'm still experiencing the same exception.
>
> Then I read some posts about problems with Kerberized Hadoop, such as the
> following one:
> http://blog.stratio.com/spark-kerberos-safe-story/
> which indicates that we cannot access Kerberized HDFS using a
> standalone Spark cluster.
>
> I'm using Spark 2.1.1; is it still the case that we can't access
> Kerberized HDFS with 2.1.1?
>
> Thanks!
>
>
> Best regards,
> Mu
>
>


[jira] [Comment Edited] (SPARK-21080) Workaround for HDFS delegation token expiry broken with some Hadoop versions

2017-06-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058843#comment-16058843
 ] 

Saisai Shao edited comment on SPARK-21080 at 6/22/17 6:39 AM:
--

That PR should work; we applied it to our internal branch and verified it. But it 
is a little out-dated and needs a rebase.

[~srowen], what's your opinion of this PR 
(https://github.com/apache/spark/pull/9168)? That PR tried to work around a 
Kerberos issue in the HDFS HA scenario; the issue is really an HDFS issue, but it 
was only fixed in Hadoop 2.8.2. I think we still support Hadoop 2.6, and for lots 
of users it is pretty hard to upgrade HDFS to a newer version. So I think it would 
be useful to merge that workaround into Spark, to fix the issue from Spark's side. 
What is your suggestion?


was (Author: jerryshao):
That PR should work; we applied it to our internal branch and verified it.

[~srowen], what's your opinion of this PR 
(https://github.com/apache/spark/pull/9168)? That PR tried to work around a 
Kerberos issue in the HDFS HA scenario; the issue is really an HDFS issue, but it 
was only fixed in Hadoop 2.8.2. I think we still support Hadoop 2.6, and for lots 
of users it is pretty hard to upgrade HDFS to a newer version. So I think it would 
be useful to merge that workaround into Spark, to fix the issue from Spark's side. 
What is your suggestion?

> Workaround for HDFS delegation token expiry broken with some Hadoop versions
> 
>
> Key: SPARK-21080
> URL: https://issues.apache.org/jira/browse/SPARK-21080
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.0
> Environment: Spark 2.1.0 on Yarn, Hadoop 2.7.3
>Reporter: Lukasz Raszka
>Priority: Minor
>
> We're getting struck by SPARK-11182, where the core issue in HDFS has been 
> fixed in more recent versions. It seems that the [workaround introduced by user 
> SaintBacchus|https://github.com/apache/spark/commit/646366b5d2f12e42f8e7287672ba29a8c918a17d]
>  doesn't work in newer versions of Hadoop. This seems to be caused by the 
> property name moving from {{fs.hdfs.impl}} to {{fs.AbstractFileSystem.hdfs.impl}}, 
> which happened somewhere around 2.7.0 or earlier. Taking this into account 
> should make the workaround work again for less recent Hadoop versions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21080) Workaround for HDFS delegation token expiry broken with some Hadoop versions

2017-06-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058843#comment-16058843
 ] 

Saisai Shao commented on SPARK-21080:
-

That PR should work; we applied it to our internal branch and verified it.

[~srowen], what's your opinion of this PR 
(https://github.com/apache/spark/pull/9168)? That PR tried to work around a 
Kerberos issue in the HDFS HA scenario; the issue is really an HDFS issue, but it 
was only fixed in Hadoop 2.8.2. I think we still support Hadoop 2.6, and for lots 
of users it is pretty hard to upgrade HDFS to a newer version. So I think it would 
be useful to merge that workaround into Spark, to fix the issue from Spark's side. 
What is your suggestion?

> Workaround for HDFS delegation token expiry broken with some Hadoop versions
> 
>
> Key: SPARK-21080
> URL: https://issues.apache.org/jira/browse/SPARK-21080
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.0
> Environment: Spark 2.1.0 on Yarn, Hadoop 2.7.3
>Reporter: Lukasz Raszka
>Priority: Minor
>
> We're getting struck by SPARK-11182, where the core issue in HDFS has been 
> fixed in more recent versions. It seems that the [workaround introduced by user 
> SaintBacchus|https://github.com/apache/spark/commit/646366b5d2f12e42f8e7287672ba29a8c918a17d]
>  doesn't work in newer versions of Hadoop. This seems to be caused by the 
> property name moving from {{fs.hdfs.impl}} to {{fs.AbstractFileSystem.hdfs.impl}}, 
> which happened somewhere around 2.7.0 or earlier. Taking this into account 
> should make the workaround work again for less recent Hadoop versions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21082) Consider Executor's memory usage when scheduling task

2017-06-14 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050016#comment-16050016
 ] 

Saisai Shao commented on SPARK-21082:
-

It's fine if the storage memory is not enough to cache all the data; Spark can 
still handle that scenario without OOM. Scheduling tasks based on free memory is 
too scenario-specific, from my understanding.

[~tgraves] [~irashid] [~mridulm80] may have more thoughts on it. 

> Consider Executor's memory usage when scheduling task 
> --
>
> Key: SPARK-21082
> URL: https://issues.apache.org/jira/browse/SPARK-21082
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 2.3.0
>Reporter: DjvuLee
>
>  The Spark scheduler does not consider memory usage when dispatching tasks. This 
> can sometimes lead to executor OOM when RDDs are cached, because Spark cannot 
> estimate the memory usage well enough (especially when the RDD type is not 
> flat), and the scheduler may dispatch too many tasks to one executor.
> We can offer a configuration for users to decide whether the scheduler will 
> consider memory usage.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


