Hadoop-Token-Across-Kerberized-Cluster

2018-10-16 Thread Davinder Kumar
Hello all, I need some help with a Kerberized cluster setup. We have two Ambari clusters, Cluster A and Cluster B, both Kerberized with the same KDC. The use case is: access the Hive data on Cluster B from Cluster A. Action taken: the remote Cluster B principal and keytab were provided to Cluster A.
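With a shared KDC, this usually comes down to pointing the job at the remote metastore and requesting delegation tokens for both clusters' filesystems. A minimal sketch of the options typically involved, assuming Spark 2.x on YARN; every value (principal, keytab path, hostnames) is a placeholder, not from the thread:

```python
# Sketch of the configuration typically involved when Cluster A reads
# Hive data owned by Cluster B (same KDC). All values are placeholders.
cross_cluster_conf = {
    # Identity from Cluster B that the job authenticates as
    "spark.yarn.principal": "hive-reader@EXAMPLE.COM",
    "spark.yarn.keytab": "/etc/security/keytabs/hive-reader.keytab",
    # Point the job at Cluster B's Hive metastore
    "hive.metastore.uris": "thrift://metastore.cluster-b.example.com:9083",
    # Filesystems the job needs delegation tokens for (both clusters)
    "spark.yarn.access.hadoopFileSystems": (
        "hdfs://nn.cluster-a.example.com:8020,"
        "hdfs://nn.cluster-b.example.com:8020"
    ),
}

def to_submit_args(conf):
    """Render the dict as spark-submit --conf flags."""
    return [arg for k, v in sorted(conf.items()) for arg in ("--conf", f"{k}={v}")]
```

`spark.yarn.access.hadoopFileSystems` is the Spark 2.2+ spelling (older releases used `spark.yarn.access.namenodes`); whether token renewal works across both namenodes depends on the trust setup between the clusters.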

Re: Starting to make changes for Spark 3 -- what can we delete?

2018-10-16 Thread Marcelo Vanzin
Might be good to take a look at things marked "@DeveloperApi" and whether they should stay that way. e.g. I was looking at SparkHadoopUtil and I've always wanted to just make it private to Spark. I don't see why apps would need any of those methods. On Tue, Oct 16, 2018 at 10:18 AM Sean Owen

Re: SparkSQL read Hive transactional table

2018-10-16 Thread daily
Hi, Spark version: 2.3.0 Hive version: 2.1.0 Best regards. ------ From: "Gourav Sengupta"; Date: Tue, Oct 16, 2018, 6:35 PM; To: "daily"; Cc: "user"; "dev"; Subject: Re: SparkSQL read Hive transactional table Hi, can I please

Re: Hadoop 3 support

2018-10-16 Thread t4
Has anyone got Spark jars working with Hadoop 3.1 that they can share? I am looking to be able to use the latest hadoop-aws fixes from v3.1.

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Dongjoon Hyun
I also agree with Reynold and Xiao. Although I love that new feature, Spark 2.4 branch-cut was made a long time ago. We cannot backport new features at this stage at RC4. In addition, could you split Apache SPARK issue IDs, Ilan? It's confusing during discussion. (1) [SPARK-23257][K8S]

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Xiao Li
We need to strictly follow the backport and release policy. We can't merge such a new feature into an RC branch or a minor release (e.g., 2.4.1). Cheers, Xiao Bolke de Bruin wrote on Tue, Oct 16, 2018, at 12:48 PM: > Chiming in here. We are in the same boat as Bloomberg. > > (But being a release manager

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Bolke de Bruin
Chiming in here. We are in the same boat as Bloomberg. (But being a release manager often myself, I understand the trade-off.) B. On Tue, Oct 16, 2018, at 21:24, Ilan Filonenko wrote: > On Erik's note, would SPARK-23257 be included in, say, 2.4.1? When would > the next RC be? I would like to propose

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Ilan Filonenko
On Erik's note, would SPARK-23257 be included in, say, 2.4.1? When would the next RC be? I would like to propose the inclusion of the Kerberos feature sooner rather than later as it would increase Spark-on-K8S adoption in production workloads while bringing greater feature parity with Yarn and

Starting to make changes for Spark 3 -- what can we delete?

2018-10-16 Thread Sean Owen
There was already agreement to delete deprecated things like Flume and Kafka 0.8 support in master. I've got several more on my radar, and wanted to highlight them and solicit general opinions on where we should accept breaking changes. For example how about removing accumulator v1?

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Erik Erlandson
SPARK-23257 merged more recently than I realized. If that isn't on branch-2.4, then the first question is how soon in the release sequence it can be adopted. On Tue, Oct 16, 2018 at 9:33 AM Reynold Xin wrote: > We shouldn’t merge new features into release branches anymore. > > On Tue, Oct 16,

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Yinan Li
Yep, the Kerberos support for k8s is in master but not in branch-2.4. I see no reason to get the integration tests, which depend on the feature in master, into 2.4. On Tue, Oct 16, 2018 at 9:32 AM Rob Vesse wrote: > Right now the Kerberos support for Spark on K8S is only on master AFAICT

Re: configure yarn to use more vcores as the node provides?

2018-10-16 Thread Peter Liu
Hi Khaled, the 50-2-3g I mentioned below is meant for the --conf spark.executor.* config, in particular spark.executor.instances=50, spark.executor.cores=2 and spark.executor.memory=3g. For each run, I configured the streaming producer and Kafka broker to have the partitions aligned with the
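The 50-2-3g setting gives 50 executors x 2 cores = 100 concurrent task slots, so "aligned" partition counts are multiples of 100. A small sanity-check sketch of that arithmetic (the helper names are mine, not from the thread):

```python
def task_slots(instances, cores_per_executor):
    # Total number of tasks Spark can run at once under this config
    return instances * cores_per_executor

def is_aligned(partitions, slots):
    # Partition counts that divide evenly by the slot count avoid a
    # straggler wave where only a few slots run a final partial batch
    return partitions % slots == 0

slots = task_slots(50, 2)  # the 50-2-3g config from the thread -> 100
```

For example, 200 partitions fill the 100 slots in exactly two waves, while 150 partitions leave half the slots idle during the second wave.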

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Reynold Xin
We shouldn’t merge new features into release branches anymore. On Tue, Oct 16, 2018 at 6:32 PM Rob Vesse wrote: > Right now the Kerberos support for Spark on K8S is only on master AFAICT > i.e. the feature is not present on branch-2.4 > > > > Therefore I don’t see any point in adding the tests

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Rob Vesse
Right now the Kerberos support for Spark on K8S is only on master AFAICT i.e. the feature is not present on branch-2.4 Therefore I don’t see any point in adding the tests into branch-2.4 unless the plan is to also merge the Kerberos support to branch-2.4 Rob From: Erik Erlandson

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Felix Cheung
I’m in favor of it. If you check the PR it’s a few isolated script changes and all test-only changes. Should have low impact on release but much better integration test coverage. From: Erik Erlandson Sent: Tuesday, October 16, 2018 8:20 AM To: dev Subject:

[DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Erik Erlandson
I'd like to propose including integration testing for Kerberos on the Spark 2.4 release: https://github.com/apache/spark/pull/22608 Arguments in favor: 1) it improves testing coverage on a feature important for integrating with HDFS deployments 2) its intersection with existing code is small - it

HADOOP-13421 anyone using this with Spark

2018-10-16 Thread t4
https://issues.apache.org/jira/browse/HADOOP-13421 mentions that s3a can use the S3 v2 list API for performance. Has anyone been able to use this new hadoop-aws jar with Spark?
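If I read the JIRA right, it adds an `fs.s3a.list.version` switch on the Hadoop side; Hadoop options reach S3A from Spark by prefixing them with `spark.hadoop.`. A sketch of how the setting would be wired, treating the property name as an assumption taken from the JIRA:

```python
# Hadoop-side S3A options are forwarded by Spark when prefixed with
# "spark.hadoop." on the Spark conf. fs.s3a.list.version selects the
# S3 list API version (2 = the newer ListObjectsV2 call).
def s3a_conf(list_version=2):
    return {
        "spark.hadoop.fs.s3a.list.version": str(list_version),
    }
```

These entries would then be passed as `--conf key=value` pairs to spark-submit; the hadoop-aws jar on the classpath must of course be new enough to understand the option.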

Re: Timestamp Difference/operations

2018-10-16 Thread Paras Agarwal
Thanks Srabasti, I am trying to convert Teradata to Spark SQL. TERADATA: SELECT * FROM Table1 WHERE DATE '1974-01-02' > CAST(birth_date AS TIMESTAMP(0)) + (TIME '12:34:34' - TIME '00:00:00' HOUR TO SECOND); HIVE (with some tweaks I can write): SELECT * FROM foodmart.trimmed_employee WHERE
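The Teradata predicate adds a 12:34:34 interval to birth_date and compares the result against DATE '1974-01-02'. The intended semantics can be checked in plain Python before translating (the column name is from the post; the rest is a sketch):

```python
from datetime import datetime, timedelta

# Interval from the Teradata expression: TIME '12:34:34' - TIME '00:00:00'
interval = timedelta(hours=12, minutes=34, seconds=34)
cutoff = datetime(1974, 1, 2)  # DATE '1974-01-02' at midnight

def row_matches(birth_date):
    # Teradata: DATE '1974-01-02' > CAST(birth_date AS TIMESTAMP(0)) + interval
    return cutoff > birth_date + interval
```

In Spark SQL this should map to something like `WHERE date '1974-01-02' > cast(birth_date as timestamp) + interval 12 hours 34 minutes 34 seconds` (interval literal syntax as I understand Spark 2.x supports it; worth verifying against the actual data).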

Re: overcommit: cpus / vcores

2018-10-16 Thread Khaled Zaouk
Hi Peter, I actually meant the Spark configuration that you put in your spark-submit invocation (such as --conf spark.executor.instances=..., --conf spark.executor.memory=..., etc.). I advise you to check the number of partitions that you get in each stage of your workload in the Spark GUI while