[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-07-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20146 Thanks. I rebased it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-

[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21533 **[Test build #4219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4219/testReport)** for PR 21533 at commit [`eb46ccf`](https://github.com/apache/spark/commit/

[GitHub] spark issue #21807: [SPARK-24536] Validate that limit clause cannot have a n...

2018-07-18 Thread NiharS
Github user NiharS commented on the issue: https://github.com/apache/spark/pull/21807 New to SQL, but it seems like the query `SELECT 1 LIMIT CAST('1' AS INT)` should work, right? I tried both on Spark without your change and the W3Schools SQL tester and it's accep

[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1104/

[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Merged build finished. Test PASSed.

[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...

2018-07-18 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21447 @maropu Do you want to take this over and add such a project in `ColumnPruning`?

[GitHub] spark issue #21656: [SPARK-24677][Core]Avoid NoSuchElementException from Med...

2018-07-18 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21656 merged thanks @cxzl25

[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20146 **[Test build #93246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93246/testReport)** for PR 20146 at commit [`c003bd3`](https://github.com/apache/spark/commit/c0

[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-18 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/21635#discussion_r203495430 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark issue #21468: [SPARK-22151] : PYTHONPATH not picked up from the spark....

2018-07-18 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21468 merged thanks @pgandhi999

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r203496489 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21468

[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-07-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20146 Yeah, it looks like the AppVeyor tests are triggered.

[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-07-18 Thread edwinalu
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r203503691 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -160,11 +160,29 @@ case class SparkListenerBlockUpdated(blockUpdatedIn

[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21710 **[Test build #93245 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93245/testReport)** for PR 21710 at commit [`ec88d38`](https://github.com/apache/spark/commit/e

[GitHub] spark issue #21795: [SPARK-24840][SQL] do not use dummy filter to switch cod...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21795 **[Test build #93239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93239/testReport)** for PR 21795 at commit [`c83eeeb`](https://github.com/apache/spark/commit/c

[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21710 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93245/ Test PASSed.

[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21710 Merged build finished. Test PASSed.

[GitHub] spark issue #21795: [SPARK-24840][SQL] do not use dummy filter to switch cod...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21795 Merged build finished. Test PASSed.

[GitHub] spark issue #21795: [SPARK-24840][SQL] do not use dummy filter to switch cod...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21795 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93239/ Test PASSed.

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread echarles
Github user echarles commented on the issue: https://github.com/apache/spark/pull/21748 > Note that we only invoke any of the feature steps and the entry point of KubernetesClientApplication if we run in cluster mode. If we run in client mode, we enter directly into the user's main cl

[GitHub] spark issue #21809: [SPARK-24851] : Map a Stage ID to it's Associated Job ID...

2018-07-18 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21809 ok to test

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/21589 @MaxGekk The example you cite is literally one of a handful of usages which is not easily overridden - and is prefixed with a 'HACK ALERT' ! A few others are in mllib, typically for reading schema.

[GitHub] spark issue #21809: [SPARK-24851] : Map a Stage ID to it's Associated Job ID...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21809 **[Test build #93247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93247/testReport)** for PR 21809 at commit [`7be0520`](https://github.com/apache/spark/commit/7b

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread echarles
Github user echarles commented on the issue: https://github.com/apache/spark/pull/21748 PS: Actually, there would not even be an issue with the port assignment, as Spark knows which ports it will be using, so it can create the headless service with the correct ports for the user.

[GitHub] spark issue #21806: [SPARK-24846][SQL] Made hashCode ExprId independent of j...

2018-07-18 Thread gvr
Github user gvr commented on the issue: https://github.com/apache/spark/pull/21806 Updated description, thanks @hvanhovell

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21748 @echarles I don't think we should be special-casing Kubernetes here as being any different from the other cluster managers. The main point of client mode is that the driver is running locally and we

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21748 Though I suppose you could have the driver patch its own metadata fields to assign itself a unique label. I could see that being confusing to users when they observe that their driver pod metadata i

[GitHub] spark issue #21806: [SPARK-24846][SQL] Made hashCode ExprId independent of j...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21806 **[Test build #93240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93240/testReport)** for PR 21806 at commit [`68d6f19`](https://github.com/apache/spark/commit/6

[GitHub] spark issue #21806: [SPARK-24846][SQL] Made hashCode ExprId independent of j...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21806 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93240/ Test PASSed.

[GitHub] spark issue #21806: [SPARK-24846][SQL] Made hashCode ExprId independent of j...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21806 Merged build finished. Test PASSed.

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread echarles
Github user echarles commented on the issue: https://github.com/apache/spark/pull/21748 @mccheah If I compare with yarn-client with all nodes on the same LAN, we introduce complexity here as the user has to ensure not only configuration, but also deployment of a particular resource. I

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

2018-07-18 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21720 LGTM Thanks! Merged to master.

[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-07-18 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r203520320 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -160,11 +160,29 @@ case class SparkListenerBlockUpdated(blockUpdatedInfo

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/21589 @mridulm scheduler pools could also make the cluster-wide resource numbers not very meaningful. I don't think the maxShare work has been merged yet (kind of a stalled TODO on an open PR, IIRC),

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21748 > About selecting the pod with labels, another approach I have taken is simply using the name of the driver pod, a bit like I have done with the following deployment (so no need to ensure labels - t

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21720

[GitHub] spark issue #21729: [SPARK-24755][Core] Executor loss can cause task to not ...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21729 **[Test build #93248 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93248/testReport)** for PR 21729 at commit [`f9ed226`](https://github.com/apache/spark/commit/f9

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread liyinan926
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21748 > I don't think you can back a service with a selector that's a pod's name, but someone with more knowledge of the Service API might be able to correct me here. I was under the impression one had

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21748 > Yes, the service gets its endpoints by matching its label selector against labels on the pods so it's critical to have matching labels. Another tenable solution is for the driver backend code to g
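
As a reading aid for the label discussion in this thread, here is a minimal sketch of equality-based selector matching: a pod is selected when every key/value pair in the selector also appears among the pod's labels. This is plain Python, not Kubernetes or Spark code, and the pod names and label keys (`role`, `app-id`) are hypothetical.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(selector: Labels, pod_labels: Labels) -> bool:
    """True when every key/value pair in the selector also appears on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def endpoints_for(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Names of the pods that a service with this selector would pick up."""
    return sorted(name for name, labels in pods.items()
                  if selector_matches(selector, labels))


# Hypothetical pods: two drivers share a role label, only one carries a unique id.
pods = {
    "driver-a": {"role": "driver", "app-id": "a1"},
    "driver-b": {"role": "driver"},
    "executor-1": {"role": "executor", "app-id": "a1"},
}

# A selector that is not unique to one driver selects both drivers.
print(endpoints_for({"role": "driver"}, pods))
# Adding a unique label narrows the match to a single pod.
print(endpoints_for({"role": "driver", "app-id": "a1"}, pods))
```

The first call illustrates the concern raised in the thread: if the driver's labels are shared with another pod, the selector cannot tell them apart, so some unique label is needed.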

[GitHub] spark issue #21729: [SPARK-24755][Core] Executor loss can cause task to not ...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21729 **[Test build #93249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93249/testReport)** for PR 21729 at commit [`6316e5b`](https://github.com/apache/spark/commit/63

[GitHub] spark issue #21806: [SPARK-24846][SQL] Made hashCode ExprId independent of j...

2018-07-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21806 The change looks fine. However, I'm wondering whether we ever have a chance to compare hash codes between expr ids from different JVMs?

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread liyinan926
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21748 > The problem is that the driver's labels might not be unique to that driver, which therefore would require the user to assign their own unique labels or for us to patch the driver pod in-place t

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 > ... unless explicitly overridden by user. This is the problem this PR addresses, actually. > If you need fine grained information about executors, use spark listener (it is trivia

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread echarles
Github user echarles commented on the issue: https://github.com/apache/spark/pull/21748 Got your points. About labels, right, we could take the road of the code that creates labels on its own pod. To ensure uniqueness, we could use the `spark-app-id` as key (if it maps the requirement

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread liyinan926
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21748 > Got your points. About labels, right, we could take the road of the code that creates labels on its own pod. To ensure uniqueness, we could use the spark-app-id as key (if it maps the requiremen

[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-07-18 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21710#discussion_r203526021 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/PrefixSpanWrapper.scala --- @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21720 @gatorsmile @maryannxue Can we move forward with this PR: https://github.com/apache/spark/pull/21699 ?

[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...

2018-07-18 Thread bavardage
Github user bavardage commented on the issue: https://github.com/apache/spark/pull/21794 it does seem that spark currently does distinguish -0 and 0, at least as far as groupbys go ``` scala> case class Thing(x : Float) defined class Thing scala> val df = Seq(

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread echarles
Github user echarles commented on the issue: https://github.com/apache/spark/pull/21748 > Label spark-app-id is only set if spark-submit goes through the steps to create the driver pod so doesn't apply in this case. In that case, the client process could create its own `spark

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread liyinan926
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21748 > In that case, the client process could create its own spark-client-app-id... Yes, and that's what my point above is about. Regardless of how the driver pod is created and managed, user

[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21533 **[Test build #4220 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4220/testReport)** for PR 21533 at commit [`eb46ccf`](https://github.com/apache/spark/commit/e

[GitHub] spark issue #21806: [SPARK-24846][SQL] Made hashCode ExprId independent of j...

2018-07-18 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/21806 @viirya this current change is only useful when you compare canonicalized plans created on different JVMs. This has come up when we tried to detect changes in plans over spark versions (plan stab

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread mccheah
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21748 Taking a step back, I think it is unwise to make any assumptions about where a driver is running in client mode. Client mode is simply just saying that the applicatio

[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...

2018-07-18 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21794 `spark-sql` suggests that -0 and 0 are considered the same though. `SELECT -0.0 == 0.0;` returns `true`. It's probably essential not to change behavior here, but if performance is the issue, I think
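
To make the two notions of comparison in this thread concrete, here is a small sketch in Python (not Spark code). IEEE 754 equality treats `-0.0` and `0.0` as equal, while a total-order comparison in the style of Java's `Double.compare` places `-0.0` before `0.0` and NaN after every other value. The helper names `double_to_long_bits` and `total_order_compare` are hypothetical and only emulate that ordering for illustration.

```python
import math
import struct


def double_to_long_bits(x: float) -> int:
    """Signed 64-bit pattern of x, with every NaN collapsed to one canonical pattern."""
    if math.isnan(x):
        return 0x7FF8000000000000
    return struct.unpack(">q", struct.pack(">d", x))[0]


def total_order_compare(a: float, b: float) -> int:
    """Return -1, 0 or 1 using a total order over doubles.

    Ordinary < and > decide whenever they can; the remaining cases
    (equal values, signed zeros, NaN) fall back to comparing bit patterns.
    """
    if a < b:
        return -1
    if a > b:
        return 1
    bits_a, bits_b = double_to_long_bits(a), double_to_long_bits(b)
    return (bits_a > bits_b) - (bits_a < bits_b)


# IEEE equality: the two zeros are equal.
print(-0.0 == 0.0)
# Total order: negative zero sorts before positive zero.
print(total_order_compare(-0.0, 0.0))
# Total order: NaN sorts after everything else, including infinity.
print(total_order_compare(float("nan"), float("inf")))
```

Which of the two behaviors a grouping or comparison path exposes is exactly what the commenters are debating; the sketch only shows that the two definitions disagree on signed zeros and NaN.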

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread liyinan926
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21748 > That is why I suggested also to remove the driver's knowledge of the driver pod name and to remove the owner reference concept entirely. While, not worrying about the driver pod name a

[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-18 Thread echarles
Github user echarles commented on the issue: https://github.com/apache/spark/pull/21748 > I'm personally leaning towards doing that for the user. Especially if the user is a data scientist behind his notebook launching a paragraph which is supposed to instantiate a Spark REPL

[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...

2018-07-18 Thread bomeng
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/21638 Either way works for me, but since this is not a private method, people may use it in their own way. A minimal change would be best.

[GitHub] spark issue #21807: [SPARK-24536] Validate that limit clause cannot have a n...

2018-07-18 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/21807 @NiharS yeah that makes sense. @mauropalsgraaf we missed this today (sorry about that). Can you add the null check (bonus points if you call `eval()` only once), and add a test for this case?
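
The request above (a null check that evaluates the expression only once) can be sketched as follows. This is an illustrative Python analogue, not the code from the pull request: `validated_limit` and the `evaluate` callback are hypothetical stand-ins for evaluating the limit expression.

```python
from typing import Callable, Optional


def validated_limit(evaluate: Callable[[], Optional[int]]) -> int:
    """Evaluate a limit expression exactly once and reject a null result."""
    value = evaluate()  # single evaluation; the result is reused below
    if value is None:
        raise ValueError("The limit expression must not evaluate to null")
    return value


# Count evaluations to show the expression is only evaluated once.
calls = []


def fake_eval() -> Optional[int]:
    calls.append(1)
    return 1


print(validated_limit(fake_eval), len(calls))
```

Storing the result in a local variable is what keeps the evaluation count at one; checking and then re-evaluating for the return value would double it.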

[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...

2018-07-18 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21638 Except for `binaryFiles`, everything else that needs to change is private to Spark. I know it's public in the bytecode, but only Java callers could accidentally exploit that. Still I don't personally

[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2

2018-07-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21582 Thank you so much, @gatorsmile . I will proceed. Also, thank you, @viirya and @maropu .

[GitHub] spark issue #21202: [SPARK-24129] [K8S] Add option to pass --build-arg's to ...

2018-07-18 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21202 Merged to master

[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21451 Merged build finished. Test PASSed.

[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21451 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1105/

[GitHub] spark pull request #21202: [SPARK-24129] [K8S] Add option to pass --build-ar...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21202

[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21451 **[Test build #93250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93250/testReport)** for PR 21451 at commit [`335e26d`](https://github.com/apache/spark/commit/33

[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-07-18 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21451 @mridulm @jerryshao @felixcheung last one in the 2GB block limit series. just rebased to include the updates to https://github.com/apache/spark/pull/21440. I will also run my tests on a cluster he

[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-07-18 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21546 Thanks @HyukjinKwon! Any additional comments @holdenk @sethah @viirya @felixcheung ?

[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode

2018-07-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18784 Let's remove it in 3.0 then. We can do it after 2.4 release.

[GitHub] spark issue #21808: [SPARK-21261][DOCS][SQL] SQL Regex document fix

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21808 **[Test build #93241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93241/testReport)** for PR 21808 at commit [`a444e80`](https://github.com/apache/spark/commit/a

[GitHub] spark issue #21808: [SPARK-21261][DOCS][SQL] SQL Regex document fix

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21808 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93241/ Test PASSed.

[GitHub] spark issue #21808: [SPARK-21261][DOCS][SQL] SQL Regex document fix

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21808 Merged build finished. Test PASSed.

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/21589 @MaxGekk We are going in circles. I don't think this is a good api to expose currently - the data is available through multiple other means as I detailed and while not a succinct oneliner, it is

[GitHub] spark issue #21806: [SPARK-24846][SQL] Made hashCode ExprId independent of j...

2018-07-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21806 @hvanhovell Got it. Thanks for your explanation. LGTM.

[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2

2018-07-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21761 Hi, @gatorsmile . Could you review this PR? I'm wondering if we can have this for Spark 2.4 before branch-cut.

[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-07-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21710#discussion_r203538149 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/PrefixSpanWrapper.scala --- @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #21810: [SPARK-24854][SQL] Gathering all Avro options int...

2018-07-18 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21810 [SPARK-24854][SQL] Gathering all Avro options into the AvroOptions class ## What changes were proposed in this pull request? In the PR, I propose to put all `Avro` options in new class `Avr

[GitHub] spark issue #21810: [SPARK-24854][SQL] Gathering all Avro options into the A...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21810 **[Test build #93251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93251/testReport)** for PR 21810 at commit [`3a76ba2`](https://github.com/apache/spark/commit/3a

[GitHub] spark issue #21798: [SPARK-24836][SQL] New option for Avro datasource - igno...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21798 Please, look at this PR: https://github.com/apache/spark/pull/21810 . It introduces `AvroOptions`.

[GitHub] spark issue #21810: [SPARK-24854][SQL] Gathering all Avro options into the A...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21810 Can one of the admins verify this patch?

[GitHub] spark issue #21810: [SPARK-24854][SQL] Gathering all Avro options into the A...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21810 Can one of the admins verify this patch?

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 > it's not terribly useful to know, e.g., that there are 5 million cores in the cluster if your Job is running in a scheduler pool that is restricted to using far fewer CPUs via the pool's maxShares

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread ssimeonov
Github user ssimeonov commented on the issue: https://github.com/apache/spark/pull/21589 @mridulm your comments make an implicit assumption, which is quite incorrect: that Spark users read the Spark codebase and/or are aware of Spark internals. Please, consider this PR in the context

[GitHub] spark issue #21803: [SPARK-24849][SQL] Converting a value of StructType to a...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21803 **[Test build #93242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93242/testReport)** for PR 21803 at commit [`f302777`](https://github.com/apache/spark/commit/f

[GitHub] spark issue #21803: [SPARK-24849][SQL] Converting a value of StructType to a...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21803 Merged build finished. Test PASSed.

[GitHub] spark issue #21803: [SPARK-24849][SQL] Converting a value of StructType to a...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21803 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93242/ Test PASSed.

[GitHub] spark issue #21810: [SPARK-24854][SQL] Gathering all Avro options into the A...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21810 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93251/ Test PASSed.

[GitHub] spark issue #21810: [SPARK-24854][SQL] Gathering all Avro options into the A...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21810 Merged build finished. Test PASSed.

[GitHub] spark issue #21810: [SPARK-24854][SQL] Gathering all Avro options into the A...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21810 **[Test build #93251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93251/testReport)** for PR 21810 at commit [`3a76ba2`](https://github.com/apache/spark/commit/3

[GitHub] spark issue #21796: [SPARK-24833][K8S][WIP] Add host name aliases feature

2018-07-18 Thread liyinan926
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21796 I think we decided not to take any new configuration options with https://issues.apache.org/jira/browse/SPARK-24434 being worked on. @mccheah @foxish.

[GitHub] spark issue #21795: [SPARK-24840][SQL] do not use dummy filter to switch cod...

2018-07-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21795 LGTM

[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-18 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21584 just wanted to briefly chime in: TL;DR: this build will fail until the PRB is running on our ubuntu build nodes. we are currently blocked from testing this stuff w/the current i

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/21589 No, defaultParallelism isn't more useful in that case, but that just starts getting to my overall assessment of this JIRA and PR: It smells of defining the problem to align with a preconception

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread ssimeonov
Github user ssimeonov commented on the issue: https://github.com/apache/spark/pull/21589 @markhamstra the purpose of this PR is not to address the topic of dynamic resource management in arbitrarily complex Spark environments. Most Spark users do not operate in such environments. It i

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/21589 @ssimeonov the purpose of a public API is not to offer hack solutions to a subset of problems. What is needed is a high-level, declarative abstraction that can be used to specify requested Job r

[GitHub] spark issue #21795: [SPARK-24840][SQL] do not use dummy filter to switch cod...

2018-07-18 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21795 LGTM

[GitHub] spark issue #21804: [SPARK-24268][SQL] Use datatype.catalogString in error m...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21804 Merged build finished. Test PASSed.

[GitHub] spark issue #21804: [SPARK-24268][SQL] Use datatype.catalogString in error m...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21804 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1106/

[GitHub] spark issue #21804: [SPARK-24268][SQL] Use datatype.catalogString in error m...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21804 **[Test build #93252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93252/testReport)** for PR 21804 at commit [`5ffc793`](https://github.com/apache/spark/commit/5f

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread ssimeonov
Github user ssimeonov commented on the issue: https://github.com/apache/spark/pull/21589 @markhamstra even the words you are using indicate that you are missing the intended audience. > high-level, declarative abstraction that can be used to specify requested Job resource-usa
