[GitHub] spark pull request #16057: [SPARK-18624][SQL] Implicit cast complex types

2016-11-29 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16057#discussion_r8343 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -714,6 +714,17 @@ object TypeCoercion {

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #69336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69336/consoleFull)** for PR 16030 at commit [`43b4eb0`](https://github.com/apache/spark/commit/4

[GitHub] spark pull request #16060: [SPARK-18220][SQL] read Hive orc table with varch...

2016-11-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16060#discussion_r90059903 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala --- @@ -51,9 +51,12 @@ private[spark] object HiveUtils extends Logging { sc

[GitHub] spark pull request #16060: [SPARK-18220][SQL] read Hive orc table with varch...

2016-11-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16060#discussion_r90060096 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala --- @@ -51,9 +51,12 @@ private[spark] object HiveUtils extends Logging { sc

[GitHub] spark issue #15120: [SPARK-4563][core] Allow driver to advertise a different...

2016-11-29 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/15120 > Do you know if there are any alternatives while the commit isn't released? Not really. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHu

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16038 Yes, sure, the zero Vector is very sparse. But I do not get your suggestion. I see no way to pass a Sparse Vector as zero and get the type of vector to change underway to Dense Vector with onl

[GitHub] spark issue #15924: [SPARK-18498] [SQL] Revise HDFSMetadataLog API for bette...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15924 **[Test build #69333 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69333/consoleFull)** for PR 15924 at commit [`8e3d705`](https://github.com/apache/spark/commit/

[GitHub] spark issue #15924: [SPARK-18498] [SQL] Revise HDFSMetadataLog API for bette...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15924 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69333/ Test PASSed.

[GitHub] spark issue #15924: [SPARK-18498] [SQL] Revise HDFSMetadataLog API for bette...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15924 Merged build finished. Test PASSed.

[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2016-11-29 Thread tnachen
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/16061#discussion_r90062639 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -596,6 +599,26 @@ object SparkSubmit extends CommandLineUtils { }

[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2016-11-29 Thread tnachen
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/16061#discussion_r90062695 --- Diff: dev/make-distribution.sh --- @@ -154,7 +154,9 @@ export MAVEN_OPTS="${MAVEN_OPTS:--Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCac # Store the

[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14638 **[Test build #69334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69334/consoleFull)** for PR 14638 at commit [`3c06aa6`](https://github.com/apache/spark/commit/

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16038 It might be as simple as writing `Vectors.sparse(n, Seq())` as the zero value. Everything else appears to operate on a Vector that's either sparse or dense then. (Of course, the first call to axpy sh
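The suggestion above can be illustrated with a small sketch. It is plain Python, not Spark code: the byte layout, the `(size, indices, values)` tuple, and the names `encode_dense`, `encode_sparse`, and `add_into` are all assumptions made for illustration and are not part of MLlib. It only shows why an empty sparse zero is cheap to ship, and how the first update can densify the accumulator locally.

```python
import struct

def encode_dense(values):
    """Length prefix plus one 8-byte double per element (hypothetical wire format)."""
    return struct.pack("<i", len(values)) + struct.pack(f"<{len(values)}d", *values)

def encode_sparse(size, indices, values):
    """Logical size, count of stored entries, then only the stored indices and values."""
    header = struct.pack("<ii", size, len(indices))
    return (header
            + struct.pack(f"<{len(indices)}i", *indices)
            + struct.pack(f"<{len(values)}d", *values))

def add_into(acc, update):
    """Fold a sparse update into the accumulator, densifying a sparse accumulator first."""
    if isinstance(acc, tuple):            # sparse accumulator: (size, indices, values)
        size, indices, values = acc
        dense = [0.0] * size              # densification happens where the fold runs
        for i, v in zip(indices, values):
            dense[i] = v
        acc = dense
    _, indices, values = update
    for i, v in zip(indices, values):
        acc[i] += v
    return acc

n = 100_000
sparse_zero = (n, [], [])                 # zero value with no stored entries
dense_zero_bytes = len(encode_dense([0.0] * n))
sparse_zero_bytes = len(encode_sparse(*sparse_zero))

# The first fold turns the sparse zero into a dense accumulator.
acc = add_into(sparse_zero, (n, [3, 42], [1.5, -2.0]))
acc = add_into(acc, (n, [3], [0.5]))
```

Under this assumed encoding the dense zero costs 8 bytes per element while the sparse zero costs only its header, which is the saving the comment is after. Whether the real aggregation code keeps the accumulator dense after the first update is exactly what the rest of the thread debates.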

[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14638 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69334/ Test PASSed.

[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2016-11-29 Thread tnachen
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/16061#discussion_r90063380 --- Diff: kubernetes/src/main/scala/org/apache/spark/scheduler/cluster/kubernetes/KubernetesClusterSchedulerBackend.scala --- @@ -0,0 +1,222 @@ +/*

[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2016-11-29 Thread tnachen
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/16061#discussion_r90063456 --- Diff: kubernetes/src/main/scala/org/apache/spark/scheduler/cluster/kubernetes/KubernetesClusterSchedulerBackend.scala --- @@ -0,0 +1,222 @@ +/*

[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2016-11-29 Thread tnachen
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/16061#discussion_r90063528 --- Diff: kubernetes/src/main/scala/org/apache/spark/scheduler/cluster/kubernetes/KubernetesClusterScheduler.scala --- @@ -0,0 +1,236 @@ +/* + * Lic

[GitHub] spark pull request #14638: [SPARK-11374][SQL] Support `skip.header.line.coun...

2016-11-29 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14638#discussion_r90063628 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -122,10 +126,20 @@ class HadoopTableReader( val attrsWithIndex =

[GitHub] spark pull request #14638: [SPARK-11374][SQL] Support `skip.header.line.coun...

2016-11-29 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14638#discussion_r90063348 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -113,6 +113,10 @@ class HadoopTableReader( val tablePath =

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-11-29 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15998 Hi Guys, Repeating my comment/query for @ericl. I'm hoping someone can provide affirmation/refutation to my question before I proceed with new unit tests. I've run some tests to com

[GitHub] spark pull request #16046: [SPARK-18582][SQL] Whitelist LogicalPlan operator...

2016-11-29 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16046#discussion_r90064101 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1120,47 +1173,54 @@ class Analyzer( }

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16038 The big doubt I have is on *Of course, the first call to axpy should produce a dense vector, but, that's already on the executor*: the other operand is Sparse and has to be sparse, and this a

[GitHub] spark issue #15982: [SPARK-18546][core] Fix merging shuffle spills when usin...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15982 **[Test build #69337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69337/consoleFull)** for PR 15982 at commit [`2e03ee6`](https://github.com/apache/spark/commit/2

[GitHub] spark pull request #16046: [SPARK-18582][SQL] Whitelist LogicalPlan operator...

2016-11-29 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16046#discussion_r90065066 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1120,47 +1173,54 @@ class Analyzer( }

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16038 Right now, everything is dense here, right? That's the worst case. Your goal is to avoid serializing a dense zero vector and I say it can just be sparse, which solves the immediate problem. From ther

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-11-29 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13909 ping @cloud-fan

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/16038 Actually, aggregating (and thus sending over the network) quite dense SparseVectors with tens of millions of elements is not to be taken lightly. This would require serious benchmarking. What I tell

[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-11-29 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/15998 @mallman I'll take a look today On Tue, Nov 29, 2016, 9:45 AM Michael Allman wrote: > Hi Guys, > > Repeating my comment/query for @ericl . I'm

[GitHub] spark issue #16061: [SPARK-18278] [Scheduler] Support native submission of s...

2016-11-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16061 This is pretty cool.

[GitHub] spark issue #16061: [SPARK-18278] [Scheduler] Support native submission of s...

2016-11-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16061 One thing - can we submit a separate PR to move all resource managers into resource-managers/yarn, resource-managers/mesos?

[GitHub] spark issue #16061: [SPARK-18278] [Scheduler] Support native submission of s...

2016-11-29 Thread tnachen
Github user tnachen commented on the issue: https://github.com/apache/spark/pull/16061 @rxin Makes sense, @srowen also talked about starting the discussion of having better support for external cluster managers as well.

[GitHub] spark issue #15972: [SPARK-18319][ML][QA2.1] 2.1 QA: API: Experimental, Deve...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15972 **[Test build #69338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69338/consoleFull)** for PR 15972 at commit [`1907019`](https://github.com/apache/spark/commit/1

[GitHub] spark issue #16055: [SPARK-17897] [SQL] Attribute is not NullIntolerant

2016-11-29 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16055 The above fix does not cover all the cases. Found the root cause. The `constraints` of an operator is the expressions that evaluate to `true` for all the rows produced. That means, the ex
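The definition of `constraints` quoted above can be made concrete with a small sketch. It is plain Python, not Catalyst code: the row dictionaries and the helpers `greater_than`, `is_not_null`, `filter_rows`, and `holds_as_constraint` are hypothetical names introduced here, and SQL NULL is modelled as `None` under the assumption that a comparison with NULL yields NULL rather than true.

```python
def greater_than(column, threshold):
    """Predicate column > threshold with SQL-style NULL handling."""
    def predicate(row):
        value = row.get(column)
        if value is None:
            return None                   # NULL > x is NULL, neither true nor false
        return value > threshold
    return predicate

def is_not_null(column):
    """Predicate that is true exactly when the column holds a value."""
    return lambda row: row.get(column) is not None

def filter_rows(rows, predicate):
    """Keep only rows for which the predicate evaluates to true."""
    return [row for row in rows if predicate(row) is True]

def holds_as_constraint(rows, predicate):
    """A constraint must evaluate to true for every row produced."""
    return all(predicate(row) is True for row in rows)

rows = [{"a": 1}, {"a": 7}, {"a": None}, {"a": 10}]
output = filter_rows(rows, greater_than("a", 5))

# Both expressions hold for every output row, so both qualify as constraints
# of the filter in this toy model; neither holds for the unfiltered input.
filter_is_constraint = holds_as_constraint(output, greater_than("a", 5))
not_null_is_constraint = holds_as_constraint(output, is_not_null("a"))
input_satisfies_not_null = holds_as_constraint(rows, is_not_null("a"))
```

In this toy model the not-null fact follows only because the comparison rejects NULL input; an expression that can be true for a NULL input would not justify it.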

[GitHub] spark issue #16058: [SPARK-18291][SparkR][ML][FOLLOW-UP] Encode probability ...

2016-11-29 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16058 LGTM

[GitHub] spark pull request #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter met...

2016-11-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16017#discussion_r90072953 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala --- @@ -107,25 +107,41 @@ private[ml] trait DecisionTreeParams extends PredictorPa

[GitHub] spark issue #16048: [DO_NOT_MERGE]Test kafka deletion

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16048 **[Test build #69339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69339/consoleFull)** for PR 16048 at commit [`9ff2ed4`](https://github.com/apache/spark/commit/9

[GitHub] spark issue #8318: [SPARK-1267][PYSPARK] Adds pip installer for pyspark

2016-11-29 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/8318 Since https://github.com/apache/spark/pull/15659 got merged, would you be ok with closing this @alope107? Thanks for your work on this :)

[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...

2016-11-29 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15817 LGTM given our planned follow up to update the documentation for both Python and Scala.

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16038 I see, you are saying that encoding an actually-dense vector as sparse is somewhat less efficient, and if the implementations happened to make that choice given sparse input, could be bad. In that ca

[GitHub] spark issue #16048: [DO_NOT_MERGE]Test kafka deletion

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16048 Merged build finished. Test FAILed.

[GitHub] spark issue #16048: [DO_NOT_MERGE]Test kafka deletion

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16048 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69339/ Test FAILed.

[GitHub] spark issue #16048: [DO_NOT_MERGE]Test kafka deletion

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16048 **[Test build #69339 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69339/consoleFull)** for PR 16048 at commit [`9ff2ed4`](https://github.com/apache/spark/commit/

[GitHub] spark issue #15505: [SPARK-17931][CORE] taskScheduler has some unneeded seri...

2016-11-29 Thread kayousterhout
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/15505 @witgo I don't mind moving the serialization out of resourceOffer, but I do think it's helpful to separate that from the too-many-objects-serialized issue. Smaller PRs are easier for folks to

[GitHub] spark issue #15843: [SPARK-18274][ML][PYSPARK] Memory leak in PySpark JavaWr...

2016-11-29 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15843 I agree, for a follow up (so we don't lose track of it) - I've created SPARK-18630 but option 1 for now is a strict improvement over the current situation. Thanks for all of your work on this @techa

[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-29 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15979 LGTM, merging to master. Thanks!

[GitHub] spark pull request #16044: [Spark-18614][SQL] Incorrect predicate pushdown f...

2016-11-29 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16044#discussion_r90083465 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala --- @@ -575,6 +575,24 @@ class JoinSuite extends QueryTest with SharedSQLContext {

[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-29 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15979 My only concern is that "non-flat" type is neither intuitive nor a well-known term. In fact, this PR only prevents `Option[T <: Product]` from being top-level Dataset types. How about just calling them "P

[GitHub] spark pull request #16062: [SPARK-18629][SQL] Fix numPartition of JDBCSuite ...

2016-11-29 Thread weiqingy
GitHub user weiqingy opened a pull request: https://github.com/apache/spark/pull/16062 [SPARK-18629][SQL] Fix numPartition of JDBCSuite Testcase ## What changes were proposed in this pull request? Fix numPartition of JDBCSuite Testcase. ## How was this patch tested?

[GitHub] spark issue #16062: [SPARK-18629][SQL] Fix numPartition of JDBCSuite Testcas...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16062 **[Test build #69340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69340/consoleFull)** for PR 16062 at commit [`30c5d6f`](https://github.com/apache/spark/commit/3

[GitHub] spark issue #16044: [Spark-18614][SQL] Incorrect predicate pushdown from Exi...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16044 **[Test build #69341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69341/consoleFull)** for PR 16044 at commit [`d4002c7`](https://github.com/apache/spark/commit/d

[GitHub] spark issue #16061: [SPARK-18278] [Scheduler] Support native submission of s...

2016-11-29 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16061 @erikerlandson For the RAT failure, you may either add the Apache license header to the newly added files or add those files to `dev/.rat-excludes`.

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #69336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69336/consoleFull)** for PR 16030 at commit [`43b4eb0`](https://github.com/apache/spark/commit/

[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...

2016-11-29 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r90085604 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -305,40 +312,82 @@ final class ShuffleBlockFetcherIterator(

[GitHub] spark pull request #15923: [SPARK-4105] retry the fetch or stage if shuffle ...

2016-11-29 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/15923#discussion_r90085570 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -56,8 +59,10 @@ final class ShuffleBlockFetcherIterator(

[GitHub] spark issue #15923: [SPARK-4105] retry the fetch or stage if shuffle block i...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15923 **[Test build #3443 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3443/consoleFull)** for PR 15923 at commit [`b3e1786`](https://github.com/apache/spark/commit/

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69336/ Test PASSed.

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Merged build finished. Test PASSed.

[GitHub] spark issue #15972: [SPARK-18319][ML][QA2.1] 2.1 QA: API: Experimental, Deve...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15972 **[Test build #69338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69338/consoleFull)** for PR 15972 at commit [`1907019`](https://github.com/apache/spark/commit/

[GitHub] spark issue #15972: [SPARK-18319][ML][QA2.1] 2.1 QA: API: Experimental, Deve...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15972 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69338/ Test PASSed.

[GitHub] spark issue #15972: [SPARK-18319][ML][QA2.1] 2.1 QA: API: Experimental, Deve...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15972 Merged build finished. Test PASSed.

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90084872 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala --- @@ -279,3 +287,8 @@ class StreamingQueryManager private[sq

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90085377 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryProgress.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90084970 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryProgress.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90083627 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala --- @@ -51,7 +53,7 @@ trait StreamingQuery { def sparkSession:

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90086100 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala --- @@ -669,55 +658,48 @@ trait StreamTest extends QueryTest with SharedS

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90084842 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala --- @@ -279,3 +287,8 @@ class StreamingQueryManager private[sq

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90085518 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryProgress.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90082045 --- Diff: python/pyspark/sql/streaming.py --- @@ -87,6 +88,24 @@ def awaitTermination(self, timeout=None): else: return self._j

[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] head() and show() for Columns

2016-11-29 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/11336 I did another pass. It looks good to me.

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90084420 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala --- @@ -81,30 +83,30 @@ object StreamingQueryListener {

[GitHub] spark pull request #15954: [WIP][SPARK-18516][SQL] Split state and progress ...

2016-11-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15954#discussion_r90083413 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala --- @@ -38,11 +40,11 @@ trait StreamingQuery { def name: String

[GitHub] spark issue #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter methods to...

2016-11-29 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16017 LGTM Merging with master and branch-2.1 Thanks!

[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...

2016-11-29 Thread liancheng
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15976 @cloud-fan @dongjoon-hyun Thanks for the review!

[GitHub] spark issue #16044: [Spark-18614][SQL] Incorrect predicate pushdown from Exi...

2016-11-29 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16044 LGTM - pending jenkins

[GitHub] spark pull request #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scal...

2016-11-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16009#discussion_r90088312 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -49,15 +49,13 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark pull request #15995: [SPARK-18566][SQL] remove OverwriteOptions

2016-11-29 Thread ericl
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/15995#discussion_r90088361 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -129,65 +129,67 @@ case class DataSourceAnalysis(

[GitHub] spark pull request #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter met...

2016-11-29 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16017

[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15982#discussion_r90085605 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java --- @@ -337,42 +340,38 @@ void forceSorterToSpill() throws IOException {

[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15982#discussion_r90089946 --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java --- @@ -338,42 +354,60 @@ private void testMergingSpills(

[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15982#discussion_r90083229 --- Diff: core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala --- @@ -144,14 +144,14 @@ private[spark] class SerializerManager( /*

[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15982#discussion_r90083215 --- Diff: core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala --- @@ -144,14 +144,14 @@ private[spark] class SerializerManager( /*

[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15982#discussion_r90086862 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala --- @@ -17,9 +17,13 @@ package org.apache.spark.u

[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15982#discussion_r90082215 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java --- @@ -337,42 +340,38 @@ void forceSorterToSpill() throws IOException {

[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15982#discussion_r90076997 --- Diff: core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala --- @@ -36,7 +36,7 @@ import org.apache.spark.util.io.{ChunkedByteBuffer,

[GitHub] spark pull request #15982: [SPARK-18546][core] Fix merging shuffle spills wh...

2016-11-29 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15982#discussion_r90086837 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala --- @@ -17,9 +17,13 @@ package org.apache.spark.u

[GitHub] spark issue #14582: [SPARK-16997][SQL] Allow loading of JSON float values as...

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14582 Can one of the admins verify this patch?

[GitHub] spark issue #15344: Added the PrefixSpan MLlib example based on the current ...

2016-11-29 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15344 Ah yes, testing the documentation can be a bit difficult. You can take a look at the guide under docs/README.md to see how to build the documentation locally and verify it looks like you expect it t

[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2016-11-29 Thread erikerlandson
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/16061#discussion_r90092345 --- Diff: kubernetes/src/main/scala/org/apache/spark/scheduler/cluster/kubernetes/KubernetesClusterSchedulerBackend.scala --- @@ -0,0 +1,222 @@ +/

[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15780 **[Test build #69342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69342/consoleFull)** for PR 15780 at commit [`39e4930`](https://github.com/apache/spark/commit/3

[GitHub] spark issue #16045: [SPARK-18553][CORE] Fix leak of TaskSetManager following...

2016-11-29 Thread kayousterhout
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16045 LGTM

[GitHub] spark issue #16061: [SPARK-18278] [Scheduler] Support native submission of s...

2016-11-29 Thread erikerlandson
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/16061 @rxin, when you say "move all resource managers" does that mean "move scheduler back-ends for mesos, yarn, etc, into some `resource-managers` sub-project" ?

[GitHub] spark pull request #14638: [SPARK-11374][SQL] Support `skip.header.line.coun...

2016-11-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14638#discussion_r90095923 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -113,6 +113,10 @@ class HadoopTableReader( val tabl

[GitHub] spark pull request #16063: [SPARK-18622][SQL] Remove TypeCoercion rules for ...

2016-11-29 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/16063 [SPARK-18622][SQL] Remove TypeCoercion rules for Average and Sum aggregate functions ## What changes were proposed in this pull request? Spark currently has special analyzer rules for the 'S

[GitHub] spark issue #16063: [SPARK-18622][SQL] Remove TypeCoercion rules for Average...

2016-11-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16063 **[Test build #69343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69343/consoleFull)** for PR 16063 at commit [`7596b5a`](https://github.com/apache/spark/commit/7

[GitHub] spark pull request #16047: [SPARK-17783] [SQL] [BACKPORT-2.0] Hide Credentia...

2016-11-29 Thread gatorsmile
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/16047

[GitHub] spark issue #16000: [SPARK-18537][Web UI]Add a REST api to spark streaming

2016-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16000 Can one of the admins verify this patch?

[GitHub] spark issue #15972: [SPARK-18319][ML][QA2.1] 2.1 QA: API: Experimental, Deve...

2016-11-29 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15972 Thanks for the review @jkbradley . Updated Python part.

[GitHub] spark issue #16063: [SPARK-18622][SQL] Remove TypeCoercion rules for Average...

2016-11-29 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16063 cc @gatorsmile @cloud-fan could you take a look at this?

[GitHub] spark issue #16044: [Spark-18614][SQL] Incorrect predicate pushdown from Exi...

2016-11-29 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16044 LGTM - pending jenkins

[GitHub] spark pull request #16064: [SPARK-18633][ML][Example]: Add multiclass logist...

2016-11-29 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/16064 [SPARK-18633][ML][Example]: Add multiclass logistic regression summary python example and document ## What changes were proposed in this pull request? Logistic Regression summary is added
